Slackware

A true linux-distro

Thursday, May 23, 2013

Post 9: Downloading all the files from a directory on the Internet.

Suppose someone tells you to download all the files from a directory on the Internet, say mirrors.usc.edu/pub/linux/distributions/slackware/slackware_source/l/vte/, where .../vte/ contains all the files you need.


Personally, I found this quite tricky. To download all the files from this directory, run:

wget -nH -r --cut-dirs=6 --no-parent mirrors.usc.edu/pub/linux/distributions/slackware/slackware_source/l/vte/

where:

-nH --> --no-host-directories (the following is lifted directly from the man page)

Disable generation of host-prefixed directories.  By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/.  This option disables such behavior.

-r --> --recursive. Files are retrieved recursively. Say that vte/ contained subdirectories a, b and c; -r would retrieve the files in all of these directories as well. If you do not specify the depth of this recursion, the default maximum depth is 5. You can override the default by giving the level explicitly as --level=x
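As a rough sketch of overriding the depth, here is the same download wrapped in a small function that takes the level as an argument. The function name fetch_vte and the DRY_RUN switch are my own additions for illustration, not part of wget; with DRY_RUN set, the command is only printed, so you can check it before anything is downloaded.

```shell
#!/bin/sh
# Hypothetical wrapper (fetch_vte is an illustrative name, not a wget feature).
# Usage: fetch_vte [DEPTH]
fetch_vte() {
    depth=${1:-5}   # fall back to 5, wget's default maximum depth
    # If DRY_RUN is set and non-empty, "echo" is prepended and the command
    # is printed instead of executed.
    ${DRY_RUN:+echo} wget -nH -r --level="$depth" --cut-dirs=6 --no-parent \
        mirrors.usc.edu/pub/linux/distributions/slackware/slackware_source/l/vte/
}

DRY_RUN=1       # remove this line to actually download
fetch_vte 1     # only descend one level below vte/
```

With --level=1, wget should stop after the directory listing and the files it links to directly, rather than following subdirectories further down.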

--cut-dirs=6 --> ensures that you do not recreate the whole hierarchy of directories leading down to vte/. Without this option (and with -nH already in effect), you would end up with pub/linux/distributions/slackware/slackware_source/l/vte/. Instead, you want to prune away pub/linux/distributions/slackware/slackware_source/l/, which amounts to 6 levels of hierarchy, so that the files land in vte/.
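To check the counting without downloading anything, the pruning can be imitated with a few lines of shell. This is only a sketch of how I understand -nH combined with --cut-dirs=N to shorten the saved path; cut_dirs is a made-up helper and example.txt is a hypothetical file name, not necessarily something that exists on the mirror.

```shell
#!/bin/sh
# Hypothetical helper: imitate the local path left after -nH --cut-dirs=N.
# Usage: cut_dirs N path/without/host/to/file
cut_dirs() {
    n=$1
    path=$2
    file=${path##*/}                  # last component: the file name
    case $path in
        */*) dirs=${path%/*} ;;       # everything before the file name
        *)   dirs= ;;                 # no directory part at all
    esac
    i=0
    while [ "$i" -lt "$n" ] && [ -n "$dirs" ]; do
        case $dirs in
            */*) dirs=${dirs#*/} ;;   # drop the leading directory component
            *)   dirs= ;;             # that was the last directory left
        esac
        i=$((i + 1))
    done
    if [ -n "$dirs" ]; then
        printf '%s\n' "$dirs/$file"
    else
        printf '%s\n' "$file"
    fi
}

# example.txt is a placeholder file name
cut_dirs 6 pub/linux/distributions/slackware/slackware_source/l/vte/example.txt
```

Cutting 6 components leaves vte/example.txt; cutting 7 would drop vte/ as well and leave the file in the current directory.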

--no-parent OR -np --> (the following is lifted directly from the man page)

Do not ever ascend to the parent directory when retrieving recursively.  This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.




