wget: some quick tips
October 11, 2007
wget is one of my favorite tools in *nix land. Sometimes you want to convert a dynamic site to html. Sometimes, you want to download all the rpm, deb, iso, or tgz files in a directory. Other times, you just want to create an archive. wget does it all!
Here are some of my favorite wget command options, and what they do:
$ wget -r -np -nd http://example.com/packages/
This little gem is probably my most used variation. It will download all files in the /packages/ directory on example.com — without traversing up to parent directories (-np), and without recreating the directory structure on your machine (-nd).
$ wget -r -np -nd --accept=iso http://example.com/centos-5/i386/
Adding the --accept argument with a comma-separated list of file extensions will grab only those files ending in the specified extensions.
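Here is a rough sketch of the same idea with more than one extension. The function name fetch_by_ext is made up for illustration and is not part of wget. Note that during a recursive crawl wget still has to fetch the HTML index pages to find the links; it deletes them afterwards if they don't match the accept list.

```shell
# fetch_by_ext is a hypothetical helper, not a wget feature.
# $1: directory URL to crawl
# $2: comma-separated list of extensions, e.g. "iso,rpm"
fetch_by_ext() {
    # -r   recurse into links found on the page
    # -np  never ascend to the parent directory
    # -nd  do not recreate the remote directory tree locally
    # --accept keeps only files whose names end in a listed suffix
    wget -r -np -nd --accept="$2" "$1"
}

# Example call (hits the network, so it is left commented out):
# fetch_by_ext http://example.com/centos-5/i386/ iso,rpm
```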
Another way to grab just the files you want:
$ wget -i filename.txt
Put all the desired urls in filename.txt and run wget against it to download a list of files automatically.
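If the URLs follow a pattern, you can generate the list instead of typing it. This is only a sketch: the disc1.iso to disc3.iso file names are placeholders I invented, and fetch_list is a hypothetical wrapper name.

```shell
# Build the list, one URL per line (placeholder URLs).
for n in 1 2 3; do
    printf 'http://example.com/packages/disc%s.iso\n' "$n"
done > filename.txt

# fetch_list is a hypothetical wrapper: download every URL in the given file.
fetch_list() {
    wget -i "$1"
}

# Example call (needs network access):
# fetch_list filename.txt
```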
On a bad connection?
$ wget -c http://example.com/really-big-file.iso
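The “-c” option tells wget to continue a partially downloaded file, picking up where the previous attempt stopped instead of starting over. Retrying is a separate setting: wget retries up to 20 times by default, and the --tries option changes that limit.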
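On a really flaky line it can help to combine -c with wget's retry options. A minimal sketch, assuming you want unlimited retries with a short pause between them; resume_download is a name I made up, and the values are just examples.

```shell
# resume_download is a hypothetical helper, not a wget feature.
# $1: URL of the file to download
resume_download() {
    # -c              resume from the existing partial file
    # --tries=0       retry indefinitely (the default is 20 attempts)
    # --waitretry=10  wait between retries, backing off up to 10 seconds
    wget -c --tries=0 --waitretry=10 "$1"
}

# Example call (needs network access):
# resume_download http://example.com/really-big-file.iso
```

Even with --tries=0, wget gives up right away on fatal errors such as a 404 or a refused connection.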
$ wget -m -k http://www.example.com/
Mirror a site (-m), converting its links to work locally (-k), so that you can browse the copy offline or move the site to another server. Add the ‘-H’ option to let wget span hosts if images are loaded from another site.
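Spanning hosts with -H alone can wander off across the whole web, so it is usually paired with -D to limit which domains are followed. The sketch below is one way to wrap that up; mirror_site is a hypothetical name, and -p and -E are extras I added (page requisites, and saving pages with an .html extension).

```shell
# mirror_site is a hypothetical helper, not a wget feature.
# $1: start URL
# $2: optional comma-separated domains to allow, including the site's own
mirror_site() {
    if [ -n "${2:-}" ]; then
        # -H spans hosts; -D restricts the crawl to the listed domains
        wget -m -k -p -E -H -D "$2" "$1"
    else
        # Stay on the starting host only
        wget -m -k -p -E "$1"
    fi
}

# Example calls (need network access; example.net is a placeholder):
# mirror_site http://www.example.com/
# mirror_site http://www.example.com/ example.com,example.net
```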
Another useful tool for mirroring websites is httrack. I blogged about it a couple of weeks ago.
October 12th, 2007 at 9:01 am
Thanks for sharing your tip. Was really helpful.
–
Sudar
October 12th, 2007 at 2:00 pm
-e robots=off makes wget ignore robots.txt
October 12th, 2007 at 2:15 pm
Thanks. This is nicer than digging through the man page.
October 15th, 2007 at 5:53 am
This option can also be very useful:
--user-agent=agent-string
Identify as agent-string to the HTTP server.
Some sites block wget’s default user agent because they don’t want to be mirrored. With this option you can supply whatever string you want and continue to mirror without trouble.
Regards, Riccardo Giuntoli
October 30th, 2007 at 3:23 pm
Thanks for the great tips.
I will start using wget to mirror sites!
I wonder if there is a windows port of wget so I can use it at work.
-Matt
March 6th, 2008 at 5:43 am
Very similar: http://www.pixelbeat.org/cmdline.html#wget