The GNU Wget FAQ

FAQ Version 20050629 (Wget version 1.10)

0. Meta Info

0.0 Where do I get this document?
0.1 Who maintains this document?

1. About Wget

1.0 What is Wget?
1.1 Where is the home page?
1.2 Where can I find documentation?
1.3 Where can I download Wget?
1.4 Where can I get help?

2. Installing Wget

2.0 How do I compile Wget?

3. Using Wget

3.0 How can I make Wget ignore the robots.txt file?
3.1 Does Wget support files larger than 2GB?
3.2 Does Wget support cookies?
3.3 How do I download a URL with funny characters in it?
3.4 Tool X lets me mirror a site, but Wget gives an HTTP error (404)?
3.5 Does Wget understand HTTP/1.1?
3.6 Does Wget understand JavaScript?
3.7 Is there a way to hide my clear-text user/pass combo from the process table?
3.8 Will Wget log into a web site using HTTP FORM authentication?

0.0 Where do I get this document?

You can download the latest version of this document from the official URL http://www.gnu.org/software/wget/faq.html.

0.1 Who maintains this document?

The maintainer of this document is Mauro Tortonesi (mauro at ferrara dot linux dot it). You are encouraged to email him with concerns regarding this FAQ. The original author of the GNU Wget FAQ is James Bouressa (james at bouressa dot com).

1.0 What is Wget?

GNU Wget is a network utility to retrieve files from the World Wide Web using HTTP and FTP, the two most widely used Internet protocols. It works non-interactively, so it can keep running in the background after you have logged off. The program supports recursive retrieval of web-authoring pages as well as FTP sites -- you can use Wget to make mirrors of archives and home pages or to travel the Web like a WWW robot.

1.1 Where is the home page?

You can find the official Wget homepage at this URL: http://www.gnu.org/software/wget/

There are also two other homepages related to Wget:

1.2 Where can I find documentation?

You can:

1.3 Where can I download Wget?

Source Tarball:

Windows Binaries (kindly provided by Heiko Herold):

An MS-DOS binary designed to be used under plain DOS with a packet driver has been made available by Doug Kaufman:

VMS port by Antinode.org:

1.4 Where can I get help?

The main mailing list for end users is wget@sunsite.dk. You can subscribe by sending an email with a message body of "subscribe" to wget-subscribe@sunsite.dk. If you post to the list, please include the complete output of Wget run with the -d (debug) flag on the problem download. This drastically improves the likelihood and quality of responses.
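A minimal sketch for capturing that output in a file you can attach to your message, assuming your Wget was built with debug support. The helper name capture_debug_log and the default file name wget-debug.log are hypothetical; -d turns on debug output and -o sends all of Wget's log messages to the named file.

```shell
# capture_debug_log URL [LOGFILE]
# Hypothetical helper: run Wget with debug output enabled (-d) and write
# its whole log to LOGFILE (-o), default wget-debug.log, so the complete
# output can be pasted into a mailing-list post.
capture_debug_log() {
    url=$1
    log=${2:-wget-debug.log}
    wget -d -o "$log" "$url"
    status=$?
    echo "wget exited with status $status; debug log saved in $log" >&2
    return "$status"
}
```

For example, capture_debug_log http://some.web.site/ leaves the full debug log in wget-debug.log.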

You can view the mailing list archives at http://www.mail-archive.com/wget%40sunsite.dk/

Info about other mailing lists can be found on the GNU Wget home page.

2.0 How do I compile Wget?

On most UNIX-like operating systems, this will work (the last command, shown with a # prompt, usually has to be run as root):

$ gzip -dc wget-1.10.tar.gz | tar -xvf -
$ cd wget-1.10
$ ./configure
$ make
# make install
If it doesn't, read the README and INSTALL files included in the source distribution. Running ./configure --help lists the other available configuration options.

3.0 How can I make Wget ignore the robots.txt file?

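Use the -e option, which executes a command as if it were part of your .wgetrc file; robots=off turns off robots.txt processing:

wget -e robots=off http://your.site.here

To make the change permanent, add the line robots = off to your .wgetrc file.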

3.1 Does Wget support files larger than 2GB?

Yes, starting from version 1.10, GNU Wget supports files larger than 2GB.

3.2 Does Wget support cookies?

Yes. You can load your Mozilla/Firefox cookie file using the --load-cookies option. Wget accepts cookies by default, but keeps them only for the current run unless you specify --save-cookies FILE. Also see the --keep-session-cookies option, which makes --save-cookies write session cookies (normally discarded when Wget exits) to disk as well. See the documentation for details.

3.3 How do I download a URL with funny characters in it?

Try putting single quotes around the URL (double quotes also work for most characters, but inside them the shell still expands $ and backquotes):

wget 'http://my.funny/$url&with%characters special;to|my#operating<system'
Or try replacing each funny character with a percent sign (%) followed by the character's two-digit hexadecimal ASCII code. So this URL:
wget 'http://my.funny/$url&with characters special;to|my#operating<system'
becomes:
wget 'http://my.funny/%24url%26with%20characters%20special%3Bto%7Cmy%23operating%3Csystem'
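Doing the substitution by hand is error-prone, so here is a small POSIX shell sketch that percent-encodes a string one character at a time. The function name urlencode is hypothetical, and it assumes ASCII input. It encodes everything except letters, digits and - . _ ~, so apply it only to the funny part of the URL, never to the http://host/ prefix.

```shell
# urlencode STRING
# Hypothetical helper: print STRING with every character other than
# A-Z a-z 0-9 - . _ ~ replaced by % and its two-digit hex ASCII code.
# The body runs in a subshell, so its variables and the LC_ALL setting
# do not leak into the calling shell.
urlencode() (
    export LC_ALL=C            # treat the string as single bytes
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            # printf "'c" yields the numeric code of the character c
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)
```

For example, wget "http://my.funny/$(urlencode '$url&with characters special;to|my#operating<system')" requests http://my.funny/%24url%26with%20characters%20special%3Bto%7Cmy%23operating%3Csystem.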

3.4 Tool X lets me mirror a site, but Wget gives an HTTP error (404)?

The server admin may be specifically denying the Wget user agent. Try changing the identification string to something else with the -U (--user-agent) option:

wget -m -U "Mozilla/5.0 (compatible; Konqueror/3.2; Linux)" http://some.web.site

3.5 Does Wget understand HTTP/1.1?

Wget is an HTTP/1.0 client, but since the HTTP/1.1 protocol was designed to fully support HTTP/1.0 clients, Wget interoperates with any HTTP/1.1-compliant server.

In addition, Wget supports several features introduced by HTTP/1.1 and used by many web servers, such as keep-alive connections and the Host header.

3.6 Does Wget understand JavaScript?

Wget doesn't feature JavaScript support and is not capable of performing recursive retrieval of URLs included in JavaScript code.

In fact, it is impossible in general to extract URLs from JavaScript by merely parsing it, because scripts often build URLs at run time. A client has to actually execute the code, and Wget cannot do that because it contains no JavaScript interpreter.

However, some heuristics could be applied to JavaScript source code to extract URLs from certain often-encountered JavaScript patterns, such as rollover image changes. This feature will probably be added in a later version of Wget.
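To give an idea of what such a heuristic might look like, here is a rough shell sketch (not a Wget feature) that pulls quoted image file names out of rollover-style JavaScript. The function name js_image_urls and the list of image extensions are assumptions. It relies on grep -o, which GNU and BSD grep support but POSIX does not require. Relative paths are resolved against BASE_URL, which must end in /, and any URL assembled at run time is missed.

```shell
# js_image_urls BASE_URL < script.js
# Hypothetical helper: print an absolute URL for every quoted string
# ending in .gif, .jpg, .jpeg or .png, such as "images/over.gif" in
#     img.src = "images/over.gif";
# BASE_URL is assumed to be the page's directory and to end in "/".
js_image_urls() {
    base=$1
    # scheme://host part of BASE_URL, used for root-relative paths
    root=$(printf '%s\n' "$base" | sed 's|^\([A-Za-z]*://[^/]*\).*|\1|')
    grep -oE "[\"'][^\"']+\\.(gif|jpe?g|png)[\"']" |
        sed -e "s/^[\"']//" -e "s/[\"']\$//" |
        while IFS= read -r path; do
            case $path in
                *://*) printf '%s\n' "$path" ;;            # already absolute
                /*)    printf '%s%s\n' "$root" "$path" ;;  # root-relative
                *)     printf '%s%s\n' "$base" "$path" ;;  # page-relative
            esac
        done
}
```

The result can be fed straight to Wget, e.g. js_image_urls http://www.example.com/ < rollover.js | wget -i -, where rollover.js stands for a locally saved script.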

3.7 Is there a way to hide my clear-text user/pass combo from the process table?

I don't think there is a portable and reliable way to prevent this. You can work around it, though: put your URLs with passwords into a file readable only by you and invoke Wget with `wget -i FILE', or use `wget -i -' and type the URL followed by Ctrl-D. You may also be able to put this info in wgetrc.
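A minimal sketch of the `wget -i -' approach, assuming an FTP download and a shell whose printf is a builtin (true of bash, dash and ksh; an external printf would itself show the URL in the process table). The function name fetch_private is hypothetical. It prompts for the password without echoing it and passes the finished URL to Wget on standard input, so the password never appears among Wget's arguments.

```shell
# fetch_private USER HOST/PATH
# Hypothetical helper: read USER's password without echoing it, then hand
# "ftp://USER:PASSWORD@HOST/PATH" to Wget on standard input ("-i -")
# instead of the command line. A password containing @ : / or % must be
# percent-encoded first.
fetch_private() (
    user=$1
    rest=$2
    printf 'Password for %s: ' "$user" >&2
    if [ -t 0 ]; then stty -echo; fi          # hide typing on a real terminal
    IFS= read -r pass
    if [ -t 0 ]; then stty echo; echo >&2; fi
    # printf is assumed to be a shell builtin, so this command line is
    # never visible in the process table.
    printf 'ftp://%s:%s@%s\n' "$user" "$pass" "$rest" | wget -i -
)
```

Usage: fetch_private myname ftp.example.com/pub/file.tar.gz, then type the password at the prompt.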

3.8 Will Wget log into a web site using HTTP FORM authentication?

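Recent versions of Wget can submit forms with the POST method using the --post-data option. Send the data to the URL named in the form's action attribute, and percent-encode any special characters in the field values. Try: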

wget --post-data="login=user&password=pw" http://www.yourclient.com/somepage.html
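Most sites that use form logins remember you with a session cookie, so posting the form is usually only the first step. A minimal sketch combining --post-data with the cookie options from section 3.2 follows; the function name, the cookie file name cookies.txt and the login URL are hypothetical, so check the site's HTML form for the real action URL and field names.

```shell
# form_login_and_fetch LOGIN_URL PAGE_URL POST_DATA
# Hypothetical helper. POST_DATA must already be URL-encoded,
# e.g. "login=user&password=pw".
form_login_and_fetch() {
    login_url=$1
    page_url=$2
    post_data=$3
    jar=cookies.txt
    # 1. Submit the form and save the cookies the server sets, including
    #    session cookies, which are otherwise not written to disk.
    wget --save-cookies "$jar" --keep-session-cookies \
         --post-data="$post_data" -O /dev/null "$login_url" || return
    # 2. Fetch the protected page, sending the saved cookies back.
    wget --load-cookies "$jar" "$page_url"
}
```

For example: form_login_and_fetch http://www.yourclient.com/login.php http://www.yourclient.com/somepage.html 'login=user&password=pw'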