WGET(1) | GNU Wget | WGET(1) |
NAME
Wget - The non-interactive network downloader.
SYNOPSIS
wget [option]... [URL]...
DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
OPTIONS
Option Syntax
Since Wget uses GNU getopt to process command-line arguments, every option has a long form along with the short one. Long options are more convenient to remember, but take time to type. You may freely mix different option styles, or specify options after the command-line arguments. Thus you may write:
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
wget -drc <URL>
wget -d -r -c <URL>
wget -o log -- -x
wget -X '' -X /~nobody,/~somebody
Basic Startup Options
- -V
- --version
- Display the version of Wget.
- -h
- --help
- Print a help message describing all of Wget's command-line options.
- -b
- --background
- Go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
- -e command
- --execute command
- Execute command as if it were a part of .wgetrc. A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of -e.
Logging and Input File Options
- -o logfile
- --output-file=logfile
- Log all messages to logfile. The messages are normally reported to standard error.
- -a logfile
- --append-output=logfile
- Append to logfile. This is the same as -o, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.
- -d
- --debug
- Turn on debug output, meaning various information important to the developers of Wget if it does not work properly. Your system administrator may have chosen to compile Wget without debug support, in which case -d will not work. Please note that compiling with debug support is always safe---Wget compiled with the debug support will not print any debug info unless requested with -d.
- -q
- --quiet
- Turn off Wget's output.
- -v
- --verbose
- Turn on verbose output, with all the available data. The default output is verbose.
- -nv
- --no-verbose
- Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.
- --report-speed=type
- Output bandwidth as type. The only accepted value is bits.
- -i file
- --input-file=file
-
Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.)
- --input-metalink=file
- Download the files covered in a local Metalink file. Metalink versions 3 and 4 are supported.
- --metalink-over-http
- Issues HTTP HEAD request instead of GET and extracts Metalink metadata from response headers. Then it switches to Metalink download. If no valid Metalink metadata is found, it falls back to ordinary HTTP download.
- --preferred-location
- Set the preferred location for Metalink resources. This has an effect only if multiple resources with the same priority are available.
- -F
- --force-html
- When input is read from a file, force it to be treated as an HTML file. This enables you to retrieve relative links from existing HTML files on your local disk, by adding "<base href="url">" to HTML, or using the --base command-line option.
- -B URL
- --base=URL
-
Resolves relative links using URL as the point of reference, when reading links from an HTML file specified via the -i/--input-file option (together with --force-html, or when the input file was fetched remotely from a server describing it as HTML). This is equivalent to the presence of a "BASE" tag in the HTML input file, with URL as the value for the "href" attribute.
- --config=FILE
- Specify the location of a startup file you wish to use.
- --rejected-log=logfile
- Logs all URL rejections to logfile as comma separated values. The values include the reason of rejection, the URL and the parent URL it was found in.
Download Options
- --bind-address=ADDRESS
- When making client TCP/IP connections, bind to ADDRESS on the local machine. ADDRESS may be specified as a hostname or IP address. This option can be useful if your machine is bound to multiple IPs.
- -t number
- --tries=number
- Set number of tries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.
- -O file
- --output-document=file
-
The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If - is used as file, documents will be printed to standard output, disabling link conversion. (Use ./- to print to a file literally named -.)
- -nc
- --no-clobber
-
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.
- --backups=backups
- Before (over)writing a file, back up an existing file by adding a .1 suffix (_1 on VMS) to the file name. Such backup files are rotated to .2, .3, and so on, up to backups (and lost beyond that).
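The rotation described for --backups can be tried by hand. What follows is a minimal sketch in POSIX shell, not Wget's own code; it assumes --backups=3, a scratch directory, and the illustrative file name index.html.

```shell
# Sketch only: mimic by hand the rotation described for --backups=3.
# The scratch directory and the name index.html are illustrative assumptions.
backups=3
cd "$(mktemp -d)" || exit 1
echo "previous download" > index.html
echo "older download" > index.html.1

# Shift index.html.(N-1) to index.html.N, starting from the highest number,
# so the copy sitting at the limit is the one that gets overwritten (lost).
i=$backups
while [ "$i" -gt 1 ]; do
  prev=$((i - 1))
  if [ -f "index.html.$prev" ]; then
    mv -f "index.html.$prev" "index.html.$i"
  fi
  i=$((i - 1))
done

# The current file becomes backup .1, freeing the name for the new download.
mv -f index.html index.html.1
ls
```

After the run, the directory holds index.html.1 (the previous download) and index.html.2 (the older one), and the name index.html is free for the fresh copy.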
- -c
- --continue
-
Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. For instance:
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
- --start-pos=OFFSET
-
Start downloading at zero-based position OFFSET. Offset may be expressed in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc.
- --progress=type
-
Select the type of the progress indicator you wish to use. Legal indicators are "dot" and "bar".
- --show-progress
-
Force wget to display the progress bar in any verbosity.
- -N
- --timestamping
- Turn on time-stamping.
- --no-if-modified-since
- Do not send the If-Modified-Since header in -N mode. Send a preliminary HEAD request instead. This only has an effect in -N mode.
- --no-use-server-timestamps
-
Don't set the local file's timestamp by the one on the server.
- -S
- --server-response
- Print the headers sent by HTTP servers and responses sent by FTP servers.
- --spider
-
When invoked with this option, Wget will behave as a Web spider, which means that it will not download the pages, just check that they are there. For example, you can use Wget to check your bookmarks:
wget --spider --force-html -i bookmarks.html
- -T seconds
- --timeout=seconds
-
Set the network timeout to seconds seconds. This is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.
- --dns-timeout=seconds
- Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.
- --connect-timeout=seconds
- Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.
- --read-timeout=seconds
-
Set the read (and write) timeout to seconds seconds. The "time" of this timeout refers to idle time: if, at any point in the download, no data is received for more than the specified number of seconds, reading fails and the download is restarted. This option does not directly affect the duration of the entire download.
- --limit-rate=amount
-
Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix. For example, --limit-rate=20k will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth.
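The suffix arithmetic can be sketched as a small shell helper. This is an illustration, not Wget's parser: the function name to_bytes is hypothetical, the sketch assumes that k means 1024 bytes and m means 1024*1024 bytes, and it handles whole numbers only.

```shell
# Sketch only: expand an optional k or m suffix into a byte count.
# Assumption: k = 1024 bytes, m = 1024 * 1024 bytes; integers only.
to_bytes() {
  case $1 in
    *[kK]) echo $(( ${1%?} * 1024 )) ;;
    *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
    *)     echo "$1" ;;
  esac
}

to_bytes 20k    # prints 20480
to_bytes 2m     # prints 2097152
to_bytes 512    # prints 512
```

Under these assumptions, --limit-rate=20k caps the retrieval rate at 20480 bytes per second.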
- -w seconds
- --wait=seconds
-
Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix.
- --waitretry=seconds
-
If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. Wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.
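The linear backoff can be tabulated with a short loop. This sketch only reproduces the arithmetic stated above; the value --waitretry=10 and the count of 12 failures are illustrative assumptions.

```shell
# Sketch only: list the pauses implied by --waitretry=10 for 12 consecutive
# failures on one file (1 s, 2 s, ... capped at the --waitretry maximum).
waitretry=10
failures=12
delays=
n=1
while [ "$n" -le "$failures" ]; do
  delay=$n
  if [ "$delay" -gt "$waitretry" ]; then
    delay=$waitretry
  fi
  delays="${delays:+$delays }$delay"
  n=$((n + 1))
done
echo "$delays"    # prints 1 2 3 4 5 6 7 8 9 10 10 10
```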
- --random-wait
-
Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option, in order to mask Wget's presence from such analysis.
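As a worked example of the range above, the bounds for a given --wait value can be computed directly. The value --wait=4 is an assumption chosen for illustration, and the sketch works in milliseconds to stay within integer shell arithmetic.

```shell
# Sketch only: bounds of the randomized pause for --wait=4 --random-wait,
# following the stated range of 0.5 to 1.5 times the wait value.
wait_seconds=4
low_ms=$((wait_seconds * 500))     # 0.5 * wait, in milliseconds
high_ms=$((wait_seconds * 1500))   # 1.5 * wait, in milliseconds
echo "each pause falls between ${low_ms} ms and ${high_ms} ms"
```

With --wait=4, each pause therefore lasts between 2 and 6 seconds.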
- --no-proxy
- Don't use proxies, even if the appropriate *_proxy environment variable is defined.
- -Q quota
- --quota=quota
-
Specify download quota for automatic retrievals. The value can be specified in bytes (default), kilobytes (with k suffix), or megabytes (with m suffix).
- --no-dns-cache
-
Turn off caching of DNS lookups. Normally, Wget remembers the IP addresses it looked up from DNS so it doesn't have to repeatedly contact the DNS server for the same (typically small) set of hosts it retrieves from. This cache exists in memory only; a new Wget run will contact DNS again.
- --restrict-file-names=modes
-
Change which characters found in remote URLs must be escaped during generation of local filenames. Characters that are restricted by this option are escaped, i.e. replaced with %HH, where HH is the hexadecimal number that corresponds to the restricted character. This option may also be used to force all alphabetical cases to be either lower- or uppercase.
- -4
- --inet4-only
- -6
- --inet6-only
-
Force connecting to IPv4 or IPv6 addresses. With --inet4-only or -4, Wget will only connect to IPv4 hosts, ignoring AAAA records in DNS, and refusing to connect to IPv6 addresses specified in URLs. Conversely, with --inet6-only or -6, Wget will only connect to IPv6 hosts and ignore A records and IPv4 addresses.
- --prefer-family=none/IPv4/IPv6
-
When given a choice of several addresses, connect to the addresses with specified address family first. The address order returned by DNS is used without change by default.
- --retry-connrefused
- Consider "connection refused" a transient error and try again. Normally Wget gives up on a URL when it is unable to connect to the site because failure to connect is taken as a sign that the server is not running at all and that retries would not help. This option is for mirroring unreliable sites whose servers tend to disappear for short periods of time.
- --user=user
- --password=password
- Specify the username user and password password for both FTP and HTTP file retrieval. These parameters can be overridden using the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password options for HTTP connections.
- --ask-password
- Prompt for a password for each connection established. Cannot be specified when --password is being used, because they are mutually exclusive.
- --no-iri
-
Turn off internationalized URI (IRI) support. Use --iri to turn it on. IRI support is activated by default.
- --local-encoding=encoding
-
Force Wget to use encoding as the default system encoding. That affects how Wget converts URLs specified as arguments from locale to UTF-8 for IRI support.
- --remote-encoding=encoding
-
Force Wget to use encoding as the default remote server encoding. That affects how Wget converts URIs found in files from remote encoding to UTF-8 during a recursive fetch. This option is only useful for IRI support, for the interpretation of non-ASCII characters.
- --unlink
- Force Wget to unlink an existing file instead of clobbering it. This option is useful when downloading to a directory that contains hard links.
Directory Options
- -nd
- --no-directories
- Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
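The ".n" naming can be pictured with a small helper that picks the first unused name. This is a sketch under assumptions, not Wget's code: the function name unique_name is hypothetical, and numbering is assumed to start at 1.

```shell
# Sketch only: choose a non-clobbering name by appending .1, .2, ...
# Assumption: numbering starts at 1 and takes the first free suffix.
cd "$(mktemp -d)" || exit 1

unique_name() {
  if [ ! -e "$1" ]; then
    echo "$1"
    return
  fi
  n=1
  while [ -e "$1.$n" ]; do
    n=$((n + 1))
  done
  echo "$1.$n"
}

touch index.html index.html.1
unique_name index.html    # prints index.html.2
unique_name logo.png      # prints logo.png
```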
- -x
- --force-directories
- The opposite of -nd---create a hierarchy of directories, even if one would not have been created otherwise. E.g. wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.
- -nH
- --no-host-directories
- Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
- --protocol-directories
- Use the protocol name as a directory component of local file names. For example, with this option, wget -r http://host will save to http/host/... rather than just to host/....
- --cut-dirs=number
-
Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
No options -> ftp.xemacs.org/pub/xemacs/
-nH -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/
...
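The mapping in the table above can be reproduced with a short shell function. This is a sketch of the rule as the table shows it, not Wget's implementation; the function name local_dir and its argument order are hypothetical.

```shell
# Sketch only: local_dir HOST DIRPATH NO_HOST CUT
#   NO_HOST is "yes" when -nH is given; CUT is the --cut-dirs number.
local_dir() {
  host=$1; path=$2; no_host=$3; cut=$4
  # Drop the first CUT directory components of the remote path.
  while [ "$cut" -gt 0 ] && [ -n "$path" ]; do
    case $path in
      */*) path=${path#*/} ;;
      *)   path= ;;
    esac
    cut=$((cut - 1))
  done
  if [ "$no_host" = yes ]; then
    result=$path
  else
    result=$host${path:+/$path}
  fi
  if [ -z "$result" ]; then
    echo "."
  else
    echo "$result/"
  fi
}

local_dir ftp.xemacs.org pub/xemacs no  0   # ftp.xemacs.org/pub/xemacs/
local_dir ftp.xemacs.org pub/xemacs yes 0   # pub/xemacs/
local_dir ftp.xemacs.org pub/xemacs yes 1   # xemacs/
local_dir ftp.xemacs.org pub/xemacs yes 2   # .
local_dir ftp.xemacs.org pub/xemacs no  1   # ftp.xemacs.org/xemacs/
```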
- -P prefix
- --directory-prefix=prefix
- Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).
HTTP Options
- --default-page=name
- Use name as the default file name when it isn't known (i.e., for URLs that end in a slash), instead of index.html.
- -E
- --adjust-extension
-
If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you're downloading CGI-generated materials. A URL like http://site.com/article.cgi?25 will be saved as article.cgi?25.html.
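The naming rule can be sketched as a shell function. It assumes the response type is already known to be text/html or application/xhtml+xml, and it reads the regexp as "the name ends in .htm or .html, in any letter case"; the function name local_name is hypothetical.

```shell
# Sketch only: the local name an HTML response would get under -E.
# Assumption: the content type has already been identified as HTML.
local_name() {
  case $1 in
    *.[Hh][Tt][Mm]|*.[Hh][Tt][Mm][Ll]) echo "$1" ;;       # already .htm/.html
    *)                                 echo "$1.html" ;;  # append the suffix
  esac
}

local_name 'article.cgi?25'   # prints article.cgi?25.html
local_name 'index.HTM'        # prints index.HTM
local_name 'page.asp'         # prints page.asp.html
```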
- --http-user=user
- --http-password=password
-
Specify the username user and password password on an HTTP server. According to the type of the challenge, Wget will encode them using either the "basic" (insecure), the "digest", or the Windows "NTLM" authentication scheme.
- --no-http-keep-alive
-
Turn off the "keep-alive" feature for HTTP downloads. Normally, Wget asks the server to keep the connection open so that, when you download more than one document from the same server, they get transferred over the same TCP connection. This saves time and at the same time reduces the load on the server.
- --no-cache
-
Disable server-side cache. In this case, Wget will send the remote server an appropriate directive (Pragma: no-cache) to get the file from the remote service, rather than returning the cached version. This is especially useful for retrieving and flushing out-of-date documents on proxy servers.
- --no-cookies
- Disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the "Set-Cookie" header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.
- --load-cookies file
-
Load cookies from file before the first HTTP retrieval. file is a textual file in the format originally used by Netscape's cookies.txt file.
- "Netscape 4.x."
- The cookies are in ~/.netscape/cookies.txt.
- "Mozilla and Netscape 6.x."
- Mozilla's cookie file is also named cookies.txt, located somewhere under ~/.mozilla, in the directory of your profile. The full path usually ends up looking somewhat like ~/.mozilla/default/some-weird-string/cookies.txt.
- "Internet Explorer."
- You can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies. This has been tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.
- "Other browsers."
- If you are using a different browser to create your cookies, --load-cookies will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.
wget --no-cookies --header "Cookie: <name>=<value>"
- --save-cookies file
- Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called "session cookies"), but also see --keep-session-cookies.
- --keep-session-cookies
-
When specified, causes --save-cookies to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and forgotten when you exit the browser. Saving them is useful on sites that require you to log in or to visit the home page before you can access some pages. With this option, multiple Wget runs are considered a single browser session as far as the site is concerned.
- --ignore-length
-
Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus "Content-Length" headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.
- --header=header-line
-
Send header-line along with the rest of the headers in each HTTP request. The supplied header is sent as-is, which means it must contain name and value separated by colon, and must not contain newlines.
wget --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
http://fly.srk.fer.hr/
wget --header="Host: foo.bar" http://localhost/
- --max-redirect=number
- Specifies the maximum number of redirections to follow for a resource. The default is 20, which is usually far more than necessary. However, on those occasions where you want to allow more (or fewer), this is the option to use.
- --proxy-user=user
- --proxy-password=password
-
Specify the username user and password password for authentication on a proxy server. Wget will encode them using the "basic" authentication scheme.
- --referer=url
- Include `Referer: url' header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.
- --save-headers
- Save the headers sent by the HTTP server to the file, preceding the actual contents, with an empty line as the separator.
- -U agent-string
- --user-agent=agent-string
-
Identify as agent-string to the HTTP server.
- --post-data=string
- --post-file=file
-
Use POST as the method for all HTTP requests and send the specified data in the request body. --post-data sends string as data, whereas --post-file sends the contents of file. Other than that, they work in exactly the same way. In particular, they both expect content of the form "key1=value1&key2=value2", with percent-encoding for special characters; the only difference is that one expects its content as a command-line parameter and the other accepts its content from a file. In particular, --post-file is not for transmitting files as form attachments: those must appear as "key=value" data (with appropriate percent-coding) just like everything else. Wget does not currently support "multipart/form-data" for transmitting POST data; only "application/x-www-form-urlencoded". Only one of --post-data and --post-file should be specified.
# Log in to the server. This can be done only once.
wget --save-cookies cookies.txt \
--post-data 'user=foo&password=bar' \
http://server.com/auth.php
# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
-p http://server.com/interesting/article.php
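Because --post-data expects percent-encoded "key=value" pairs, values containing spaces, "&" or "=" have to be encoded before they are joined. The helper below is a minimal sketch for ASCII input only; the function name urlencode and the sample credentials are hypothetical, and the wget invocation is left as a comment so the sketch runs offline.

```shell
# Sketch only: percent-encode an ASCII string for use in --post-data.
# Unreserved characters (letters, digits, . _ ~ -) pass through unchanged.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *)               out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

body="user=$(urlencode 'foo bar')&password=$(urlencode 'p&ss=1')"
echo "$body"    # prints user=foo%20bar&password=p%26ss%3D1
# Then, for example: wget --post-data "$body" http://server.com/auth.php
```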
- --method=HTTP-Method
- For the purpose of RESTful scripting, Wget allows sending of other HTTP Methods without the need to explicitly set them using --header=Header-Line. Wget will use whatever string is passed to it after --method as the HTTP Method to the server.
- --body-data=Data-String
- --body-file=Data-File
-
Must be set when additional data needs to be sent to the server along with the Method specified using --method. --body-data sends string as data, whereas --body-file sends the contents of file. Other than that, they work in exactly the same way.
- --content-disposition
-
If this is set to on, experimental (not fully-functional) support for "Content-Disposition" headers is enabled. This can currently result in extra round-trips to the server for a "HEAD" request, and is known to suffer from a few bugs, which is why it is not currently enabled by default.
- --content-on-error
- If this is set to on, Wget will not skip the content when the server responds with an HTTP status code that indicates an error.
- --trust-server-names
- If this is set to on, the last component of the redirection URL will be used as the local file name after a redirect. By default, the last component of the original URL is used.
- --auth-no-challenge
-
If this option is given, Wget will send Basic HTTP authentication information (plaintext username and password) for all requests, just like Wget 1.10.2 and prior did by default.
HTTPS (SSL/TLS) Options
To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with an external SSL library. The current default is GnuTLS. In addition, Wget also supports HSTS (HTTP Strict Transport Security). If Wget is compiled without SSL support, none of these options are available.
- --secure-protocol=protocol
-
Choose the secure protocol to be used. Legal values are auto, SSLv2, SSLv3, TLSv1, TLSv1_1, TLSv1_2 and PFS. If auto is used, the SSL library is given the liberty of choosing the appropriate protocol automatically, which is achieved by sending a TLSv1 greeting. This is the default.
- --https-only
- When in recursive mode, only HTTPS links are followed.
- --no-check-certificate
-
Don't check the server certificate against the available certificate authorities. Also don't require the URL host name to match the common name presented by the certificate.
- --certificate=file
- Use the client certificate stored in file. This is needed for servers that are configured to require certificates from the clients that connect to them. Normally a certificate is not required and this switch is optional.
- --certificate-type=type
- Specify the type of the client certificate. Legal values are PEM (assumed by default) and DER, also known as ASN1.
- --private-key=file
- Read the private key from file. This allows you to provide the private key in a file separate from the certificate.
- --private-key-type=type
- Specify the type of the private key. Accepted values are PEM (the default) and DER.
- --ca-certificate=file
-
Use file as the file with the bundle of certificate authorities ("CA") to verify the peers. The certificates must be in PEM format.
- --ca-directory=directory
-
Specifies directory containing CA certificates in PEM format. Each file contains one CA certificate, and the file name is based on a hash value derived from the certificate. This is achieved by processing a certificate directory with the "c_rehash" utility supplied with OpenSSL. Using --ca-directory is more efficient than --ca-certificate when many certificates are installed because it allows Wget to fetch certificates on demand.
- --crl-file=file
- Specifies a CRL file in file. This is needed for certificates that have been revoked by the CAs.
- --random-file=file
-
[OpenSSL and LibreSSL only] Use file as the source of random data for seeding the pseudo-random number generator on systems without /dev/urandom.
- --egd-file=file
-
[OpenSSL only] Use file as the EGD socket. EGD stands for Entropy Gathering Daemon, a user-space program that collects data from various unpredictable system sources and makes it available to other programs that might need it. Encryption software, such as the SSL library, needs sources of non-repeating randomness to seed the random number generator used to produce cryptographically strong keys.
- --no-hsts
- Wget supports HSTS (HTTP Strict Transport Security, RFC 6797) by default. Use --no-hsts to make Wget act as a non-HSTS-compliant UA. As a consequence, Wget would ignore all the "Strict-Transport-Security" headers, and would not enforce any existing HSTS policy.
- --hsts-file=file
-
By default, Wget stores its HSTS database in ~/.wget-hsts. You can use --hsts-file to override this. Wget will use the supplied file as the HSTS database. Such file must conform to the correct HSTS database format used by Wget. If Wget cannot parse the provided file, the behaviour is unspecified.
- --warc-file=file
- Use file as the destination WARC file.
- --warc-header=string
- Use string as the warcinfo record.
- --warc-max-size=size
- Set the maximum size of the WARC files to size.
- --warc-cdx
- Write CDX index files.
- --warc-dedup=file
- Do not store records listed in this CDX file.
- --no-warc-compression
- Do not compress WARC files with GZIP.
- --no-warc-digests
- Do not calculate SHA1 digests.
- --no-warc-keep-log
- Do not store the log file in a WARC record.
- --warc-tempdir=dir
- Specify the location for temporary files created by the WARC writer.
FTP Options
- --ftp-user=user
- --ftp-password=password
-
Specify the username user and password password on an FTP server. Without this, or the corresponding startup option, the password defaults to -wget@, normally used for anonymous FTP.
- --no-remove-listing
-
Don't remove the temporary .listing files generated by FTP retrievals. Normally, these files contain the raw directory listings received from FTP servers. Not removing them can be useful for debugging purposes, or when you want to be able to easily check on the contents of remote server directories (e.g. to verify that a mirror you're running is complete).
- --no-glob
-
Turn off FTP globbing. Globbing refers to the use of shell-like special characters (wildcards), like *, ?, [ and ] to retrieve more than one file from the same directory at once, like:
wget ftp://gnjilux.srk.fer.hr/*.msg
- --no-passive-ftp
-
Disable the use of the passive FTP transfer mode. Passive FTP mandates that the client connect to the server to establish the data connection rather than the other way around.
- --preserve-permissions
- Preserve remote file permissions instead of permissions set by umask.
- --retr-symlinks
-
By default, when retrieving FTP directories recursively and a symbolic link is encountered, the symbolic link is traversed and the pointed-to files are retrieved. Currently, Wget does not traverse symbolic links to directories to download them recursively, though this feature may be added in the future.
FTPS Options
- --ftps-implicit
- This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing SSL/TLS from the very beginning of the control connection. This option does not send an "AUTH TLS" command: it assumes the server speaks FTPS and directly starts an SSL/TLS connection. If the attempt is successful, the session continues just like regular FTPS ("PBSZ" and "PROT" are sent, etc.). Implicit FTPS is no longer a requirement for FTPS implementations, and thus many servers may not support it. If --ftps-implicit is passed and no explicit port number specified, the default port for implicit FTPS, 990, will be used, instead of the default port for the "normal" (explicit) FTPS which is the same as that of FTP, 21.
- --no-ftps-resume-ssl
- Do not resume the SSL/TLS session in the data channel. When starting a data connection, Wget tries to resume the SSL/TLS session previously started in the control connection. SSL/TLS session resumption avoids performing an entirely new handshake by reusing the SSL/TLS parameters of a previous session. Typically, the FTPS servers want it that way, so Wget does this by default. Under rare circumstances however, one might want to start an entirely new SSL/TLS session in every data connection. This is what --no-ftps-resume-ssl is for.
- --ftps-clear-data-connection
- All the data connections will be in plain text. Only the control connection will be under SSL/TLS. Wget will send a "PROT C" command to achieve this, which must be approved by the server.
- --ftps-fallback-to-ftp
- Fall back to FTP if FTPS is not supported by the target server. For security reasons, this option is not asserted by default. The default behaviour is to exit with an error. If a server does not successfully reply to the initial "AUTH TLS" command, or in the case of implicit FTPS, if the initial SSL/TLS connection attempt is rejected, it is considered that such server does not support FTPS.
Recursive Retrieval Options
- -r
- --recursive
- Turn on recursive retrieving. The default maximum depth is 5.
- -l depth
- --level=depth
- Specify recursion maximum depth level depth.
- --delete-after
-
This option tells Wget to delete every single file it downloads, after having done so. It is useful for pre-fetching popular pages through a proxy, e.g.:
wget -r -nd --delete-after http://whatever.com/~popular/page/
- -k
- --convert-links
-
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
• The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.
• The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.
- --convert-file-only
-
This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term here in order not to cause confusion.
- -K
- --backup-converted
- When converting a file, back up the original version with a .orig suffix. Affects the behavior of -N.
- -m
- --mirror
- Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
- -p
- --page-requisites
-
This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
wget -r -l 2 http://<site>/1.html
wget -r -l 2 -p http://<site>/1.html
wget -r -l 1 -p http://<site>/1.html
wget -r -l 0 -p http://<site>/1.html
wget -p http://<site>/1.html
wget -E -H -k -K -p http://<site>/<document>
- --strict-comments
-
Turn on strict parsing of HTML comments. The default is to terminate comments at the first occurrence of -->.
Recursive Accept/Reject Options
- -A acclist --accept acclist
- -R rejlist --reject rejlist
- Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters, *, ?, [ or ], appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern into quotes to prevent your shell from expanding it, like in -A "*.mp3" or -A '*.mp3'.
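The suffix-versus-pattern distinction can be sketched with shell pattern matching. This illustrates the rule as stated above and is not Wget's matching code; the function name matches and the sample file names are hypothetical.

```shell
# Sketch only: matches ELEMENT FILENAME
#   An element containing *, ?, [ or ] is used as a pattern;
#   any other element is treated as a file name suffix.
matches() {
  case $1 in
    *\**|*\?*|*\[*|*\]*) pattern=$1 ;;
    *)                   pattern="*$1" ;;
  esac
  case $2 in
    $pattern) return 0 ;;
    *)        return 1 ;;
  esac
}

for name in song.mp3 song.mp4 cover1.jpg; do
  for elem in mp3 'cover?.jpg'; do
    if matches "$elem" "$name"; then
      echo "$elem accepts $name"
    fi
  done
done
```

The element arguments are quoted here for the same reason as on the Wget command line: to keep the shell from expanding the wildcard before it is used.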
- --accept-regex urlregex
- --reject-regex urlregex
- Specify a regular expression to accept or reject the complete URL.
- --regex-type regextype
- Specify the regular expression type. Possible types are posix or pcre. Note that to be able to use pcre type, wget has to be compiled with libpcre support.
- -D domain-list
- --domains=domain-list
- Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.
- --exclude-domains domain-list
- Specify the domains that are not to be followed.
- --follow-ftp
- Follow FTP links from HTML documents. Without this option, Wget will ignore all the FTP links.
- --follow-tags=list
- Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, they should specify such tags in a comma-separated list with this option.
- --ignore-tags=list
-
This is the opposite of the --follow-tags option. To skip certain HTML tags when recursively looking for documents to download, specify them in a comma-separated list.
wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>
- --ignore-case
- Ignore case when matching files and directories. This influences the behavior of -R, -A, -I, and -X options, as well as globbing implemented when downloading from FTP sites. For example, with this option, -A "*.txt" will match file1.txt, but also file2.TXT, file3.TxT, and so on. The quotes in the example are to prevent the shell from expanding the pattern.
- -H
- --span-hosts
- Enable spanning across hosts during recursive retrieval.
- -L
- --relative
- Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts.
- -I list
- --include-directories=list
- Specify a comma-separated list of directories you wish to follow when downloading. Elements of list may contain wildcards.
- -X list
- --exclude-directories=list
- Specify a comma-separated list of directories you wish to exclude from download. Elements of list may contain wildcards.
- -np
- --no-parent
- Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
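Several of the options in this section are commonly combined. The following is a minimal sketch under stated assumptions: the function names fetch_pdfs and fetch_within_domains are hypothetical, and the URLs and domain lists are placeholders supplied by the caller.

```shell
# Hypothetical helper: fetch only PDF files below a starting directory.
fetch_pdfs() {
    # -r  recurse; -np never ascend to the parent directory;
    # -A  accept pattern, quoted so the shell does not expand it.
    wget -r -np -A '*.pdf' "$1"
}

# Hypothetical helper: recurse across hosts, but only within the
# comma-separated domain list given as the first argument.
fetch_within_domains() {
    # -H enables host spanning; -D on its own does not turn it on.
    wget -r -H -D "$1" "$2"
}

# Usage (placeholders, not run here):
#   fetch_pdfs http://example.com/docs/
#   fetch_within_domains example.com,example.org http://www.example.com/
```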
ENVIRONMENT
Wget supports proxies for both HTTP and FTP retrievals. The standard way to specify proxy location, which Wget recognizes, is using the following environment variables:
- http_proxy
- https_proxy
- If set, the http_proxy and https_proxy variables should contain the URLs of the proxies for HTTP and HTTPS connections respectively.
- ftp_proxy
- This variable should contain the URL of the proxy for FTP connections. It is quite common that http_proxy and ftp_proxy are set to the same URL.
- no_proxy
- This variable should contain a comma-separated list of domain extensions for which the proxy should not be used. For instance, if the value of no_proxy is .mit.edu, the proxy will not be used to retrieve documents from MIT.
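As an illustration, the variables above might be set in a shell session as follows. The proxy host, port, and the second no_proxy entry are placeholders assumed for this sketch, not values from the manual.

```shell
# Hypothetical proxy address; replace with your own.
export http_proxy="http://proxy.example.com:8080/"
export https_proxy="$http_proxy"

# FTP traffic often goes through the same proxy.
export ftp_proxy="$http_proxy"

# Do not use the proxy for hosts under these domain extensions.
export no_proxy=".mit.edu,.example.org"
```

Subsequent wget invocations started from the same shell inherit these settings.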
EXIT STATUS
Wget may return one of several error codes if it encounters problems.
- 0
- No problems occurred.
- 1
- Generic error code.
- 2
- Parse error---for instance, when parsing command-line options, the .wgetrc or .netrc...
- 3
- File I/O error.
- 4
- Network failure.
- 5
- SSL verification failure.
- 6
- Username/password authentication failure.
- 7
- Protocol errors.
- 8
- Server issued an error response.
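A script can branch on these codes after a download. The following is a minimal sketch: the function name wget_status_text is hypothetical, and the messages simply repeat the descriptions listed above.

```shell
# Hypothetical helper: translate a Wget exit status into the
# description given in the list above.
wget_status_text() {
    case "$1" in
        0) echo "No problems occurred." ;;
        1) echo "Generic error code." ;;
        2) echo "Parse error." ;;
        3) echo "File I/O error." ;;
        4) echo "Network failure." ;;
        5) echo "SSL verification failure." ;;
        6) echo "Username/password authentication failure." ;;
        7) echo "Protocol errors." ;;
        8) echo "Server issued an error response." ;;
        *) echo "Unknown status: $1" ;;
    esac
}

# Usage (placeholder URL, not run here):
#   wget -q http://example.com/file
#   wget_status_text "$?"
```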
BUGS
You are welcome to submit bug reports via the GNU Wget bug tracker (see <http://wget.addictivecode.org/BugTracker>).
- 1.
- Please try to ascertain that the behavior you see really is a bug. If Wget crashes, it's a bug. If Wget does not behave as documented, it's a bug. If things work strangely, but you are not sure about the way they are supposed to work, it might well be a bug, but you might want to double-check the documentation and the mailing lists.
- 2.
-
Try to repeat the bug in as simple circumstances as possible. E.g. if Wget crashes while running wget -rl0 -kKE -t5 --no-proxy http://yoyodyne.com -o /tmp/log, you should try to see if the crash is repeatable, and if it will occur with a simpler set of options. You might even try to start the download at the page where the crash occurred to see if that page somehow triggered the crash.
- 3.
-
Please start Wget with -d option and send us the resulting output (or relevant parts thereof). If Wget was compiled without debug support, recompile it---it is much easier to trace bugs with debug support on.
- 4.
- If Wget has crashed, try to run it in a debugger, e.g. "gdb `which wget` core" and type "where" to get the backtrace. This may not work if the system administrator has disabled core files, but it is safe to try.
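The debug output requested in step 3 can be captured in a file for attaching to a report. The following is a minimal sketch: the function name wget_debug_log and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: rerun a failing download with debug output
# written to a log file.
wget_debug_log() {
    # $1 = log file, $2 = URL.
    # -d turns on debug output; -o logs all messages to the file.
    wget -d -o "$1" "$2"
}

# Usage (placeholders, not run here):
#   wget_debug_log /tmp/wget-debug.log http://example.com/
```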
SEE ALSO
This is not the complete manual for GNU Wget. For more complete information, including more detailed explanations of some of the options, and a number of commands available for use with .wgetrc files and the -e option, see the GNU Info entry for wget.
AUTHOR
Originally written by Hrvoje Nikšić <hniksic@xemacs.org>.
COPYRIGHT
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2015 Free Software Foundation, Inc.
2016-03-28 | GNU Wget 1.17.1 |