Info: (wget) HTTP Options

 
 2.7 HTTP Options
 ================
 
 ‘--default-page=NAME’
      Use NAME as the default file name when it isn’t known (i.e., for
      URLs that end in a slash), instead of ‘index.html’.
 
 ‘-E’
 ‘--adjust-extension’
      If a file of type ‘application/xhtml+xml’ or ‘text/html’ is
      downloaded and the URL does not end with the regexp
      ‘\.[Hh][Tt][Mm][Ll]?’, this option will cause the suffix ‘.html’ to
      be appended to the local filename.  This is useful, for instance,
      when you’re mirroring a remote site that uses ‘.asp’ pages, but you
      want the mirrored pages to be viewable on your stock Apache server.
      Another good use for this is when you’re downloading CGI-generated
      materials.  A URL like ‘http://site.com/article.cgi?25’ will be
      saved as ‘article.cgi?25.html’.
 
      Note that filenames changed in this way will be re-downloaded every
      time you re-mirror a site, because Wget can’t tell that the local
      ‘X.html’ file corresponds to remote URL ‘X’ (since it doesn’t yet
      know that the URL produces output of type ‘text/html’ or
      ‘application/xhtml+xml’).
 
      As of version 1.12, Wget will also ensure that any downloaded files
      of type ‘text/css’ end in the suffix ‘.css’, and the option was
      renamed from ‘--html-extension’, to better reflect its new
      behavior.  The old option name is still acceptable, but should now
      be considered deprecated.
 
      As of version 1.19.2, Wget will also ensure that any downloaded
      files with a ‘Content-Encoding’ of ‘br’, ‘compress’, ‘deflate’ or
      ‘gzip’ end in the suffix ‘.br’, ‘.Z’, ‘.zlib’ and ‘.gz’
      respectively.
 
      At some point in the future, this option may well be expanded to
      include suffixes for other types of content, including content
      types that are not parsed by Wget.
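
      The suffix rules above can be sketched as a small shell function.
      This is only an illustration of the behavior as described in this
      section, not Wget’s actual implementation; the function name
      ‘adjusted_name’ is hypothetical, and the case-sensitive ‘.css’
      check is an assumption.

```shell
# adjusted_name NAME CONTENT-TYPE
# Print the local file name that the rules described above would produce.
# Hypothetical helper for illustration; Wget does this internally.
adjusted_name() {
    name=$1
    ctype=$2
    case $ctype in
        text/html|application/xhtml+xml)
            # Append .html unless the name already ends in .htm or .html
            # (any letter case), mirroring the regexp \.[Hh][Tt][Mm][Ll]?
            if printf '%s\n' "$name" | grep -Eq '\.[Hh][Tt][Mm][Ll]?$'; then
                printf '%s\n' "$name"
            else
                printf '%s.html\n' "$name"
            fi
            ;;
        text/css)
            # Since version 1.12, CSS files get a .css suffix.
            # (Matching .css case-sensitively is an assumption.)
            case $name in
                *.css) printf '%s\n' "$name" ;;
                *)     printf '%s.css\n' "$name" ;;
            esac
            ;;
        *)
            # Other content types are left alone in this sketch.
            printf '%s\n' "$name"
            ;;
    esac
}

adjusted_name 'article.cgi?25' text/html   # prints article.cgi?25.html
adjusted_name 'index.HTM' text/html        # prints index.HTM
adjusted_name 'style' text/css             # prints style.css
```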
 
 ‘--http-user=USER’
 ‘--http-password=PASSWORD’
      Specify the username USER and password PASSWORD on an HTTP server.
      According to the type of the challenge, Wget will encode them using
      either the ‘basic’ (insecure), the ‘digest’, or the Windows ‘NTLM’
      authentication scheme.
 
      Another way to specify username and password is in the URL itself
      (⇒URL Format).  Either method reveals your password to
      anyone who bothers to run ‘ps’.  To prevent the passwords from
      being seen, use ‘--use-askpass’ or store them in ‘.wgetrc’ or
      ‘.netrc’, and make sure to protect those files from other users
      with ‘chmod’.  If the passwords are really important, do not leave
      them lying in those files either—edit the files and delete them
      after Wget has started the download.
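
      One way to keep the password off the command line is a ‘.netrc’
      entry readable only by its owner.  The sketch below writes such an
      entry to a scratch file; the host ‘example.com’, the user ‘alice’
      and the password are placeholders, and the ‘machine ... login ...
      password ...’ layout is the conventional ‘.netrc’ syntax rather
      than something defined in this section.

```shell
# Write the entry to a scratch file first; all values are placeholders.
netrc_demo=$(mktemp)

# Make sure the file is owner-only before any secret goes into it.
chmod 600 "$netrc_demo"
printf 'machine %s login %s password %s\n' \
    example.com alice 'not-a-real-password' > "$netrc_demo"

# Show the permission bits of the scratch file.
ls -l "$netrc_demo" | cut -c1-10

# Once the entry lives in ~/.netrc, no credentials are needed on the
# command line:
#   wget http://example.com/private/report.html
```

      The permission string printed by ‘ls’ should read ‘-rw-------’,
      meaning that no other user can read the file.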
 
 ‘--no-http-keep-alive’
      Turn off the “keep-alive” feature for HTTP downloads.  Normally,
      Wget asks the server to keep the connection open so that, when you
      download more than one document from the same server, they get
      transferred over the same TCP connection.  This saves time and at
      the same time reduces the load on the server.
 
      This option is useful when, for some reason, persistent
      (keep-alive) connections don’t work for you, for example due to a
      server bug or due to the inability of server-side scripts to cope
      with the connections.
 
 ‘--no-cache’
      Disable server-side cache.  In this case, Wget will send the remote
      server appropriate directives (‘Cache-Control: no-cache’ and
      ‘Pragma: no-cache’) to get the file from the remote service, rather
      than returning the cached version.  This is especially useful for
      retrieving and flushing out-of-date documents on proxy servers.
 
      Caching is allowed by default.
 
 ‘--no-cookies’
      Disable the use of cookies.  Cookies are a mechanism for
      maintaining server-side state.  The server sends the client a
      cookie using the ‘Set-Cookie’ header, and the client responds with
      the same cookie upon further requests.  Since cookies allow the
      server owners to keep track of visitors and for sites to exchange
      this information, some consider them a breach of privacy.  The
      default is to use cookies; however, _storing_ cookies is not on by
      default.
 
 ‘--load-cookies FILE’
      Load cookies from FILE before the first HTTP retrieval.  FILE is a
      textual file in the format originally used by Netscape’s
      ‘cookies.txt’ file.
 
      You will typically use this option when mirroring sites that
      require that you be logged in to access some or all of their
      content.  The login process typically works by the web server
      issuing an HTTP cookie upon receiving and verifying your
      credentials.  The cookie is then resent by the browser when
      accessing that part of the site, and so proves your identity.
 
      Mirroring such a site requires Wget to send the same cookies your
      browser sends when communicating with the site.  This is achieved
      by ‘--load-cookies’—simply point Wget to the location of the
      ‘cookies.txt’ file, and it will send the same cookies your browser
      would send in the same situation.  Different browsers keep textual
      cookie files in different locations:
 
      Netscape 4.x.
           The cookies are in ‘~/.netscape/cookies.txt’.
 
      Mozilla and Netscape 6.x.
           Mozilla’s cookie file is also named ‘cookies.txt’, located
           somewhere under ‘~/.mozilla’, in the directory of your
           profile.  The full path usually ends up looking somewhat like
           ‘~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt’.
 
      Internet Explorer.
           You can produce a cookie file Wget can use by using the File
           menu, Import and Export, Export Cookies.  This has been tested
           with Internet Explorer 5; it is not guaranteed to work with
           earlier versions.
 
      Other browsers.
           If you are using a different browser to create your cookies,
           ‘--load-cookies’ will only work if you can locate or produce a
           cookie file in the Netscape format that Wget expects.
 
      If you cannot use ‘--load-cookies’, there might still be an
      alternative.  If your browser supports a “cookie manager”, you can
      use it to view the cookies used when accessing the site you’re
      mirroring.  Write down the name and value of the cookie, and
      manually instruct Wget to send those cookies, bypassing the
      “official” cookie support:
 
           wget --no-cookies --header "Cookie: NAME=VALUE"
 
 ‘--save-cookies FILE’
      Save cookies to FILE before exiting.  This will not save cookies
      that have expired or that have no expiry time (so-called “session
      cookies”), but also see ‘--keep-session-cookies’.
 
 ‘--keep-session-cookies’
      When specified, causes ‘--save-cookies’ to also save session
      cookies.  Session cookies are normally not saved because they are
      meant to be kept in memory and forgotten when you exit the browser.
      Saving them is useful on sites that require you to log in or to
      visit the home page before you can access some pages.  With this
      option, multiple Wget runs are considered a single browser session
      as far as the site is concerned.
 
      Since the cookie file format does not normally carry session
      cookies, Wget marks them with an expiry timestamp of 0.  Wget’s
      ‘--load-cookies’ recognizes those as session cookies, but it might
      confuse other browsers.  Also note that cookies so loaded will be
      treated as other session cookies, which means that if you want
      ‘--save-cookies’ to preserve them again, you must use
      ‘--keep-session-cookies’ again.
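
      For orientation, the sketch below writes a cookie file by hand and
      lists its entries.  It assumes the commonly documented Netscape
      layout of seven tab-separated fields (domain, subdomain flag, path,
      secure flag, expiry time, name, value); this section does not spell
      that layout out, and the cookie names and values are made up.  The
      second entry uses an expiry time of 0, the marker described above
      for session cookies.

```shell
cookie_file=$(mktemp)

# Assumed field order: domain, include-subdomains flag, path, secure flag,
# expiry (seconds since the epoch), name, value -- separated by tabs.
{
    printf '# Hand-written example cookie file\n'
    printf 'example.com\tFALSE\t/\tFALSE\t2147483647\tlang\ten\n'
    printf 'example.com\tFALSE\t/\tFALSE\t0\tsessionid\tabc123\n'
} > "$cookie_file"

# List name and value of every entry, flagging session cookies
# (those whose expiry field is 0).
awk -F '\t' '!/^#/ && NF == 7 {
    kind = ($5 == 0) ? "session" : "persistent"
    printf "%s=%s (%s)\n", $6, $7, kind
}' "$cookie_file"
```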
 
 ‘--ignore-length’
      Unfortunately, some HTTP servers (CGI programs, to be more precise)
      send out bogus ‘Content-Length’ headers, which makes Wget go wild,
      as it thinks not all the document was retrieved.  You can spot this
      syndrome if Wget retries getting the same document again and again,
      each time claiming that the (otherwise normal) connection has
      closed on the very same byte.
 
      With this option, Wget will ignore the ‘Content-Length’ header—as
      if it never existed.
 
 ‘--header=HEADER-LINE’
      Send HEADER-LINE along with the rest of the headers in each HTTP
      request.  The supplied header is sent as-is, which means it must
      contain a name and a value separated by a colon, and must not
      contain newlines.
 
      You may define more than one additional header by specifying
      ‘--header’ more than once.
 
           wget --header='Accept-Charset: iso-8859-2' \
                --header='Accept-Language: hr'        \
                  http://fly.srk.fer.hr/
 
      Specification of an empty string as the header value will clear all
      previous user-defined headers.
 
      As of Wget 1.10, this option can be used to override headers
      otherwise generated automatically.  This example instructs Wget to
      connect to localhost, but to specify ‘foo.bar’ in the ‘Host’
      header:
 
           wget --header="Host: foo.bar" http://localhost/
 
      In versions of Wget prior to 1.10 such use of ‘--header’ caused
      sending of duplicate headers.
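
      The two constraints on HEADER-LINE stated above (a colon between
      name and value, no newlines) are easy to check in a wrapper script
      before Wget is called.  The function below is a hypothetical
      helper, not part of Wget; it deliberately rejects the empty string,
      which Wget itself accepts as the “clear headers” request.

```shell
# valid_header HEADER-LINE
# Succeed if the argument looks like "Name: value" on a single line.
valid_header() {
    nl='
'
    case $1 in
        *"$nl"*) return 1 ;;   # embedded newline
        :*)      return 1 ;;   # empty header name
        *:*)     return 0 ;;   # has a name and a colon
        *)       return 1 ;;   # no colon at all (or empty string)
    esac
}

for h in 'Accept-Language: hr' 'Accept-Language hr'; do
    if valid_header "$h"; then
        echo "ok:  $h"
    else
        echo "bad: $h"
    fi
done
```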
 
 ‘--compression=TYPE’
      Choose the type of compression to be used.  Legal values are
      ‘auto’, ‘gzip’ and ‘none’.
 
      If ‘auto’ or ‘gzip’ is specified, Wget asks the server to compress
      the file using the gzip compression format.  If the server
      compresses the file and responds with the ‘Content-Encoding’ header
      field set appropriately, the file will be decompressed
      automatically.
 
      If ‘none’ is specified, Wget will not ask the server to compress
      the file and will not decompress any server responses.  This is the
      default.
 
      Compression support is currently experimental.  In case it is
      turned on, please report any bugs to ‘bug-wget@gnu.org’.
 
 ‘--max-redirect=NUMBER’
      Specifies the maximum number of redirections to follow for a
      resource.  The default is 20, which is usually far more than
      necessary.  However, on those occasions where you want to allow
      more (or fewer), this is the option to use.
 
 ‘--proxy-user=USER’
 ‘--proxy-password=PASSWORD’
      Specify the username USER and password PASSWORD for authentication
      on a proxy server.  Wget will encode them using the ‘basic’
      authentication scheme.
 
      Security considerations similar to those with ‘--http-password’
      pertain here as well.
 
 ‘--referer=URL’
      Include a ‘Referer: URL’ header in the HTTP request.  Useful for
      retrieving documents with server-side processing that assume they
      are always being retrieved by interactive web browsers and only
      come out properly when Referer is set to one of the pages that
      point to them.
 
 ‘--save-headers’
      Save the headers sent by the HTTP server to the file, preceding the
      actual contents, with an empty line as the separator.
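
      A file saved this way can be split back into headers and contents
      at the first empty line.  The sketch below works on a synthetic
      stand-in file rather than a real download, and assumes the header
      lines may end in a carriage return, as they do on the wire; the
      file names are placeholders.

```shell
workdir=$(mktemp -d)

# Synthetic stand-in for a file saved with --save-headers.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>\n' \
    > "$workdir/saved.html"

# Everything before the first empty line is the header block.
awk '{ line = $0; sub(/\r$/, "", line) }
     line == "" { exit }
     { print line }' "$workdir/saved.html" > "$workdir/headers.txt"

# Everything after the first empty line is the document itself.
awk 'found { print; next }
     { line = $0; sub(/\r$/, "", line) }
     line == "" { found = 1 }' "$workdir/saved.html" > "$workdir/body.html"

cat "$workdir/headers.txt"
cat "$workdir/body.html"
```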
 
 ‘-U AGENT-STRING’
 ‘--user-agent=AGENT-STRING’
      Identify as AGENT-STRING to the HTTP server.
 
      The HTTP protocol allows the clients to identify themselves using a
      ‘User-Agent’ header field.  This enables distinguishing the WWW
      software, usually for statistical purposes or for tracing of
      protocol violations.  Wget normally identifies as ‘Wget/VERSION’,
      VERSION being the current version number of Wget.
 
      However, some sites have been known to impose the policy of
      tailoring the output according to the ‘User-Agent’-supplied
      information.  While this is not such a bad idea in theory, it has
      been abused by servers denying information to clients other than
      (historically) Netscape or, more frequently, Microsoft Internet
      Explorer.  This option allows you to change the ‘User-Agent’ line
      issued by Wget.  Use of this option is discouraged, unless you
      really know what you are doing.
 
      Specifying an empty user agent with ‘--user-agent=""’ instructs
      Wget not to send the ‘User-Agent’ header in HTTP requests.
 
 ‘--post-data=STRING’
 ‘--post-file=FILE’
      Use POST as the method for all HTTP requests and send the specified
      data in the request body.  ‘--post-data’ sends STRING as data,
      whereas ‘--post-file’ sends the contents of FILE.  Other than that,
      they work in exactly the same way.  In particular, they _both_
      expect content of the form ‘key1=value1&key2=value2’, with
      percent-encoding for special characters; the only difference is
      that one expects its content as a command-line parameter and the
      other accepts its content from a file.  In particular,
      ‘--post-file’ is _not_ for transmitting files as form attachments:
      those must appear as ‘key=value’ data (with appropriate
      percent-coding) just like everything else.  Wget does not currently
      support ‘multipart/form-data’ for transmitting POST data; only
      ‘application/x-www-form-urlencoded’.  Only one of ‘--post-data’ and
      ‘--post-file’ should be specified.
 
      Please note that Wget does not require the content to be of the
      form ‘key1=value1&key2=value2’, and neither does it test for it.
      Wget will simply transmit whatever data is provided to it.  Most
      servers however expect the POST data to be in the above format when
      processing HTML Forms.
 
      When sending a POST request using the ‘--post-file’ option, Wget
      treats the file as a binary file and will send every character in
      the POST request without stripping trailing newline or formfeed
      characters.  Any other control characters in the text will also be
      sent as-is in the POST request.
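
      Because the file is sent byte for byte, a trailing newline added by
      an editor or by ‘echo’ becomes part of the request body.  The
      sketch below uses placeholder form data and scratch files created
      with ‘mktemp’ to show the difference.

```shell
with_newline=$(mktemp)
without_newline=$(mktemp)

# echo appends a newline; printf '%s' writes the string exactly as given.
echo 'user=foo&password=bar' > "$with_newline"
printf '%s' 'user=foo&password=bar' > "$without_newline"

# Byte counts of the two candidate request bodies.
wc -c < "$with_newline"
wc -c < "$without_newline"
```

      The first count is one byte larger than the second; that extra
      newline would be transmitted if the first file were given to
      ‘--post-file’.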
 
      Please be aware that Wget needs to know the size of the POST data
      in advance.  Therefore the argument to ‘--post-file’ must be a
      regular file; specifying a FIFO or something like ‘/dev/stdin’
      won’t work.  It’s not quite clear how to work around this
      limitation inherent in HTTP/1.0.  Although HTTP/1.1 introduces
      “chunked” transfer that doesn’t require knowing the request length
      in advance, a client can’t use chunked unless it knows it’s talking
      to an HTTP/1.1 server.  And it can’t know that until it receives a
      response, which in turn requires the request to have been completed
      – a chicken-and-egg problem.
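
      A wrapper script can catch the FIFO case before Wget does.  The
      helper below is hypothetical; it relies on the shell’s ‘-f’ test,
      which is true only for regular files (or symbolic links that
      resolve to one).

```shell
# usable_post_file PATH
# Succeed only if PATH is a regular, readable file, i.e. one whose size
# is known before the request is sent.
usable_post_file() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Scratch directory with one regular file and one FIFO (placeholder data).
workdir=$(mktemp -d)
printf '%s' 'user=foo&password=bar' > "$workdir/form.txt"
mkfifo "$workdir/form.fifo"

for candidate in "$workdir/form.txt" "$workdir/form.fifo"; do
    if usable_post_file "$candidate"; then
        echo "regular file: $candidate"
    else
        echo "not a regular file: $candidate"
    fi
done
```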
 
      Note: As of version 1.15, if Wget is redirected after the POST
      request is completed, its behaviour will depend on the response
      code returned by the server.  In case of a 301 Moved Permanently,
      302 Moved Temporarily or 307 Temporary Redirect, Wget will, in
      accordance with RFC2616, continue to send a POST request.  In case
      a server wants the client to change the Request method upon
      redirection, it should send a 303 See Other response code.
 
      This example shows how to log in to a server using POST and then
      proceed to download the desired pages, presumably only accessible
      to authorized users:
 
           # Log in to the server.  This can be done only once.
           wget --save-cookies cookies.txt \
                --post-data 'user=foo&password=bar' \
                http://example.com/auth.php
 
           # Now grab the page or pages we care about.
           wget --load-cookies cookies.txt \
                -p http://example.com/interesting/article.php
 
      If the server is using session cookies to track user
      authentication, the above will not work because ‘--save-cookies’
      will not save them (and neither will browsers) and the
      ‘cookies.txt’ file will be empty.  In that case use
      ‘--keep-session-cookies’ along with ‘--save-cookies’ to force
      saving of session cookies.
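
      A session-cookie variant of the login example might look like the
      following sketch.  The URLs, form fields and file name are the
      placeholders used above; the commands are wrapped in a hypothetical
      function, ‘fetch_with_session’, so that no server is contacted
      until the function is called.  The second run repeats
      ‘--keep-session-cookies’ only because it also saves cookies, as
      explained under ‘--keep-session-cookies’.

```shell
fetch_with_session() {
    # Log in and save all cookies, including session cookies.
    wget --save-cookies cookies.txt \
         --keep-session-cookies \
         --post-data 'user=foo&password=bar' \
         http://example.com/auth.php || return 1

    # Reuse the saved cookies.  Saving again with --keep-session-cookies
    # preserves the session for a later run.
    wget --load-cookies cookies.txt \
         --save-cookies cookies.txt \
         --keep-session-cookies \
         -p http://example.com/interesting/article.php
}

# Call it explicitly when ready:
#   fetch_with_session
```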
 
 ‘--method=HTTP-METHOD’
      For the purpose of RESTful scripting, Wget allows sending of other
      HTTP Methods without the need to explicitly set them using
      ‘--header=Header-Line’.  Wget will use whatever string is passed to
      it after ‘--method’ as the HTTP Method to the server.
 
 ‘--body-data=DATA-STRING’
 ‘--body-file=DATA-FILE’
      Must be set when additional data needs to be sent to the server
      along with the Method specified using ‘--method’.  ‘--body-data’
      sends DATA-STRING as data, whereas ‘--body-file’ sends the contents
      of DATA-FILE.  Other than that, they work in exactly the same way.
 
      Currently, ‘--body-file’ is _not_ for transmitting files as a
      whole.  Wget does not currently support ‘multipart/form-data’ for
      transmitting data; only ‘application/x-www-form-urlencoded’.  In
      the future, this may be changed so that Wget sends the
      ‘--body-file’ as a complete file instead of sending its contents to
      the server.  Please be aware that Wget needs to know the contents
      of BODY Data in advance, and hence the argument to ‘--body-file’
      should be a regular file.  See ‘--post-file’ for a more detailed
      explanation.  Only one of ‘--body-data’ and ‘--body-file’ should be
      specified.
 
      If Wget is redirected after the request is completed, Wget will
      suspend the current method and send a GET request until the
      redirection is completed.  This is true for all redirection
      response codes except 307 Temporary Redirect which is used to
      explicitly specify that the request method should _not_ change.
      Another exception is when the method is set to ‘POST’, in which
      case the redirection rules specified under ‘--post-data’ are
      followed.
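
      As a sketch, a PUT request with a form-encoded body could be issued
      as follows.  The URL and the field names are placeholders, and the
      function wrapper ‘put_setting’ is hypothetical; it only keeps the
      command from running until it is called.

```shell
# put_setting URL BODY
# Send BODY (form-encoded, e.g. 'color=blue&size=10') to URL with PUT.
put_setting() {
    wget --method=PUT --body-data="$2" "$1"
}

# Example call (not executed here; placeholder URL):
#   put_setting http://example.com/api/settings 'color=blue&size=10'
```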
 
 ‘--content-disposition’
      If this is set to on, experimental (not fully-functional) support
      for ‘Content-Disposition’ headers is enabled.  This can currently
      result in extra round-trips to the server for a ‘HEAD’ request, and
      is known to suffer from a few bugs, which is why it is not
      currently enabled by default.
 
      This option is useful for some file-downloading CGI programs that
      use ‘Content-Disposition’ headers to describe what the name of a
      downloaded file should be.
 
      When combined with ‘--metalink-over-http’ and
      ‘--trust-server-names’, a ‘Content-Type: application/metalink4+xml’
      file is named using the ‘Content-Disposition’ filename field, if
      available.
 
 ‘--content-on-error’
      If this is set to on, Wget will not skip the content when the
      server responds with an HTTP status code that indicates an error.
 
 ‘--trust-server-names’
      If this is set, on a redirect, the local file name will be based on
      the redirection URL.  By default the local file name is based on
      the original URL.  When retrieving recursively this can be helpful
      because on many web sites redirected URLs correspond to an
      underlying file structure, while link URLs do not.
 
 ‘--auth-no-challenge’
      If this option is given, Wget will send Basic HTTP authentication
      information (plaintext username and password) for all requests,
      just like Wget 1.10.2 and prior did by default.
 
      Use of this option is not recommended, and is intended only to
      support a few obscure servers, which never send HTTP
      authentication challenges, but accept unsolicited auth info, say,
      in addition to form-based authentication.
 
 ‘--retry-on-host-error’
      Consider host errors, such as “Temporary failure in name
      resolution”, as non-fatal, transient errors.
 
 ‘--retry-on-http-error=CODE[,CODE,...]’
      Consider given HTTP response codes as non-fatal, transient errors.
      Supply a comma-separated list of 3-digit HTTP response codes as
      argument.  Useful to work around special circumstances where
      retries are required, but the server responds with an error code
      normally not retried by Wget.  Such errors might be 503 (Service
      Unavailable) and 429 (Too Many Requests).  Retries enabled by this
      option are performed subject to the normal retry timing and retry
      count limitations of Wget.
 
      Using this option is intended to support special use cases only and
      is generally not recommended, as it can force retries even in cases
      where the server is actually trying to decrease its load.  Please
      use wisely and only if you know what you are doing.
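
      The argument format (a comma-separated list of 3-digit codes) can
      be checked before it is handed to Wget.  The function below is a
      hypothetical helper for wrapper scripts, and its verdicts are its
      own, not Wget’s; ‘--tries’ and ‘--waitretry’ in the trailing
      comment are the retry-count and retry-timing options documented
      elsewhere in the manual.

```shell
# valid_code_list LIST
# Succeed if LIST is one or more 3-digit codes separated by commas.
valid_code_list() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{3}(,[0-9]{3})*$'
}

for list in '429,503' '429, 503' '50x'; do
    if valid_code_list "$list"; then
        echo "accepted: $list"
    else
        echo "rejected: $list"
    fi
done

# A checked list could then be used like this (placeholder URL):
#   wget --retry-on-http-error=429,503 --tries=5 --waitretry=10 \
#        http://example.com/busy/report.csv
```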