(wget) Types of Files

 Types of Files
 ==============
 
    When downloading material from the web, you will often want to
 restrict the retrieval to only certain file types.  For example, if you
 are interested in downloading GIFs, you will not be overjoyed to get
 loads of PostScript documents, and vice versa.
 
    Wget offers two options to deal with this problem.  Each option
 description lists a short name, a long name, and the equivalent command
 in `.wgetrc'.
 
 `-A ACCLIST'
 `--accept ACCLIST'
 `accept = ACCLIST'
      The argument to the `--accept' option is a list of file suffixes or
      patterns that Wget will download during recursive retrieval.  A
      suffix is the ending part of a file name, and consists of "normal"
      letters, e.g. `gif' or `.jpg'.  A matching pattern contains
      shell-like wildcards, e.g. `books*' or `zelazny*196[0-9]*'.
 
      So, specifying `wget -A gif,jpg' will make Wget download only the
      files ending with `gif' or `jpg', i.e. GIFs and JPEGs.  On the
      other hand, `wget -A "zelazny*196[0-9]*"' will download only files
      beginning with `zelazny' and containing numbers from 1960 to 1969
      anywhere within.  Look up the manual of your shell for a
      description of how pattern matching works.
 
      Of course, any number of suffixes and patterns can be combined
      into a comma-separated list, and given as an argument to `-A'.
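 
      The difference between a suffix and a pattern can be tried out in
      the shell without touching the network.  What follows is only a
      sketch, not Wget's own code: the function name `matches_list' and
      the sample file names are hypothetical, and it assumes that an
      entry containing `*', `?' or `[' is treated as a pattern while
      any other entry is treated as a suffix.

```shell
# Hypothetical helper, not part of Wget: succeed if FILE ($1) matches any
# entry of the comma-separated LIST ($2).
# The body runs in a subshell, so the IFS and `set -f' changes stay local.
matches_list() (
    file=$1
    IFS=,          # split the list on commas
    set -f         # do not expand list entries against the current directory
    for entry in $2; do
        case $entry in
            *[*?[]*)                      # entry has a wildcard: use it as a pattern
                case $file in
                    $entry) exit 0 ;;
                esac ;;
            *)                            # plain entry: compare against the ending
                case $file in
                    *"$entry") exit 0 ;;
                esac ;;
        esac
    done
    exit 1
)
```

      For instance, `matches_list photo.gif gif,jpg' succeeds, while
      `matches_list paper.ps gif,jpg' fails.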
 
 `-R REJLIST'
 `--reject REJLIST'
 `reject = REJLIST'
      The `--reject' option works the same way as `--accept', only its
      logic is the reverse; Wget will download all files _except_ the
      ones matching the suffixes (or patterns) in the list.
 
      So, if you want to download a whole page except for the cumbersome
      MPEGs and .AU files, you can use `wget -R mpg,mpeg,au'.
      Analogously, to download all files except the ones beginning with
      `bjork', use `wget -R "bjork*"'.  The quotes are to prevent
      expansion by the shell.
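 
      The effect of the quotes can be observed with a small experiment.
      This is a sketch under stated assumptions: `show_args' is a
      hypothetical stand-in for any command, and the two `bjork' files
      are made-up names created in a scratch directory.

```shell
# Print each argument on its own line, the way a command would receive it.
show_args() { printf '%s\n' "$@"; }

# Work in a scratch directory holding two sample files (names are made up).
start_dir=$PWD
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch bjork1.mpg bjork2.mpg

unquoted=$(show_args -R bjork*)    # the shell replaces the pattern with file names
quoted=$(show_args -R "bjork*")    # the pattern is passed through unchanged

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"

# Return to the starting directory and remove the scratch files.
cd "$start_dir" || exit 1
rm -r "$demo_dir"
```

      Without the quotes, the command sees the names of local files
      instead of the pattern whenever such files exist in the current
      directory.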
 
    The `-A' and `-R' options may be combined to fine-tune further which
 files are retrieved.  For example, `wget -A "*zelazny*" -R .ps' will
 download all the files having `zelazny' as a part of their name, but
 _not_ the PostScript files.
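 
    The combined effect of `wget -A "*zelazny*" -R .ps' can be pictured
 as a two-step check.  The sketch below is an illustration rather than
 Wget's implementation; the function name `wanted' and the file names
 are hypothetical, and it assumes that a file is retrieved only if it
 matches the accept list and does not match the reject list.

```shell
# Hypothetical check mirroring -A "*zelazny*" -R .ps for a single file name.
wanted() {
    case $1 in
        *.ps)      return 1 ;;   # ends in .ps: excluded by the reject list
        *zelazny*) return 0 ;;   # contains zelazny: allowed by the accept list
        *)         return 1 ;;   # matches nothing in the accept list
    esac
}
```

    Under these assumptions `wanted zelazny-amber.txt' succeeds, while
 `wanted zelazny-amber.ps' and `wanted dune.txt' both fail.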
 
    Note that these two options do not affect the downloading of HTML
 files.  Wget must retrieve every HTML file to find the links it should
 follow; recursive retrieval would make no sense otherwise.
 