httpget(1)

NAME

httpget - fetch documents using HTTP

SYNOPSIS

httpget [ options ] url

DESCRIPTION

httpget is a command-line-callable interface to the Hypertext Transfer Protocol, HTTP. It is intended for debugging, and for use in shell scripts which need to fetch documents using HTTP. It is not a web browser, although it might be used as one component in a web browser. (In particular, it makes no attempt at HTML parsing or formatting; that's somebody else's job. It acts something like "lynx -dump -source".)
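
For example, an invocation like the following (the URL is illustrative) fetches a document and writes it to standard output:

    httpget http://www.example.com/index.html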

Various option flags allow reasonably complete control over the auxiliary information sent in the HTTP request. It is also possible to specify the body of the request, as is needed when using the POST or PUT methods.
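
For instance, a form might be posted by supplying the request body on standard input (the URL and form data here are illustrative):

    echo 'name=value&flavor=vanilla' | httpget -post \
        -type application/x-www-form-urlencoded http://www.example.com/cgi-bin/handler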

httpget is not a general-purpose URL fetcher; it knows nothing of ftp: or file: or mailto: or gopher: or telnet: URLs.

httpget also implements rudimentary cookie handling. It can optionally maintain a cookie file, storing cookies in it when they are received along with fetched pages, and/or sending those cookies back in future requests. (This is arguably higher-level processing than this low-level utility ought to be performing, but it's decidedly awkward for, say, the caller to try to do.)
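
For example, a pair of invocations like these (the URLs and cookie file name are illustrative) would store any cookies received during the first fetch and send them back during the second:

    httpget -cookies -cookiefile $HOME/.httpget-cookies -o login.html http://www.example.com/login
    httpget -cookies -cookiefile $HOME/.httpget-cookies -o next.html http://www.example.com/next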

The HTTP-fetching code used by httpget is also available in library form, for calling from C or other compiled programs.

OPTIONS

-accept str
Set the Accept string sent in the HTTP request.
-acceptenc str
Set the Accept-Encoding string sent in the HTTP request.
-agent str
Set the User-Agent string sent in the HTTP request.
-cookie str
Set a Cookie string to be sent in the HTTP request. (Several of these options may appear, up to a compile-time limit which is currently 5.)
-cookies
Send and receive cookies automatically. (Implies -sendcookies and -stashcookies.)
-cookiefile f
Use f as the state file for storing cookies between invocations.
-f file
Read URLs to fetch from file (one per line), instead of the command line. If file is ``-'', read URLs from standard input.
-from str
Set the From string sent in the HTTP request.
-head
Fetch just the HTTP header (using the HEAD method). Also implies -r.
-header key val
Set an arbitrary string in the HTTP request. (Several of these options may appear, up to a compile-time limit which is currently 5.)
-m method
Set the method used in the HTTP request. (The default is GET, unless the -head or -post flags appear.)
-o file
Write output to file.
-O
Write each fetched document to a file whose name is derived from the last component of its URL. (Particularly useful when fetching multiple URLs.)
-password pass
Set a password to be sent in the HTTP request (using HTTP ``basic'' authentication).
-post
Post data: read from standard input until EOF, and pass this data as the body of the HTTP request. Also implies a default method of POST.
-persist
Use HTTP 1.1 persistent connections, reusing one TCP connection to fetch potentially multiple URLs from the same host. (See also -psock.)
(This feature is still under development; use with care.)
-psock f
Create f as a named socket so that a persistent connection can be saved between httpget invocations.
-preserve
If possible, preserve the modification time of the fetched object: if the server returns a Last-Modified: header, and if httpget is writing the fetched object to a file as a result of the -o or -O options, set the modtime of the file appropriately.
(This feature relies on an external time-processing library whose presence makes httpget more difficult to build; consequently, this feature may not be present in all versions of the program.)
-r
(``raw'') Print received HTTP headers.
-rr
(``really raw'') Also print initial HTTP status line.
-referer str
Set the Referer string sent in the HTTP request.
-refetch
If a 301 or 302 redirect occurs, automatically fetch the referenced URL.
-sendcookies
Automatically send stored cookies in outgoing requests.
-stashcookies
Automatically store cookies received along with fetched pages.
-srcaddr w.x.y.z
Set the source address of the underlying TCP connection.
(This is an experimental feature.)
-t t
Set the timeout to t seconds.
(It is also possible to set different timeouts for various phases of the connection: -tn sets the host name lookup timeout, -to sets the connection-open timeout, -tr sets the read timeout, and -tw sets the write timeout.)
-type str
Set the Content-Type string sent in the HTTP request.
-username name
Set a username to be sent in the HTTP request (using HTTP ``basic'' authentication).
-v
Verbose output.
-version
Print program's version number.
-?, -help
Display a brief help message.
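
EXAMPLES

A few illustrative invocations follow; the hostnames, filenames, and credentials shown are hypothetical.

To inspect just the headers and the initial status line returned for a document:

    httpget -head -rr http://www.example.com/

To fetch a list of URLs (one per line in urls.txt), writing each document to a file named after the last component of its URL:

    httpget -O -f urls.txt

To fetch a password-protected document using HTTP ``basic'' authentication, saving it to report.html:

    httpget -username alice -password secret -o report.html http://www.example.com/private/report.html

To send an extra request header along with a fetch:

    httpget -header Accept-Language en -o out.html http://www.example.com/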

BUGS

The -O option isn't very clever. There's no way to incorporate higher-level ``directory'' components of the fetched URLs in the output filenames (meaning that if you fetched, say, a/index.html and b/index.html, the output files would overwrite each other). If the fetched URL ends in a slash, -O doesn't know what to do. If one of several output files can't be created, httpget gives up, and doesn't even try fetching the rest.

When performing redirects in the presence of the -refetch option, a hop count is not kept, meaning that the redirects could loop forever.

Cookie handling is preliminary and imperfect. Cookies from a host are always sent back to that host, without performing checks on the path or port.

When fetching https: URLs using SSL, the authenticity of the server certificate is not checked.

The persistent connection implementation is preliminary and imperfect. Currently, the previous connection is always reused, without checking whether it refers to the same host and port. (Thus, persistent connections work acceptably when fetching a number of URLs from the same host, but would not work, say, in the presence of -refetch, if a redirect occurred referring to a different host.)

Persistent connections don't work properly with https: URLs.

AUTHOR

Steve Summit, scs@eskimo.com

SEE ALSO

lynx(1), GET(1), lwp-request(1), curl(1), wget(1)