httpget(1)
NAME
httpget - fetch documents using HTTP
SYNOPSIS
httpget [ options ] url
DESCRIPTION
httpget is a command-line-callable interface to
the Hypertext Transfer Protocol, HTTP.
It is intended for debugging, and for use in shell scripts which
need to fetch documents using HTTP.
It is not a web browser,
although it might be used as one component in a web browser.
(In particular, it makes no attempt at HTML parsing or formatting;
that's somebody else's job.
It acts something like "lynx -dump -source".)
Various option flags allow reasonably complete control over the
auxiliary information sent in the HTTP request.
It is also possible to specify the body of the request,
as is needed when using the POST or PUT methods.
httpget is not a general-purpose URL fetcher;
it knows nothing of ftp: or file: or mailto: or gopher: or telnet: URLs.
httpget also implements rudimentary cookie handling.
It can optionally maintain a cookie file,
storing cookies in it when they are received along with fetched pages,
and/or sending those cookies back in future requests.
(This is arguably higher-level processing
than this low-level utility ought to be performing,
but it's decidedly awkward for, say, the caller to try to do.)
The HTTP-fetching code used by httpget
is also available in library form,
for calling from C or other compiled programs.
OPTIONS
- -accept str
- Set the Accept string sent in the HTTP request.
- -acceptenc str
- Set the Accept-Encoding string sent in the HTTP request.
- -agent str
- Set the User-Agent string sent in the HTTP request.
- -cookie str
- Set a Cookie string to be sent in the HTTP request.
(Several of these options may appear,
up to a compile-time limit which is currently 5.)
- -cookies
- Send and receive cookies automatically.
(Implies -sendcookies
and -stashcookies.)
- -cookiefile f
- Use f as the state file for storing cookies between invocations.
- -f file
- Read URLs to fetch from file
(one per line), instead of from the command line.
If file is ``-'',
read URLs from standard input.
- -from str
- Set the From string sent in the HTTP request.
- -head
- Fetch just the HTTP header
(using a method of HEAD).
Also implies -r.
- -header key val
- Send an arbitrary header, with name key and value val, in the HTTP request.
(Several of these options may appear,
up to a compile-time limit which is currently 5.)
- -m method
- Set the method used in the HTTP request.
(The default is GET,
unless the -head
or -post flags appear.)
- -o file
- Write output to file.
- -O
- Write each fetched URL to a file whose name is derived from the
last component of the URL.
(Particularly useful when fetching multiple URLs.)
- -password pass
- Set a password to be sent in the HTTP request
(using HTTP ``basic'' authentication).
- -post
- Post data: read from standard input until EOF,
and pass this data as the body of the HTTP request.
Also implies a default method of POST.
- -persist
- Use HTTP/1.1 persistent connections,
reusing one TCP connection to fetch potentially multiple URLs
from the same host.
(See also -psock.)
(This feature is still under development;
use with care.)
- -psock f
- Create f as a named socket
so that a persistent connection
can be saved between httpget invocations.
- -preserve
- If possible, preserve the modification time of the fetched object:
if the server returns a Last-Modified: header,
and if httpget is writing the fetched object to a file
as a result of the -o or -O options,
set the modtime of the file appropriately.
(This feature relies on an external time-processing library
whose presence makes httpget more difficult to build;
consequently, this feature may not be present in all versions of the program.)
- -r
- (``raw'')
Print received HTTP headers.
- -rr
- (``really raw'')
Also print initial HTTP status line.
- -referer str
- Set the Referer string sent in the HTTP request.
- -refetch
- If a 301 or 302 redirect occurs,
automatically fetch the referenced URL.
- -sendcookies
- Send cookies automatically.
- -stashcookies
- Receive cookies automatically.
- -srcaddr w.x.y.z
- Set the source address of the underlying TCP connection.
(This is an experimental feature.)
- -t t
- Set the timeout to t seconds.
(It is also possible to set different timeouts for the various phases
of the connection:
-tn sets the host name lookup timeout,
-to sets the connection-open timeout,
-tr sets the read timeout,
and
-tw sets the write timeout.)
- -type str
- Set the Content-Type string sent in the HTTP request.
- -username name
- Set a username to be sent in the HTTP request
(using HTTP ``basic'' authentication).
- -v
- Verbose output.
- -version
- Print the program's version number.
- -?, -help
- Display a brief help message.
BUGS
The -O option isn't very clever.
There's no way to incorporate higher-level ``directory'' components
of the fetched URLs
in the output filenames
(meaning that if you fetched, say, a/index.html and b/index.html,
the output files would overwrite each other).
If the fetched URL ends in a slash,
there is no last component to name the file after,
and -O doesn't know what to do.
If one of several output files can't be created,
httpget gives up,
and doesn't even try fetching the rest.
When performing redirects
in the presence of the -refetch option,
a hop count is not kept,
meaning that the redirects could loop forever.
Cookie handling is preliminary and imperfect.
Cookies from a host are always sent back to that host,
without performing checks on the path or port.
When fetching https: URLs using SSL,
the authenticity of the server certificate is not checked.
The persistent connection implementation is preliminary and imperfect.
Currently, the previous connection is always reused,
without checking to see whether it uses the same host or port.
(Thus, persistent connections work acceptably
when fetching a number of URLs from the same host,
but would not work, say,
in the presence of -refetch,
if a redirect occurred referring to a different host.)
Persistent connections don't work properly with https: URLs.
AUTHOR
Steve Summit,
scs@eskimo.com
SEE ALSO
lynx(1), GET(1), lwp-request, curl(1), wget(1)