dbgrep manipulates simple text databases in various ways. Its preferred ``database'' format consists of a series of keys and values with the reasonably obvious forms

        key1 value11
        key2 value21

        key1 value12
        key2 value22

or

        key1: value13
        key2: value23

        key1: value14
        key2: value24

In other words, the first word on a line (maybe followed by a colon) is a key; the rest of the line is that key's value. Blank lines separate records. The program discovers whether explicit colons are being used or not. Alternatively, it is possible to explicitly specify that keys and values are separated by colons, single tabs, or arbitrary whitespace by using the -cs, -ts, or -ws options, respectively. (Under -cs and -ts, a key name may contain spaces, and under -ts and -ws, keys may contain colons.) It is also possible to specify that new records begin on occurrence of a specific key, as opposed to a blank line. Databases may also contain comments, which are lines beginning with #, and which are not otherwise interpreted.
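For readers who want to handle the same file format from a script, the following is a minimal Python sketch of the rules just described: the first word is the key, blank lines separate records, and lines beginning with # are skipped. It is not dbgrep's own code. The function name parse_records and the colon-detection heuristic (colons are assumed only if every key ends in one) are assumptions made for illustration; the program's real detection logic is not documented here.

```python
def parse_records(text):
    """Split text into records; each record is a list of (key, value) pairs."""
    # Comment lines (beginning with #) are dropped before anything else.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]

    # Assumed heuristic: the file uses explicit colons only if the first
    # word of every non-blank line ends with a colon.
    nonblank = [ln for ln in lines if ln.strip()]
    use_colons = bool(nonblank) and all(
        ln.split(None, 1)[0].endswith(":") for ln in nonblank
    )

    records, current = [], []
    for ln in lines:
        if not ln.strip():
            # A blank line ends the current record, if there is one.
            if current:
                records.append(current)
                current = []
            continue
        parts = ln.split(None, 1)
        key = parts[0][:-1] if use_colons else parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        current.append((key, value))
    if current:
        records.append(current)
    return records
```

A record is kept as a list of pairs rather than a dictionary so that a key may appear more than once within one record.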
Given the similarity of the colon-separated form to an RFC822 mail header, it is possible for the program to deal with Unix-style mailbox files: it knows about lines beginning with the five characters ``From '' and preceded by a blank line, it knows about header line continuation, and it knows about body text (not containing explicit keys, and separated from the header by a blank line). In other words, each message in the mailbox is treated as a ``record'', and its header lines (or body text, using -e) can be searched upon. Mailbox mode is selected with the -mail (or -mailh) option.
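The mailbox conventions described above can be sketched in a few lines of Python. This is an illustration only, not the program's implementation: the names split_mbox and parse_message are hypothetical, header keys are lowercased here merely as one simple way of making them case-insensitive, and a continued header line is joined to its predecessor with a single space.

```python
def parse_message(lines):
    """Split one message's lines into (headers, body)."""
    headers, i = [], 0
    while i < len(lines) and lines[i] != "":
        line = lines[i]
        if line[0] in " \t" and headers:
            # Continuation line: append it to the previous header's value.
            key, value = headers[-1]
            headers[-1] = (key, value + " " + line.strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            headers.append((key.lower(), value.strip()))
        i += 1
    # Everything after the first blank line is body text.
    body = "\n".join(lines[i + 1:]).rstrip("\n")
    return headers, body


def split_mbox(text):
    """Return a list of (headers, body) pairs, one per message."""
    messages, current, prev_blank = [], None, True
    for line in text.splitlines():
        # "From " starts a new message only at the start of the file
        # or directly after a blank line.
        if line.startswith("From ") and prev_blank:
            current = []
            messages.append(current)
        elif current is not None:
            current.append(line)
        prev_blank = (line == "")
    return [parse_message(m) for m in messages]
```

Note that a body line which happens to begin with ``From '' does not start a new message unless a blank line precedes it.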
There is also some support for ``databases'' represented as tabular files. The first line of such a database is taken to be a header describing the field names, and the remaining lines are interpreted as records, one per line. Lines can be formatted as tab-separated, comma-separated, or ``SQL output format''. Columnar input or output is selected using the -ifmt and -ofmt options.
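As a rough picture of how a tabular file maps onto records, here is a small Python sketch using the standard csv module. It assumes only what the paragraph above says: the first line names the fields, and each later line is one record. The function name read_table is hypothetical, and the ``SQL output format'' is not handled.

```python
import csv
import io


def read_table(text, delimiter="\t"):
    """Turn header-plus-rows text into a list of records.

    Each record is a list of (field name, value) pairs, in column order.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # first line: the field names
    return [list(zip(header, row)) for row in reader]
```

Passing delimiter="," reads the comma-separated variant.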
As its name implies, dbgrep's original purpose in life was to select records matching certain patterns. Along the way, however, it has accumulated a number of other processing options which have turned it into more of a general-purpose database processing tool and report generator.
dbgrep's command line syntax is, perhaps unfortunately, modeled on that of find(1). That is, the ``expression'' describing the records to be matched, and the operators specifying the actions to be carried out for matched records, are all specified as command-line options. (Once you get used to it, though, this sort of syntax isn't really as painful as the ``BUGS'' section of find(1)'s man page would lead you to believe.) It is also possible to prepare the search and processing expression (or a subexpression) in a file, and have dbgrep read it from there.
dbgrep's basic invocation syntax is
        dbgrep [options] [pattern | expression] [dbfile]

If the dbfile is omitted, input is naturally read from standard input. If a single, simple pattern is present, it is treated as a regexp (à la grep(1)) to be searched for in any field; this is the same as -e. Otherwise, the expression is a series of match operators, perhaps with Boolean connectors (-o, -a, !, and ( ) for grouping); it may also contain operators (again like find(1)) which cause some particular action to be taken on a matched record. Finally, there is the usual assortment of options. (Actually, the options may be interspersed with the expression operators, since one parser scans them all.)
By default, most of dbgrep's match operators use regular expressions, in the style of ed(1) and grep(1). It is possible to disable the use of regular expressions, either in an individual match (by using alternative match operators such as -km or -kx), or on a global basis by using the -m or -x options.
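The difference between the three kinds of match can be illustrated with a short Python sketch. The function name field_matches and its mode strings are hypothetical. Python's re module is used as a stand-in and does not share the exact syntax of ed(1) and grep(1), so this shows the idea rather than the program's precise behavior.

```python
import re


def field_matches(value, pattern, mode="regex"):
    """Test one field value against a pattern in one of three ways."""
    if mode == "regex":
        # Default style: the pattern may match anywhere in the value.
        return re.search(pattern, value) is not None
    if mode == "substring":
        # Plain substring test; no characters are special.
        return pattern in value
    if mode == "exact":
        # The whole value must equal the pattern.
        return value == pattern
    raise ValueError("unknown mode: " + mode)
```

For example, the pattern a.b matches the value axb as a regular expression, but not as a substring and not as an exact match.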
The options, match operators, etc. (as listed by dbgrep -help) are:
(match expression operators)
(Boolean expression connectors)
The -kn and -pkvn operators reflect the (possibly surprising) fact that in the database scheme used by dbgrep it is possible to have, within one record, multiple values with the same key.
The -expr, -exprp, -ake, and -ske operators are not present in all versions of the program; they depend on a separate expression evaluator which may or may not be available. See also the ``SECONDARY EXPRESSIONS'' section below.
The -rgp and -rgf options request report generation. Output is generated based on a template, which is supplied either directly on the command line (with -rgp) or in a file (specified by -rgf). The template is repeated for as many (selected) records as there are to be output. The template contains text which is to be output verbatim, interspersed with values to be interpolated from the database, plus a few other processing options. Interpolated values and other processing options are requested by sequences beginning with a $ character. The available $ sequences are:
The $key[n], $#, $*, and $. constructions again reflect the fact that it is possible for one record to have multiple values with the same key.
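The general idea of template-driven output (one copy of the template per selected record, with values interpolated) can be sketched in Python as follows. This is deliberately much simpler than the real feature: only a bare $key form is handled, the first value is used when a key occurs more than once, and a missing key interpolates as an empty string. All three choices, and the name render, are assumptions made for illustration and may not match what the program does.

```python
import re


def render(template, records):
    """Repeat template once per record, replacing $key with a value."""
    def one(rec):
        first = {}
        for key, value in rec:
            # Assumed rule: keep only the first value seen for each key.
            first.setdefault(key, value)
        return re.sub(r"\$(\w+)",
                      lambda m: first.get(m.group(1), ""),
                      template)
    return "".join(one(rec) for rec in records)
```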
The -expr, -exprp, -ake, and -ske operators, if available, and the report generation sequences $% and $?% support a simple arithmetic expression evaluator implementing the usual arithmetic operators plus a certain number of math and string functions. Briefly, the arithmetic operators are +, -, *, /, %, and ** (where % is modulus and ** is exponentiation), with the customary associativity and precedence. Parentheses may be used to override the default precedence. The relational and logical operators are >=, >, <=, <, ==, !=, !, &&, and || (all as in C).
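To make the precedence rules concrete, here is a small Python sketch of an evaluator for expressions such as a + b > 5, where bare names refer to fields of a record. It is not the evaluator the program uses. It borrows Python's own parser, whose precedence and associativity for these arithmetic and relational operators follow the customary rules, and it leaves out the C-style !, && and || operators because Python spells them differently. The name evaluate, the use of a dictionary for the record, and the conversion of field values with float() are all assumptions.

```python
import ast
import operator

BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv,
          ast.Mod: operator.mod, ast.Pow: operator.pow}
COMPARE = {ast.Gt: operator.gt, ast.GtE: operator.ge,
           ast.Lt: operator.lt, ast.LtE: operator.le,
           ast.Eq: operator.eq, ast.NotEq: operator.ne}


def evaluate(expr, record):
    """Evaluate an arithmetic or relational expression against a record."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            # A bare name is looked up as a field and read as a number.
            return float(record[node.id])
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](ev(node.left), ev(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in COMPARE):
            return COMPARE[type(node.ops[0])](ev(node.left),
                                              ev(node.comparators[0]))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)
```

With this sketch, 2 + 3 * 4 groups as 2 + (3 * 4), and 2 ** 3 ** 2 groups as 2 ** (3 ** 2), since exponentiation associates to the right.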
These mathematical functions are available: abs, acos, asin, atan, atan2, ceil, cos, cosh, floor, ln, log10, sin, sinh, sqrt, tan, and tanh. There are also some string functions: strcat, strlen, strmatch, strstr, and substr. (Briefly: strcat and strlen are more or less as in C; strmatch matches regular expressions; strstr and substr use 1-based character positions.)
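Since 1-based positions are easy to get wrong, the following Python sketch shows what such a convention looks like in practice. The argument order (string, start, length), the use of 0 to mean ``not found'', and the function names themselves are assumptions chosen for illustration; consult the med documentation for the real definitions.

```python
def strstr_1based(haystack, needle):
    """Return the 1-based position of needle in haystack, or 0 if absent."""
    # str.find returns a 0-based index, or -1 when there is no match,
    # so adding 1 yields a 1-based position, or 0 for "not found".
    return haystack.find(needle) + 1


def substr_1based(s, start, length):
    """Return length characters of s, beginning at 1-based position start."""
    if start < 1:
        raise ValueError("start must be at least 1")
    return s[start - 1:start - 1 + length]
```

Under these assumptions the two functions agree with each other: the position reported by the first can be passed directly as the start of the second.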
The expression evaluator shares code with med; the man page for med (q.v.) contains further documentation on these operators and functions.
dbgrep -k a b
        print records where field a contains b
dbgrep -kx a b
        print records where field a is exactly b
dbgrep -k a b -k c d
        print records where field a contains b and c contains d
dbgrep -k a b -o -k c d
        print records where field a contains b or c contains d
dbgrep -ke a
        print records where field a exists at all
dbgrep -expr 'a > 5'
        print records where field a's value is greater than 5
dbgrep -expr 'a + b > 5'
        print records where the sum of fields a and b is greater than 5
dbgrep -c -k a b
        print a count of records where field a contains b
dbgrep -ek a
        print just the records with key a
dbgrep -ak a b
        append a new record with key a and value b
dbgrep -ake c 'a + b'
        append a new record with key c containing the sum of fields a and b
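The selection examples above combine match operators with an implied ``and'' or an explicit -o. As a way of seeing what those combinations select, here is a Python sketch that models each match as a predicate on a record. It is a model of the semantics only, not of the program; the helper names k, exists, both and either are hypothetical, and Python's re module stands in for the program's regular expressions.

```python
import re


def k(key, pattern):
    """Predicate: some value stored under key matches pattern."""
    return lambda rec: any(name == key and re.search(pattern, value)
                           for name, value in rec)


def exists(key):
    """Predicate: the record has key at all."""
    return lambda rec: any(name == key for name, _ in rec)


def both(p, q):
    """Two tests side by side: both must hold."""
    return lambda rec: p(rec) and q(rec)


def either(p, q):
    """Two tests joined by -o: at least one must hold."""
    return lambda rec: p(rec) or q(rec)


records = [
    [("a", "abc"), ("c", "xyz")],
    [("a", "xyz"), ("c", "d")],
    [("a", "b1"), ("a", "q"), ("c", "and")],
    [("e", "f")],
]

# Roughly what "-k a b -k c d" and "-k a b -o -k c d" would select.
selected_and = [r for r in records if both(k("a", "b"), k("c", "d"))(r)]
selected_or = [r for r in records if either(k("a", "b"), k("c", "d"))(r)]
```

Because a record is a list of pairs, the third record can carry two values under key a, and the predicate succeeds if any of them matches.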
For dealing with mailbox files, typical invocations are
dbgrep -mail -k from user
        print all messages from user
dbgrep -mail -k subject 'blah blah'
        print all messages with given subject
dbgrep -mail -kx message-id '<msgid>'
        extract message with given message-id
dbgrep -mail -e word
        print all messages containing word anywhere (header or body)
dbgrep -mail -dk received
        delete all Received: lines in header(s)
(Note that mail header keys are case-insensitive; -mail implies -ki.)
In -mail mode, the program is significantly less efficient if the -e operator is used, because then it has to process the bodies of messages as well as their headers.
If a record has no field k, -pkv k prints nothing; it should arguably print a blank line.
There's no -ekn option analogous to -pkvn.
The interaction between the -m and -x options and the -k, -kr, and -e operators can be confusing; it's not clear that -m and -x should exist given that -km and -kx also exist. The placement of the -m and -x options on the command line is significant; they modify the behavior of only the -k, -kr, and -e options which follow. (That is, a -m or -x at the end of the option/expression list won't do anything, except perhaps to affect the behavior of an implicit, single-pattern search.)
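The positional behavior is easier to see in a toy model. The Python sketch below scans an argument list from left to right and records, for each -k test, whichever matching mode was in force when the test was reached. It handles only -m, -x and -k, and it assumes that -m selects substring matching and -x exact matching, by analogy with -km and -kx. That mapping and the name scan are assumptions, and this is not the program's actual parser.

```python
def scan(args):
    """Return (key, pattern, mode) for each -k test, in order."""
    mode, tests, i = "regex", [], 0
    while i < len(args):
        if args[i] == "-m":
            mode = "substring"   # affects only the tests that follow
            i += 1
        elif args[i] == "-x":
            mode = "exact"       # affects only the tests that follow
            i += 1
        elif args[i] == "-k":
            tests.append((args[i + 1], args[i + 2], mode))
            i += 3
        else:
            raise ValueError("unhandled argument: " + args[i])
    return tests
```

In this model a trailing -x changes the mode after every test has already been recorded, so it has no effect, which mirrors the behavior described above.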
The -ka operator performs simple substring matches (analogous to -km); it should perform regex matches like -k, and (as long as the rest do) be modified by the -m and -x options.
The distinction between the ``regular'' match expression on the command line, and the ``secondary'' expressions accepted by e.g. -expr and -ske is confusing.
The various -ifmt and -ofmt options have not been exhaustively tested. The `sql' format has been implemented only for input, not output.
Report generation (-rgp and -rgf) should probably be a separate program.
This documentation corresponds to version 2.9 of the program.
See http://www.eskimo.com/~scs/src/#dbgrep for possible updates.
Steve Summit, scs@eskimo.com