DBGREP(1)

NAME

dbgrep - extract and process records from simple text databases

SYNOPSIS

dbgrep [ options ] expression [ dbfile ]

DESCRIPTION

dbgrep manipulates simple text databases in various ways. Its preferred ``database'' format consists of a series of keys and values with the reasonably obvious forms

	key1 value11
	key2 value21

	key1 value12
	key2 value22

	key1: value13
	key2: value23

	key1: value14
	key2: value24

In other words, the first word on a line (optionally followed by a colon) is a key; the rest of the line is that key's value. Blank lines separate records. The program detects automatically whether explicit colons are in use. Alternatively, the -cs, -ts, and -ws options explicitly specify that keys and values are separated by colons, single tabs, or arbitrary whitespace, respectively. (Under -cs and -ts, a key name may contain spaces, and under -ts and -ws, keys may contain colons.) It is also possible to specify that a new record begins at each occurrence of a specific key, rather than at a blank line (see -sepkey). Databases may also contain comments: lines beginning with #, which are not otherwise interpreted.
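The record format described above can be sketched in a few lines of Python. This is only an illustration, not dbgrep's own parser: the function name parse_records is hypothetical, and the sketch simply strips one trailing colon from each key rather than detecting the separator style for the whole file as the real program does.

```python
def parse_records(text, comment_char="#"):
    """Split text into records; each record is a list of (key, value) pairs."""
    records, current = [], []
    for line in text.splitlines():
        if comment_char and line.startswith(comment_char):
            continue  # comment lines are not interpreted
        if not line.strip():
            # a blank line ends the current record, if there is one
            if current:
                records.append(current)
                current = []
            continue
        # the first word is the key; the rest of the line is its value
        parts = line.split(None, 1)
        key = parts[0]
        if key.endswith(":"):
            key = key[:-1]  # simplification: drop one optional trailing colon
        value = parts[1] if len(parts) > 1 else ""
        current.append((key, value))
    if current:
        records.append(current)  # the last record needs no trailing blank line
    return records
```

A list of pairs is used instead of a dictionary because one record may hold several values under the same key.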

Given the similarity of the colon-separated form to an RFC822 mail header, it is possible for the program to deal with Unix-style mailbox files: it knows about lines beginning with the five characters ``From '' and preceded by a blank line, it knows about header line continuation, and it knows about body text (not containing explicit keys, and separated from the header by a blank line). In other words, each message in the mailbox is treated as a ``record'', and its header lines (or body text, using -e) can be searched upon. Mailbox mode is selected with the -mail (or -mailh) option.
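The message-splitting rule above (a line beginning with ``From '' that follows a blank line starts a new record) can be sketched as follows. The function name split_mailbox is hypothetical, the start of the file is assumed to count as "preceded by a blank line", and header continuation and body handling are left out.

```python
def split_mailbox(text):
    """Split mailbox text into messages, each a list of lines."""
    messages, current = [], []
    prev_blank = True  # assumption: the start of the file acts like a blank line
    for line in text.splitlines():
        if line.startswith("From ") and prev_blank:
            # separator line: close the previous message and start a new one
            if current:
                messages.append(current)
            current = []
        current.append(line)
        prev_blank = (line == "")
    if current:
        messages.append(current)
    return messages
```

Note that a body line beginning with ``From '' does not start a new message unless the line before it is blank.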

There is also some support for ``databases'' represented as tabular files. The first line of such a database is taken to be a header describing the field names, and the remaining lines are interpreted as records, one per line. Lines can be formatted as tab-separated, comma-separated, or ``SQL output format''. Columnar input or output is selected using the -ifmt and -ofmt options.
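As a rough illustration of how a tabular file maps onto records, the sketch below turns comma-separated or tab-separated text into the same list-of-pairs form, taking field names from the first line. The function name table_to_records is hypothetical, and the quoting rules are those of Python's csv module, which may differ from what dbgrep accepts.

```python
import csv
import io

def table_to_records(text, delimiter=","):
    """Treat the first line as field names and each later line as one record."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # pair each column value with the field name from the header line
    return [list(zip(header, row)) for row in body]
```

For tab-separated input, pass delimiter="\t".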

As its name implies, dbgrep's original purpose in life was to select records matching certain patterns. Along the way, however, it has accumulated a number of other processing options which have turned it into more of a general-purpose database processing tool and report generator.

dbgrep's command line syntax is, perhaps unfortunately, modeled on that of find(1). That is, the ``expression'' describing the records to be matched, and the operators specifying the actions to be carried out for matched records, are all specified as command-line options. (Once you get used to it, though, this sort of syntax isn't really all so painful as find(1)'s man page's ``BUGS'' section would lead you to believe.) It is also possible to prepare the search and processing expression (or a subexpression) in a file, and have dbgrep read it from there.

dbgrep's basic invocation syntax is

	dbgrep [options] [pattern | expression] [dbfile]

If the dbfile is omitted, input is naturally read from standard input. If a single, simple pattern is present, it is treated as a regexp (à la grep(1)) to be searched for in any field; this is the same as -e. Otherwise, the expression is a series of match operators, perhaps with Boolean connectors (-o, -a, !, and ( ) for grouping); it may also contain operators (again like find(1)) which cause some particular action to be taken on a matched record. Finally, there is the usual assortment of options. (Actually, the options may be interspersed with the expression operators, since one parser scans them all.)

By default, most of dbgrep's match operators use regular expressions, in the style of ed(1) and grep(1). It is possible to disable the use of regular expressions, either in an individual match (by using alternative match operators such as -km or -kx), or on a global basis by using the -m or -x options.
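The three matching styles (regular expression, plain substring, and exact) differ as sketched below. The function name field_matches and its mode argument are hypothetical, and Python's re dialect is not identical to the ed(1)/grep(1) style that dbgrep uses, so only the general distinction is meant to carry over.

```python
import re

def field_matches(value, pat, mode="regex", ignore_case=False):
    """Test one field value against pat in one of three matching styles."""
    if mode == "regex":
        # default style: pat is a regular expression searched for anywhere
        flags = re.IGNORECASE if ignore_case else 0
        return re.search(pat, value, flags) is not None
    if ignore_case:
        value, pat = value.lower(), pat.lower()
    if mode == "substring":
        # in the spirit of -km or -m: pat is literal text found anywhere
        return pat in value
    if mode == "exact":
        # in the spirit of -kx or -x: the whole value must equal pat
        return value == pat
    raise ValueError("unknown mode: " + mode)
```

With the pattern a.c, the value abc matches only in regex mode, since the dot is a wildcard there and a literal character otherwise.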

The options, match operators, etc. (as listed by dbgrep -help) are:

EXPRESSIONS

(match expression operators)

-e pat: Match pat in any field.
-expr e: Evaluate ``secondary'' expression e and continue if true. (See ``SECONDARY EXPRESSIONS'' below.)
-k key pat: Match pat in field key.
-km key pat: Match pat anywhere in field key, but no regexp.
-kn key n pat: Match pat against the n'th occurrence of key.
-ka keys pat ;: Match pat against any of the listed keys. keys is a list of one or more field names, separated by whitespace, followed by the pattern, followed by a semicolon (which must typically be escaped from the shell).
-kx key pat: Match pat exactly in field key (no substring, no regexp).
-kxa key pats ;: Match (exactly) any of the listed pats in field key. pats is a list of one or more patterns, separated by whitespace, terminated by a semicolon (which must typically be escaped from the shell).
-kxaf key file: Match (exactly) any of a list of patterns in field key, where the patterns are read (one per line) from file.
-kr keypat pat: Match pat in any field whose key matches the regexp keypat.
-ke key: True if field key exists.
-ker keypat: True if field matching the regexp keypat exists.
-true: Always true.

(Boolean expression connectors)

-a: and (also implied by adjacent expressions)
-o: or
!: not
( ): grouping (must typically be entered as \( and \) to protect from the shell)

OUTPUT PROCESSING

-ak key val: Append new key and val.
-ake key expr: Append new key with value of secondary expression expr.
-c: Print count of matching records.
-dk key: Delete key (and its value).
-ek key: Extract key (and its value), suppressing all others. (Multiple -ek options may be used to extract several keys.)
-exec cmd: Invoke shell command cmd with matched record as input. (Unix only)
-exprp e: Print result of evaluating secondary expression e.
-mof filepat: Write the output to multiple numbered files, one per record. The file names are as specified by filepat, where the two characters %d appearing in filepat are replaced by the output record number. (Or, if you want to get crazy, any of printf(3)'s integer formats could be used instead of %d.)
-mofk fpat key: Write the output to multiple files, one per record. The file names are as specified by fpat, where the two characters %s appearing in fpat are replaced by the value of the field key of the record being written.
-mofka fpat key: Like -mofk, but append to a file if it already exists (i.e. if multiple records have the same value for key).
-pkv key: Print only key's value.
-pkvn key n: Print only the n'th occurrence of key.
-pkvsh key: Print ``key=value''.
-print: Print matching record. (This is the default action.)
-rgp pat: Generate report from pat (see ``REPORT GENERATION'' section below).
-rgf file: Generate report from skeleton in file.
-sk key val: Set (existing) key to val.
-ska key val: Set key to val (appending if key not already present).
-ske key e: Set (existing) key to value of secondary expression e.
-writef file: Write record to file.
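The file naming used by -mof and -mofk amounts to printf-style substitution into the pattern. A minimal sketch, using Python's % operator as a stand-in for printf(3); the function names are hypothetical, and the sketch assumes the first value of key is the one used when a record has several:

```python
def numbered_name(filepat, recno):
    """Substitute the output record number into a pattern such as 'rec%d.txt'."""
    return filepat % recno

def keyed_name(fpat, record, key):
    """Substitute the value of field key into a pattern such as 'by-%s.db'."""
    for k, v in record:
        if k == key:
            return fpat % v  # assumption: the first value of key is used
    return None  # the record has no such field
```

Other integer formats work the same way; for example, a pattern of rec%03d.txt pads the record number with zeros.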

The -kn and -pkvn operators reflect the (possibly surprising) fact that in the database scheme used by dbgrep it is possible to have, within one record, multiple values with the same key.
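Because of this, a record is better modeled as an ordered list of key/value pairs than as a dictionary. The sketch below shows how an n'th occurrence can be picked out; the function names are hypothetical, and counting from 1 is an assumption, since the text above does not say where n starts.

```python
def values_for(record, key):
    """Return every value stored under key, in order of appearance."""
    return [v for k, v in record if k == key]

def nth_value(record, key, n):
    """Return the n'th value of key (assumed 1-based), or None if absent."""
    vals = values_for(record, key)
    return vals[n - 1] if 1 <= n <= len(vals) else None
```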

The -expr, -exprp, -ake, and -ske operators are not present in all versions of the program; they depend on a separate expression evaluator which may or may not be available. See also the ``SECONDARY EXPRESSIONS'' section below.

OPTIONS

-cc c: Set database comment character to c. (The default is #; use ``none'' to disable.)
-contin, -hc: Allow continuation lines: a line beginning with whitespace is taken as a continuation of the previous line's value.
-cs: Force colon separator between keys and values. (Keys may therefore contain whitespace.)
-f file: Read match and processing operators from file, exactly as if typed on the command line (except that the expression may be spread across multiple lines for readability, and the characters (, ), and ; do not need to be quoted). Shell-style quote characters (", ', and \) may be used when patterns or other values contain whitespace. Expression files may contain comments, which are an unquoted # through end-of-line. (If -f is combined with a surrounding command-line match expression, you will probably want parentheses around -f and its filename.)
-i: Ignore case in all matched values (-k, etc.).
-ifmt f: Set input format. The default is ``dbf'', for the normal ``database'' format. Other possibilities are ``ts'' for tab-separated columns, ``csv'' for comma-separated values, or ``sql'' for the format typically output by SQL interface tools (where the first line gives the column names and the second contains sequences of dashes suggesting the column widths).
-ki: Ignore case in key names. (With -ki in effect, ``-k key val'' would also search in keys named ``Key'' and ``KEY''.)
-m: Perform simple substring matches; do not treat patterns as regular expressions. (This option modifies the behavior of any -e, -k, and -kr operators that follow it.)
-mail: Read mailbox format, treating each message as a record. Messages (records) begin with the five characters ``From '' preceded by a blank line. The message body (separated from the header by a blank line) is treated as a series of keyless fields (i.e. with values only). Also implies -ki and -hc.
-mailh key: Like -mail, but the ``header'' tag indicating the start of a new message is key (as opposed to ``From '').
-of file: Write all output to file, instead of the default standard output.
-ofmt f: Set output format f (see -ifmt for options).
-pc: Preserve comment lines, passing them through to the output unchanged.
-pi: Preserve ``indentation'', that is, if multiple whitespace characters appear between a key and its value, retain them all in the output.
-s: No output; exit status only.
-sc c: Set the character separating keys from values to c (instead of a colon, or whitespace).
-sepkey key: On input, begin a new record whenever key is seen (as opposed to the default, which is that new records are signaled by blank lines).
-ts: Force tab separation between keys and values; don't look for colons or arbitrary whitespace. (Implies -pi.)
-v: Invert; print records not matching. (Theoretically equivalent to putting ! ( ... ) around the match expression.)
-version: Print program's version number.
-ws: Force whitespace separation between keys and values; don't look for colons.
-x: Perform most matches exactly, neither looking for substrings nor matching regular expressions. (This option modifies the behavior of any -e, -k, and -kr operators that follow it.)
-?, -help: Print a usage summary.
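The ``sql'' input format described under -ifmt can be read by using the runs of dashes on the second line as column boundaries. This is a sketch under assumptions, not the program's actual parser: the function name parse_sql_output is hypothetical, and it assumes that columns are separated by at least one space and that values never extend beyond the width of their dashes.

```python
import re

def parse_sql_output(text):
    """Parse a table whose second line of dashes marks the column widths."""
    lines = text.splitlines()
    if len(lines) < 2:
        return []
    header, dashes, body = lines[0], lines[1], lines[2:]
    # each run of dashes gives the start and end offsets of one column
    spans = [m.span() for m in re.finditer(r"-+", dashes)]
    names = [header[s:e].strip() for s, e in spans]
    records = []
    for line in body:
        if not line.strip():
            continue  # skip blank lines
        records.append([(n, line[s:e].strip()) for n, (s, e) in zip(names, spans)])
    return records
```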

REPORT GENERATION

The -rgp and -rgf options request report generation. Output is generated based on a template, which is supplied either directly on the command line (with -rgp) or in a file (specified by -rgf). The template is repeated for as many (selected) records as there are to be output. The template contains text which is to be output verbatim, interspersed with values to be interpolated from the database, plus a few other processing options. Interpolated values and other processing options are requested by sequences beginning with a $ character. The available $ sequences are:

$key: insert key's value
${key}: insert key's value (esp. if the key name contains spaces or punctuation)
$key[n], ${key}[n]: insert n'th of key's several values
$$: literal $
\$, \}, \\: literal characters
\<newline>: (backslash at end of line) eat newline; join lines
$?key{...}: conditionally include bracketed text only if key exists
$?{key}{...}: ditto
$!key{...}: conditionally include bracketed text only if key does not exist
$!{key}{...}: ditto
$*key{...}: repeat bracketed text once for each of key's multiple values
$*{key}{...}: ditto
$#key: number of key's values
$.: count (i) during $*
$%{e}: value of secondary expression e (only if expression evaluation is available)
$?%{e}{...}: conditionally include bracketed text only if secondary expression e is true (and only if expression evaluation is available)

The $key[n], $#, $*, and $. constructions again reflect the fact that it is possible for one record to have multiple values with the same key.
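To give a feel for how template interpolation works, here is a sketch covering only three of the sequences listed above: $key, ${key}, and $$. The function name expand is hypothetical. The sketch assumes that an unbraced key name consists of letters, digits, and underscores, that the first value is used when a key has several, and that a missing key expands to nothing; the real program may differ on all three points.

```python
import re

# one alternative per supported sequence: $$, ${key}, or $key
_TOKEN = re.compile(r"\$(?:(\$)|\{([^}]*)\}|([A-Za-z0-9_]+))")

def expand(template, record):
    """Expand $key, ${key} and $$ in template using one record's fields."""
    def repl(m):
        if m.group(1):
            return "$"  # $$ is a literal dollar sign
        key = m.group(2) if m.group(2) is not None else m.group(3)
        for k, v in record:
            if k == key:
                return v  # assumption: the first value of the key is used
        return ""  # assumption: a missing key expands to nothing
    return _TOKEN.sub(repl, template)
```

The braced form is what makes a key such as ``unit price'' usable, since the unbraced form would stop at the space.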

SECONDARY EXPRESSIONS

The -expr, -exprp, -ake, and -ske operators, if available, and the report generation sequences $% and $?% support a simple arithmetic expression evaluator implementing the usual arithmetic operators plus a certain number of math and string functions. Briefly, the arithmetic operators are +, -, *, /, %, and ** (where % is modulus and ** is exponentiation), with the customary associativity and precedence. Parentheses may be used to override the default precedence. The relational and logical operators are >=, >, <=, <, ==, !=, !, &&, and || (all as in C).

These mathematical functions are available: abs, acos, asin, atan, atan2, ceil, cos, cosh, floor, ln, log10, sin, sinh, sqrt, tan, and tanh. There are also some string functions: strcat, strlen, strmatch, strstr, and substr. (Briefly: strcat and strlen are more or less as in C; strmatch matches regular expressions; strstr and substr use 1-based character positions.)

The expression evaluator shares code with med; the man page for med (q.v.) contains further documentation on these operators and functions.
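The idea of evaluating an arithmetic expression over a record's fields can be sketched with Python's ast module. This is not the evaluator shared with med: the function name evaluate is hypothetical, only the arithmetic and relational operators are handled (no &&, ||, !, or functions), every field value is assumed to be numeric, and precedence and associativity are Python's own.

```python
import ast
import operator

_BIN = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow}
_CMP = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
        ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}

def evaluate(expr, record):
    """Evaluate an arithmetic/relational expression over a record's fields."""
    fields = {}
    for k, v in record:
        fields.setdefault(k, v)  # assumption: the first value of a key is used

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return float(fields[node.id])  # a bare name refers to a field
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN:
            return _BIN[type(node.op)](ev(node.left), ev(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _CMP):
            return _CMP[type(node.ops[0])](ev(node.left), ev(node.comparators[0]))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expr, mode="eval"))
```

A record for which the expression is true would then be selected, in the manner of -expr.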

EXAMPLES

	dbgrep -k a b

print records where field a contains b

	dbgrep -kx a b

print records where field a is exactly b

	dbgrep -k a b -k c d

print records where field a contains b and c contains d

	dbgrep -k a b -o -k c d

print records where field a contains b or c contains d

	dbgrep -ke a

print records where field a exists at all

	dbgrep -expr 'a > 5'

print records where field a's value is greater than 5

	dbgrep -expr 'a + b > 5'

print records where the sum of fields a and b is greater than 5

	dbgrep -c -k a b

print a count of records where field a contains b

	dbgrep -ek a

print just field a (key and value) of each record, suppressing all other fields

	dbgrep -ak a b

append a new field with key a and value b to each record

	dbgrep -ake c 'a + b'

append a new field with key c, containing the sum of fields a and b, to each record

For dealing with mailbox files, typical invocations are

	dbgrep -mail -k from user

print all messages from user

	dbgrep -mail -k subject 'blah blah'

print all messages with given subject

	dbgrep -mail -kx message-id '<msgid>'

extract message with given message-id

	dbgrep -mail -e word

print all messages containing word anywhere (header or body)

	dbgrep -mail -dk received

delete all Received: lines in header(s)

(Note that mail header keys are case-insensitive; -mail implies -ki.)

In -mail mode, the program is significantly less efficient if the -e operator is used, because then it has to process the bodies of messages as well as their headers.

BUGS

If a record has no field k, -pkv k prints nothing; it should arguably print a blank line.

There's no -ekn option analogous to -pkvn.

The interaction between the -m and -x options and the -k, -kr, and -e operators can be confusing; it's not clear that -m and -x should exist given that -km and -kx also exist. The placement of the -m and -x options on the command line is significant; they modify the behavior of only the -k, -kr, and -e options which follow. (That is, a -m or -x at the end of the option/expression list won't do anything, except perhaps to affect the behavior of an implicit, single-pattern search.)
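The position dependence can be pictured as a single left-to-right scan in which -m and -x change a current mode that each later match operator picks up. The sketch below models only that idea: the function name assign_modes is hypothetical, and only -m, -x, -k, -kr, and -e are recognized.

```python
def assign_modes(args):
    """Scan arguments left to right, tagging each match operator with a mode."""
    mode = "regex"  # default until a -m or -x is seen
    tagged = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-m":
            mode = "substring"  # affects only operators that follow
            i += 1
        elif arg == "-x":
            mode = "exact"
            i += 1
        elif arg in ("-k", "-kr"):
            tagged.append((arg, args[i + 1], args[i + 2], mode))
            i += 3
        elif arg == "-e":
            tagged.append((arg, None, args[i + 1], mode))
            i += 2
        else:
            raise ValueError("not handled in this sketch: " + arg)
    return tagged
```

Given -k a b -m -k c d, the first match keeps the default regular expression mode and only the second becomes a substring match.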

The -ka operator performs simple substring matches (analogous to -km); it should perform regex matches like -k, and (as long as the rest do) be modified by the -m and -x options.

The distinction between the ``regular'' match expression on the command line, and the ``secondary'' expressions accepted by e.g. -expr and -ske is confusing.

The various -ifmt and -ofmt options have not been exhaustively tested. The `sql' format has been implemented only for input, not output.

Report generation (-rgp and -rgf) should probably be a separate program.

HISTORY

This documentation corresponds to version 2.9 of the program.

See http://www.eskimo.com/~scs/src/#dbgrep for possible updates.

AUTHOR

Steve Summit, scs@eskimo.com