section 5.10: Command-line Arguments

page 115

The picture at the top of page 115 doesn't quite match the declaration

	char *argv[]
it's actually a picture of the situation declared by
	char **argv
which is what main actually receives. (The array parameter declaration char *argv[] is rewritten by the compiler to char **argv, in accordance with the discussion in sections 5.3 and 5.8.) Also, the ``0'' at the bottom of the array is just a representation of the null pointer which conventionally terminates the argv array. (Normally, you'll never encounter the terminating null pointer, because if you think of argv as an array of size argc, you'll never access beyond argv[argc-1].)
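To make the terminating null pointer concrete, here is a minimal sketch that counts the words in an argv-style array by walking until it finds the null pointer, without consulting argc at all. (count_args is a name I made up for illustration; it's not from the book.)

```c
#include <assert.h>
#include <stddef.h>

/* Count the words in an argv-style array by walking to the
 * terminating null pointer instead of relying on argc.
 * count_args is a hypothetical helper, not from the book. */
int count_args(char **argv)
{
	int n = 0;

	assert(argv != NULL);
	while (argv[n] != NULL)
		n++;
	return n;
}
```

Called from main as count_args(argv), this should return the same value as argc, since argv[argc] is the null pointer.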

The loop

	for (i = 1; i < argc; i++)
looks different from most loops we see in C (which either start at 0 and use <, or start at 1 and use <=). The reason is that we're skipping argv[0], which contains the name of the program.

The expression

	printf("%s%s", argv[i], (i < argc-1) ? " " : "");
is a little nicety to print a space after each word (to separate it from the next word) but not after the last word. (The nicety is just that the code doesn't print an extra space at the end of the line.) It would also be possible to fold in the following printf of the newline:
	printf("%s%s", argv[i], (i < argc-1) ? " " : "\n");
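If you'd like to convince yourself of what the conditional expression produces, here is a small sketch that builds the same output in a buffer rather than printing it. The function join_args and its parameters are my own invention for this illustration, and snprintf assumes a C99 library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join argv[1] .. argv[argc-1] into buf, using the conditional
 * expression to pick the separator: a space after every word but
 * the last, and a newline after the last.  join_args and its
 * parameters are hypothetical names used for illustration.
 * Returns 0 on success, -1 if buf is too small. */
int join_args(int argc, char *argv[], char *buf, size_t size)
{
	size_t used = 0;
	int i;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (i = 1; i < argc; i++) {
		int n = snprintf(buf + used, size - used, "%s%s",
		                 argv[i], (i < argc-1) ? " " : "\n");
		if (n < 0 || (size_t)n >= size - used)
			return -1;	/* output would not fit */
		used += (size_t)n;
	}
	return 0;
}
```

One consequence of folding the newline into the loop is worth noticing: when there are no arguments at all, the loop body never runs, so no newline is produced either.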

As I mentioned in the comment on the bottom of page 109, it's not necessary to write pointer-incrementing code like

	while(--argc > 0)
		printf("%s%s", *++argv, (argc > 1) ? " " : "");
if you don't feel comfortable with it. I used to try to write code like this, because it seemed to be what everybody else did, but it never sat well, and it was always just a bit too hard to write and to prove correct. I've reverted to simple, obvious loops like
	int argi;
	char *sep = "";

	for (argi = 1; argi < argc; argi++) {
		printf("%s%s", sep, argv[argi]);
		sep = " ";
	}
	printf("\n");
Often, it's handy to have the original argc and argv around later, anyway. (This loop also shows another way of handling space separators.)

page 116

Page 116 shows a simple improvement on the matching-lines program first presented on page 69; page 117 adds a few more improvements. The differences between page 69 and page 116 are that the pattern is read from the command line, and strstr is used instead of strindex. The difference between page 116 and page 117 is the handling of the -n and -x options. (The next obvious improvement, which we're not quite in a position to make yet, is to allow a file name to be specified on the command line, rather than always reading from the standard input.)

page 117

Several aspects of this code deserve note.

The line

	while (c = *++argv[0])
is not in error. (In isolation, it might look like an example of the classic error of accidentally writing = instead of == in a comparison.) What it's actually doing is another version of a combined set-and-test: it assigns the next character pointed to by argv[0] to c, and compares it against '\0'. You can't see the comparison against '\0', because it's implicit in the usual interpretation of a nonzero expression as ``true.'' An explicit test would look like this:
	while ((c = *++argv[0]) != '\0')
argv[0] is a pointer to a character in a string; ++argv[0] increments that pointer to point to the next character in the string; and *++argv[0] increments the pointer while returning the next character pointed to. argv[0] is not the first string on the command line, but rather whichever one we're looking at now, since elsewhere in the loop we increment argv itself.
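Here is the idiom in isolation, as a sketch: a function that uses the explicit form of the loop to copy the option letters of one word into a buffer. The name option_letters is hypothetical, and the function assumes that argv[0] points at the leading '-' and that the output buffer is big enough.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the option letters of the word argv[0] into out, using the
 * same set-and-test idiom.  Assumes argv[0] points at the leading
 * '-' and that out has room for the letters plus a '\0'.
 * option_letters is a hypothetical helper. */
void option_letters(char **argv, char *out)
{
	int c;

	assert(argv != NULL && argv[0] != NULL);
	while ((c = *++argv[0]) != '\0')	/* step past '-', then each letter */
		*out++ = (char)c;
	*out = '\0';
}
```

Notice that the loop changes the pointer stored in argv[0], not the characters it points to: afterwards, argv[0] points at the '\0' which ends the word.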

Some of the extra complexity in this loop is to make sure that it can handle both

	-x -n
and
	-xn
In pseudocode, the option-parsing loop is
	for ( each word on the command line )
		if ( it begins with '-' )
			for ( each character c in that word )
				switch ( c )
					...
For comparison, here is another way of writing effectively the same loop:
	int argi;
	char *p;

	for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
		for (p = &argv[argi][1]; *p != '\0'; p++)
			switch (*p) {
			case 'x':
				...
This uses array notation to access the words on the command line, but pointer notation to access the characters within a word (more specifically, a word that begins with '-'). We could also use array notation for both:
	int argi, chari;

	for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
		for (chari = 1; argv[argi][chari] != '\0'; chari++)
			switch (argv[argi][chari]) {
			case 'x':
				...
In either case, the inner, character loop starts at the second character (index [1]), not the first, because the first character (index [0]) is the '-'.
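For completeness, here is one way the array-notation version might be filled out into a self-contained function. This is only a sketch: the struct find_opts, the function parse_opts, and the choice to return -1 for an unrecognized option are all assumptions of mine, not the book's code.

```c
#include <assert.h>
#include <stddef.h>

/* Flags set by the options; the struct and function names are
 * hypothetical, chosen for this sketch. */
struct find_opts {
	int except;	/* -x: print lines that do NOT match */
	int number;	/* -n: prefix each line with its line number */
};

/* Parse leading option words.  Returns the index of the first
 * non-option word (the pattern), or -1 for an unknown option. */
int parse_opts(int argc, char *argv[], struct find_opts *opts)
{
	int argi, chari;

	assert(opts != NULL);
	opts->except = opts->number = 0;
	for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++) {
		for (chari = 1; argv[argi][chari] != '\0'; chari++) {
			switch (argv[argi][chari]) {
			case 'x':
				opts->except = 1;
				break;
			case 'n':
				opts->number = 1;
				break;
			default:
				return -1;
			}
		}
	}
	return argi;
}
```

Because the function indexes argv rather than incrementing it, the caller still has the original argc and argv afterwards, and the return value says where the pattern is. Both -x -n and -xn set the same two flags.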

It's easy to see how the -n option is implemented. If -n is seen, the number flag is set to 1 (a.k.a. ``true''), and later, in the line-matching loop, each time a line is printed, if the number flag is true, the line number is printed first. It's harder to see how -x works. An except flag is set to 1 if -x is present, but how is except used? It's buried down there in the line

	if ((strstr(line, *argv) != NULL) != except)
What does that mean? The subexpression
	(strstr(line, *argv) != NULL)
is 1 if the line contains the pattern, and 0 if it does not. except is 0 if we should print matching lines, and 1 if we should print non-matching lines. What we've actually implemented here is an ``exclusive OR,'' which is ``if A or B but not both.'' Other ways of writing this would be
	int matched = (strstr(line, *argv) != NULL);
	if (matched && !except || !matched && except) {
		if (number)
			printf("%ld:", lineno);
		printf("%s", line);
		found++;
	}
or
	int matched = (strstr(line, *argv) != NULL);
	if (except ? !matched : matched) {
		if (number)
			printf("%ld:", lineno);
		printf("%s", line);
		found++;
	}
or
	int matched = (strstr(line, *argv) != NULL);
	if (!except) {
		if (matched) {
			if (number)
				printf("%ld:", lineno);
			printf("%s", line);
			found++;
		}
	}
	else {
		if (!matched) {
			if (number)
				printf("%ld:", lineno);
			printf("%s", line);
			found++;
		}
	}
There's clearly a tradeoff: the last version is in some sense the most clear (and the most verbose), but it ends up repeating the line-number printing and any other processing which must be done for found lines. Therefore, the compressed, perhaps slightly more cryptic forms are better: some day, it's a virtual certainty that more processing will be added for printed lines (for example, if we're searching multiple files, we'll want to print the filename for matching lines, too), and if the printing is duplicated in two places, it's far too likely that we'll overlook that fact and add the new code in only one place.
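If you're not sure that the compressed forms really are equivalent, there are only four combinations to check, so it's easy to check them all. The sketch below does so; the function names are hypothetical, invented for this illustration.

```c
#include <assert.h>

/* Each function answers "should this line be printed?" given
 * matched (1 if the line contains the pattern) and except (1 if -x
 * was given).  The function names are hypothetical. */
int print_neq(int matched, int except)
{
	/* the != trick only works if both values are exactly 0 or 1 */
	assert(matched == 0 || matched == 1);
	assert(except == 0 || except == 1);
	return matched != except;
}

int print_andor(int matched, int except)
{
	return (matched && !except) || (!matched && except);
}

int print_cond(int matched, int except)
{
	return except ? !matched : matched;
}

/* Returns 1 if all three forms agree on every combination of
 * 0/1 inputs, 0 otherwise. */
int forms_agree(void)
{
	int matched, except;

	for (matched = 0; matched <= 1; matched++)
		for (except = 0; except <= 1; except++) {
			int want = print_neq(matched, except);
			if (print_andor(matched, except) != want ||
			    print_cond(matched, except) != want)
				return 0;
		}
	return 1;
}
```

The assertions in print_neq point out something the != form quietly depends on: both operands have to be exactly 0 or 1. That holds here because the result of a comparison like strstr(...) != NULL is always 0 or 1, and except is only ever set to 0 or 1.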

One last point on the pattern-matching program: it's probably clearer to declare a pointer variable

	char *pat;
and set it to the word from argv to be used as the search pattern (argv[1] or *argv, depending on whether we're looking at page 116 or 117), and then use that in the call to strstr:
	if (strstr(line, pat) != NULL ...
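As a sketch of how the test reads with a separate pattern pointer, here is a function that counts the lines which would be printed. The function count_selected and its parameters are hypothetical; it takes the lines from an array rather than reading them from the standard input, purely to keep the example short.

```c
#include <assert.h>
#include <string.h>

/* Count the lines that would be printed for pattern pat.
 * count_selected and its parameters are hypothetical names; except
 * must be 0 or 1, as in the program on page 117. */
int count_selected(const char *lines[], int nlines, const char *pat, int except)
{
	int i, found = 0;

	assert(pat != NULL && (except == 0 || except == 1));
	for (i = 0; i < nlines; i++)
		if ((strstr(lines[i], pat) != NULL) != except)
			found++;
	return found;
}
```

In the real program, pat would be set once, after option parsing, from argv[1] or *argv as appropriate.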



This page by Steve Summit // Copyright 1995, 1996