section 5.6: Pointer Arrays; Pointers to Pointers

page 107

Deep sentence:

Since pointers are variables themselves, they can be stored in arrays just as other variables can.

This is just one aspect of the generality of C's data types, which we'll be seeing in the next few sections.

We've used a recursive definition of ``expression'': a constant or variable is an expression, an expression in parentheses is an expression, an expression plus an expression is an expression, etc. There are obviously an infinite number of expressions, of arbitrary complexity. In exactly the same way, there are an infinite number of data types in C. We've already seen the basic data types: int, char, double, etc. But then we have the derived data types such as array-of-char and pointer-to-int and function-returning-double. So we can say that for any type, array-of-type is another type, and pointer-to-type is another type, and function-returning-type is another type. Once we've said that, we can see that there is also the possibility of arrays of pointers, and arrays of arrays, and functions returning pointers, and even (in section 5.11, though this is a deeper topic) pointers to functions. (The only possibilities that C doesn't support are functions returning arrays, and arrays of functions, and functions returning functions.)

Make sure you understand why an integer is something that can be ``compared or moved in a single operation,'' but that a string (that is, an array of char) is not. Then, realize that a pointer is also something that can be ``compared or moved in a single operation.'' (Actually, though, the string comparisons we'll be doing are not single operations.) From time to time you'll hear me caution you not to worry too much about certain aspects of efficiency. Here, it's true that the overhead of copying entire strings from one place to another, a character at a time (which is the overhead we'll be getting around by manipulating pointers instead) can be significant, but that's not the only concern: once we're comfortable with the idea, manipulating pointers will be somewhat easier on us, too. (Copying lots of characters around is a nuisance, and it can also be dangerous, if the destination isn't big enough or isn't in the right place.)

Don't worry about the ``one long character array'' that the ``lines to be sorted are stored end-to-end in.'' Instead, look at the picture at the bottom of page 107, which shows the pointers that might be set up after reading the lines

	defghi
	jklmnopqrst
	abc

On the left are the pointers before sorting, and on the right are the pointers after sorting. The three strings have not been moved, but by reshuffling the pointers, the three pointers in order now point to the lines

	abc
	defghi
	jklmnopqrst

page 108

Once again, we see a nice simple decomposition of the problem, which might seem deceptively simple except that when problems are decomposed in simple ways like this, and then implemented faithfully, they really can be this simple. Deferring the sorting step is an excellent idea, especially if we didn't quite follow the details of the sorting functions in the previous chapter. (Actually, in practice, we can usually defer the sorting step forever, since there's often a general-purpose sort routine provided for us somewhere. C is no exception: a qsort function is a required part of its standard library. For the most part, the only people who have to write sort routines are programming students and the few people who get stuck implementing system functions.)

The main program at the bottom of page 108 looks a bit more elaborate than the pseudocode at the top of the page, but the essence of the program is the three calls to readlines, qsort, and writelines. Everything else is declarations, plus an error message which is printed if readlines is for some reason not able to read the input. Eventually, you should be able to understand why all of the various declarations are required, but you can skim over them at first.

page 109

The readlines function first calls our old friend getline to read each line into a local array, line. On page 29 in section 1.9, we saw a program for finding the longest line in the input: it read each line into a local array line, and kept a copy of the longest line in a second array longest. In that program, it didn't matter that the input array line was continually overwritten with each new input line, and that most lines (except the longest one) were lost and forgotten. Here, however, we do need to save all of the input lines somewhere, so that we can sort them and print them later.

The lines are saved by calling alloc, a function which we wrote in section 5.4 but may have skimmed over. alloc allocates n bytes of new memory for something which we need to save. Each time we read another line, we call alloc to allocate some new memory to store it, then call strcpy to copy the line from the line array to the newly allocated memory. This way, it's okay that the next line is read into the same line array; we save each line, as it's read, into its own little alloc'ed piece of memory.

Note that memory allocated with a routine such as alloc persists, just as global and static variables do; it does not disappear when the function that allocated it returns.

Hopefully you're getting used to reading compressed condition statements by now, because here's another doozy:

	if (nlines >= maxlines || (p == alloc(len)) == NULL)

This line checks to make sure we have enough room to store the new line we just read. We need two things: (1) a slot in the lineptr array to store the pointer, and (2) space allocated by alloc to store the line itself. If we don't have either of these things, we return -1, indicating that we ran out of memory. We don't have a slot in the lineptr array if we've already read maxlines lines, and we don't have room to store the line itself if alloc returns NULL. The subexpression (p = alloc(len)) == NULL is equivalent in form to other assign-and-test combinations we've been using involving getchar and getline: it assigns alloc's return value to p, then compares it to NULL.

Normally, we might be suspicious of the call alloc(len). Why? Remember that strings are always terminated by '\0', so the space required to store a string is always one more than the number of characters in it. Normally, we'll call things like alloc(len + 1), and accidentally calling alloc(len) is usually a bug. Here, it happens to be okay, because before we copy the line to the newly-allocated memory, we strip the newline '\n' from the end of it, by overwriting it with '\0', hence making the string one shorter than len. (Why is the last character in line, namely the '\n', at line[len-1], and not line[len]?)

The fragments

	if (nlines >= maxlines ...
and
	lineptr[nlines++] = p;

deserve some attention. These represent a common way of filling in an array in C. nlines always holds the number of lines we've read so far (it's another invariant). It starts out as 0 (we haven't read any lines yet) and it ends up as the total number of lines we've read. Each time we read a new line, we store the line (more precisely, a pointer to it) in lineptr[nlines++]. By using postfix ++, we store the pointer in the slot indexed by the previous value of nlines, which is what we want, because arrays are 0-based in C. The first time through the loop, nlines is 0, so we store a pointer to the first line in lineptr[0], and then increment nlines to 1. If nlines ever becomes equal to maxlines, we've filled in all the slots of the array, and we can't use any more (even though, at that point, the highest-filled cell in the array is lineptr[maxlines-1], which is the last cell in the array, again because arrays are 0-based). We test for this condition by checking nlines >= maxlines, as a little measure of paranoia. The test nlines == maxlines would also work, but if we ever accidentally introduce a bug into the program such that we fill past the last slot without noticing it, we wouldn't want to keep on filling farther and farther past the end.

Deep sentences:

...lineptr is an array of MAXLINES elements, each element of which is a pointer to a char. That is, lineptr[i] is a character pointer...

We can see that lineptr[i] has to be a character pointer, by looking at two things: in the function readlines, the line

	lineptr[lines++] = p;

has a character pointer on the right-hand side, and the only thing we can assign a character pointer to is another character pointer. Also, in the function writelines, in the line

	printf("%s\n", lineptr[i]);

printf's %s format expects a pointer to a character, so that's what lineptr[i] had better be.

Note that writelines prints a newline after each line, since newlines were stripped out of the input lines by readlines.

Don't worry too much about the discussion at the bottom of page 109. We saw in section 5.3 that due to the ``strong relationship'' between pointers and arrays, it is always possible to manipulate an array using pointer-like notation, and to manipulate a pointer using array-like notation. Since lineptr is an array, it is possible to manipulate it using pointer-like notation, but since what it's an array of is other pointers, it can start to get a bit confusing. Though many programmers do write things like

	printf("%s\n", *lineptr++);

and though this is correct code, and though one should probably understand it to have a 100% complete understanding of C, I've decided that code like that is just a bit too hard to follow, and I'd always write (perhaps more pedestrian and mundane) things like

	printf("%s\n", lineptr[i]);
or
	printf("%s\n", lineptr[i++]);

page 110

Since I didn't ask you to follow the qsort example in section 4.10 in complete detail, I won't ask you to work through this one completely, either. But if you compare the code here to the code on pages 87-88, you will see that the only significant differences are that the variables and arrays containing the things being sorted have been changed from int to char * (pointer-to-char), and the comparison

	if (v[i] < v[left])

has been changed to

	if (strcmp(v[i], v[left]) < 0)

Read sequentially: prev next up top