section 5.4: Address Arithmetic

This section is going to get pretty hairy. Some of it talks about things we've already seen (adding integers to pointers); some of it talks about things we need to learn (comparing and subtracting pointers); and some of it talks about a rather sophisticated example (a storage allocator). Don't worry if you can't follow all the details of the storage allocator, but do read along so that you can pick up the other new points. (In other words, make sure you read from ``Zero is the sole exception'' in the middle of page 102 to ``that is, the string length'' on page 103, and also the last paragraph on page 103.)

What is a storage allocator for? So far, we've used pointers to point to existing variables and arrays, which the compiler allocated for us. But eventually, we may want to allocate data structures (arrays, and others we haven't seen yet) of a size which we don't know at compile time. Earlier, we spoke briefly about a hypothetical inventory-management system, which recorded information about each part stored in a warehouse. How many different parts could there be? If we used fixed-size arrays, there would be a fixed upper limit on the number of parts we could enter into the system, and we'd be annoyed if that limit were reached. A better solution is not to allocate a fixed array at compile time, but rather to use a run-time storage allocator to allocate memory for the data structures used to describe each part. That way, the number of parts which the system can hold is limited only by available memory, not by any static limit built into the program. Using a storage allocator to allocate memory at run time in this way is called dynamic allocation.

However, dynamic memory allocation is where C programming can really get tricky, because you, the programmer, are responsible for most aspects of it, and there are plenty of things you can do wrong (e.g. not allocating quite enough memory, accidentally continuing to use memory after you deallocate it, leaving random invalid pointers pointing everywhere, etc.). Therefore, we won't be talking about dynamic allocation for a while, which is why you can skim over the storage allocator in this section for now.

page 102

The first new piece of information in this section (which you'll need to remember even if you're not following the details of the storage allocator example) is the introduction of the ``null pointer.''

So far, all of our pointers have pointed somewhere, and we've cautioned about pointers which don't. To help us distinguish between pointers which point somewhere and pointers which don't, there is a single, special pointer value we can use, which is guaranteed not to point anywhere. When a pointer doesn't point anywhere, we can set it to this value, to make explicit the fact that it doesn't point anywhere.

This special pointer value is called the null pointer. The way to set a pointer to this value is to use a constant 0:

	int *ip = 0;

The 0 is just a shorthand; it does not necessarily mean machine address 0. To make it clear that we're talking about the null pointer and not the integer 0, we often use a macro definition like

	#define NULL 0

so that we can say things like

	int *ip = NULL;

(If you've used Pascal or LISP, the nil pointer in those languages is analogous.)

In fact, the above #definition of NULL has been placed in the standard header file <stdio.h> for us (and in several other standard header files as well), so we don't even need to #define it. I agree completely with the authors that using NULL instead of 0 makes it clearer that we're talking about a null pointer, so I'll always be using NULL, too.

Just as we can set a pointer to NULL, we can also test a pointer to see if it's NULL. The code

	if(p != NULL)
		*p = 0;
	else
		printf("p doesn't point anywhere\n");

tests p to see if it's non-NULL. If it's not NULL, it assumes that it points somewhere valid, and writes a 0 there. Otherwise (i.e. if p is the null pointer) the code complains.

Though we can use null pointers as markers to remind ourselves of which of our pointers don't point anywhere, it's up to us to do so. It is not guaranteed that all uninitialized pointer variables (which obviously don't point anywhere) are initialized to NULL, so if we want to use the null pointer convention to remind ourselves, we'd best explicitly initialize all unused pointers to NULL. Furthermore, there is no general mechanism that automatically checks whether a pointer is non-null before we use it. If we think that a pointer might not point anywhere, and if we're using the convention that pointers that don't point anywhere are set to NULL, it's up to us to compare the pointer to NULL to decide whether it's safe to use it.
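Here is a minimal sketch of the convention in action. The function name store_if_valid is made up for this illustration (it is not from the book or any library); it simply wraps the "compare against NULL before using" check described above.

```c
#include <stddef.h>	/* one of the standard headers that defines NULL */

/* Hypothetical helper, for illustration only: store value through p,
   but only if p is not the null pointer.
   Returns 1 if the store happened, 0 if p was null. */
int store_if_valid(int *p, int value)
{
	if(p != NULL) {
		*p = value;
		return 1;
	}
	return 0;
}
```

If we declare int x; and int *unused = NULL; then store_if_valid(&x, 7) sets x to 7, while store_if_valid(unused, 7) touches nothing. Note that the check only helps because we explicitly initialized unused to NULL; an uninitialized pointer would sail right past the test.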

The next new piece of information in this section (which we've already alluded to) is pointer comparison. You can compare two pointers for equality or inequality (== or !=): they're equal if they point to the same place or are both null pointers; they're unequal if they point to different places, or if one points somewhere and one is a null pointer. If two pointers point into the same array, the relational comparisons <, <=, >, and >= can also be used.
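As a rough sketch of both kinds of comparison, here is a small search function. The name find_int is hypothetical, chosen just for this example. It uses < to walk through an array, and its caller can use == or != to test the result against NULL.

```c
#include <stddef.h>	/* NULL */

/* Hypothetical helper, for illustration only: return a pointer to the
   first element of a[0..n-1] equal to target, or NULL if there is none. */
int *find_int(int *a, int n, int target)
{
	int *p;
	int *end = a + n;	/* just past the last element; compared, never dereferenced */

	for(p = a; p < end; p++) {
		if(*p == target)
			return p;
	}
	return NULL;
}
```

The relational test p < end is meaningful because p and end both refer to the same array (C also allows a pointer to the position just past the last element to take part in such comparisons, as long as we don't try to access what it points to).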

page 103

The sentences

...n is scaled according to the size of the objects p points to, which is determined by the declaration of p. If an int is four bytes, for example, the int will be scaled by four.

say something we've seen already, but may only confuse the issue. We've said informally that in the code

	int a[10];
	int *pa = &a[0];
	*(pa+1) = 1;

pa contains the ``address'' of the int object a[0], but we've discouraged thinking about this address as an actual machine memory address. We've said that the expression pa+1 moves to the next int in the array (in this case, a[1]). Thinking at this abstract level, we don't even need to worry about any ``scaling by the size of the objects pointed to.''

If we do look at a lower, machine level of addressing, we may learn that an int occupies some number of bytes (usually two or four), such that when we add 1 to a pointer-to-int, the machine address is actually increased by 2 or 4. If you like to consider the situation from this angle, you're welcome to, but if you don't, you certainly don't have to. If you do start thinking about machine addresses and sizes, make extra sure that you remember that C does do the necessary scaling for you. Don't write something like

	int a[10];
	int *pa = &a[0];
	*(pa+sizeof(int)) = 1;

where sizeof(int) is the size of an int in bytes, and expect it to access a[1].
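If you'd like to convince yourself that the scaling really is automatic, something like the following sketch may help. Both function names are invented for this illustration. The second function peeks at the machine level by converting the pointers to char *, a trick we haven't covered yet, so feel free to take it on faith.

```c
/* Hypothetical demonstration: pa+1 and pa+2 step by whole ints,
   so these stores land in a[1] and a[2]. */
void fill_next_two(int a[])
{
	int *pa = &a[0];

	*(pa + 1) = 1;	/* same as a[1] = 1 */
	*(pa + 2) = 2;	/* same as a[2] = 2 */
}

/* Hypothetical demonstration: how many bytes apart are pa and pa+1?
   Viewed as char pointers, the distance is the size of one int. */
int bytes_per_step(void)
{
	int a[2];
	int *pa = &a[0];

	return (int)((char *)(pa + 1) - (char *)pa);
}
```

On any given machine, bytes_per_step() returns the same value as sizeof(int), even though the source code only ever added 1.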

Since adding an int to a pointer gives us another pointer:

	int a[10];
	int *pa1 = &a[0];
	int *pa2 = pa1 + 5;

we might wonder if we can rearrange the expression

	pa2 = pa1 + 5

to get

	pa2 - pa1 ≟ 5

(where this is no longer a C assignment; we're just wondering if we can subtract pa1 from pa2, and what the result might be). The answer is yes: just as you can compare two pointers which point into the same array, you can subtract them, and the result is, naturally enough, the distance between them, in cells or elements.
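A short sketch of pointer subtraction follows. The names elements_between and str_length are made up for this example; str_length is modeled on the string-length idea the book discusses on this page, but it is not a copy of the book's code. The results are converted to int here to keep things simple.

```c
/* Hypothetical helper: number of elements from first up to last,
   where both point into (or just past) the same array. */
int elements_between(const int *first, const int *last)
{
	return (int)(last - first);
}

/* Hypothetical helper: length of string s, found by advancing a second
   pointer to the terminating '\0' and subtracting the starting pointer. */
int str_length(const char *s)
{
	const char *p = s;

	while(*p != '\0')
		p++;
	return (int)(p - s);
}
```

With int a[10]; and pa1 and pa2 set up as above, elements_between(pa1, pa2) gives 5, not 5 times the size of an int: the subtraction is scaled in the same way the addition was.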

(In the large parenthetical statement in the middle of the page, don't worry too much about ptrdiff_t, size_t, and sizeof.)
