Characters, Strings, and Numbers

The earliest computers were number crunchers only, but almost all more recent computers have the ability to manipulate alphanumeric data as well. The computer, and our programming languages, tend to maintain a strict distinction between numbers on the one hand and alphanumeric data on the other, so we have to maintain that distinction in our own minds as well.

One fundamental component of a computer's handling of alphanumeric data is its character set. A character set is, not surprisingly, the set of all the characters that the computer can process and display. (Each character generally has a key on the keyboard to enter it and a bitmap on the screen which displays it.) A character set consists of letters, digits, punctuation, etc., but the point of this discussion is not so much what the characters are as that we have to be careful to distinguish between characters, strings, and numbers.

A character is, well, a single character. If we have a variable which contains a character value, it might contain the letter `A', or the digit `2', or the symbol `&'.

A string is a sequence of zero or more characters. (The order matters: the strings ``and'' and ``dna'' contain the same characters but are different strings.) For example, the string ``and'' consists of the characters `a', `n', and `d'. The string ``K2!'' consists of the characters `K', `2', and `!'. The string ``.'' consists of the single character `.', and the empty string ``'' consists of no characters at all. Not to belabor the point, but the string ``123'' consists of the characters `1', `2', and `3', and the string ``4'' consists of the single character `4'.

The last two examples illustrate some important and perhaps surprising or annoying distinctions. The character `4' and the string ``4'' are conceptually different, and neither of them is quite the same as the number 4. The string ``123'' consists of three characters, and it looks like the number 123 to us, but as far as the computer is concerned it is just a string. The number 123 is, when used for ordinary numeric purposes, not represented internally as a string of three characters (instead, it is typically represented as a 16- or 32-bit integer). When we have a string which contains a numeric value which we wish to manipulate as a number, we must typically ask for the string to be explicitly converted to that number somehow. Similarly, we may have reason to convert a number to a string of digits making up its decimal representation.

We may also find ourselves needing to convert back and forth between characters and the numeric codes which are assigned to each character in a character set. (For example, in the ASCII character set, the character `A' is code 65, the character `.' is code 46, and the character `4' is, perhaps surprisingly, code 52.)



This page by Steve Summit // Copyright 1995, 1996