16.6: Formatted Input (scanf)

Just as putchar has its getchar and fputs has its fgets, there's an input analog to printf, namely scanf. scanf reads characters from standard input, under control of a format string, perhaps converting some components of the string and storing them into variables. For example, just as you could use the call

	printf("(%d, %d)", x, y);
to print two integer values and some surrounding punctuation, you could use the call
	scanf("(%d, %d)", &x, &y);
to attempt to extract two integer values from some input containing similar punctuation.
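
As a minimal sketch of that second call, the helper below checks how many values were actually converted. The name parse_point is hypothetical (it does not come from the text above), and it uses sscanf, the string-reading relative of scanf, only so that the input can be supplied directly.

```c
#include <stdio.h>

/* Hypothetical helper: extract x and y from text shaped like "(3, 4)".
 * Returns 1 if both integers were converted, 0 otherwise. */
int parse_point(const char *text, int *x, int *y)
{
    /* '(' and ',' must appear literally in the input; the space in the
     * format matches any amount of whitespace, including none. */
    return sscanf(text, "(%d, %d)", x, y) == 2;
}
```

Calling parse_point("(3, 4)", &x, &y) would store 3 in x and 4 in y; an input lacking the opening parenthesis converts nothing.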

scanf interprets a format string, much like printf, with the first difference being that scanf attempts to read characters and match them against the format string, rather than printing under control of the format string. For each ordinary character in the format string, scanf expects to see that character on the input; if not, it fails. For each format specifier in the format string, scanf attempts to match and convert a string appropriate to the format specifier, storing the converted result into a variable pointed to by the corresponding argument. If it can't find any characters matching the format specifier, it fails.

Since scanf "returns" many values (one for each format specifier that stores a result), it must do so using pointers which the caller passes. For each value to be converted, the caller passes a pointer to the variable (or other location) where scanf should write the converted value. All arguments after the format string must be pointers.

The format strings used by scanf are similar to those used by printf, but there are several differences.

The optional width gives the maximum number of characters to read while performing the conversion requested by a particular format specifier. (If there are many adjacent characters which could satisfy a request--many digits for one of the numeric conversions, or many characters for %s conversion--the width keeps scanf from gobbling all of them up at once.)
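
To illustrate the width, here is a small sketch, assuming an input made of adjacent digits with no separators. The helper name split_digits is hypothetical, and sscanf is used only so the input can be given as a string.

```c
#include <stdio.h>

/* Hypothetical helper: split a run of digits such as "1234" into two
 * numbers of at most two digits each. Returns the sscanf result,
 * that is, the number of values converted. */
int split_digits(const char *text, int *first, int *second)
{
    /* Without the widths, the first %d would consume all of the digits. */
    return sscanf(text, "%2d%2d", first, second);
}
```

Given "1234", the first conversion stops after "12", leaving "34" for the second.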

There is no equivalent to the precision modifier.

If the * flag appears, it indicates that the converted value should be discarded, not written to a location pointed to by one of the pointers in the argument list. (In other words, there is no corresponding argument.) Since * is usurped for this function, there is no way to use a variable field width from the argument list with scanf. There are no other flags.
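
A brief sketch of the * flag, assuming the input holds two integers and only the second is wanted. The helper name second_number is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read two integers from text but keep only the
 * second. Returns the number of values stored (1 on success). */
int second_number(const char *text, int *out)
{
    /* %*d matches an integer and discards it, so there is only one
     * pointer argument, belonging to the second %d. */
    return sscanf(text, "%*d %d", out);
}
```

Note that the discarded conversion is not counted in the return value, so success here is a return of 1, not 2.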

The modifier characters are more significant. An h indicates that the corresponding integer pointer argument (for %d, %i, %u, %o, or %x) is a short int * or unsigned short int *. An l indicates that the corresponding integer pointer argument (for %d, %i, %u, %o, or %x) is a long int * or unsigned long int *, or that the floating-point pointer argument (for %e, %f, or %g) is a double * rather than a float *. (Similarly, an L indicates a long double *.)
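
The following sketch shows the h and l modifiers paired with pointers of the matching types. The helper name read_short_long is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read a short and then a long from text.
 * Returns the number of values converted. */
int read_short_long(const char *text, short *s, long *l)
{
    /* %hd requires a short *, %ld requires a long *; a mismatch between
     * modifier and pointer type is an error the compiler may not catch. */
    return sscanf(text, "%hd %ld", s, l);
}
```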

The %c format will read more than one character if an explicit width greater than 1 is specified. The corresponding argument must be a pointer to enough space to hold all the characters read.
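
As a sketch of %c with a width, the hypothetical helper below copies three characters. It relies on two facts about %c: it does not skip leading whitespace, and (unlike %s) it does not append a terminating \0, so the helper adds one itself.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first three characters of text into out,
 * which must have room for at least 4 chars. Returns the sscanf result. */
int first_three(const char *text, char out[4])
{
    int n = sscanf(text, "%3c", out);

    if (n == 1)
        out[3] = '\0';    /* %c stores the characters only, no terminator */
    return n;
}
```

With the input "a bcd", the three characters stored are 'a', a space, and 'b'.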

The %e, %f, and %g formats all read strings in either scientific notation or conventional decimal fraction m.n notation. (In other words, the three formats act just the same.) However, they assume a float * argument unless the l modifier appears, in which case they expect a double *. (This is in contrast to printf, which accepts either float or double arguments for %e, %f, and %g, due to the default argument promotions.)
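
A short sketch of the float and double cases, using the hypothetical helper name read_float_double. The first value is given in decimal notation and the second in scientific notation to show that either form is accepted.

```c
#include <stdio.h>

/* Hypothetical helper: read a float and then a double from text.
 * Returns the number of values converted. */
int read_float_double(const char *text, float *f, double *d)
{
    /* %f stores through a float *; %lf is needed for a double *. */
    return sscanf(text, "%f %lf", f, d);
}
```

For the input "1.5 2.5e1", f receives 1.5 and d receives 25.0.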

The %i format will read a number in decimal, octal, or hexadecimal, taking a leading 0 to indicate octal and a leading 0x (or 0X) to indicate hexadecimal, i.e. the same rules as used by C constants.
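
The effect of %i can be sketched with a hypothetical helper, parse_any_base, which lets the prefix of the input select the base.

```c
#include <stdio.h>

/* Hypothetical helper: convert text to an int, letting a leading
 * 0 or 0x select octal or hexadecimal. Returns 1 on success. */
int parse_any_base(const char *text, int *out)
{
    return sscanf(text, "%i", out) == 1;
}
```

So "42" yields 42, "017" yields 15, and "0x1F" yields 31. This can surprise users who type a decimal number with a leading zero.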

The %n format causes the number of characters read so far (by this call to scanf) to be stored in the integer pointed to by the corresponding argument.
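
One use of %n is to find out how much of the input a conversion consumed. The helper below is a sketch with a hypothetical name, consumed_by_int; note that %n stores a value but is not counted in the return value.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from text and report how many
 * characters were consumed in doing so, or -1 if no integer was found. */
int consumed_by_int(const char *text, int *value)
{
    int used = -1;

    /* Only the %d counts toward the return value, so success is 1. */
    if (sscanf(text, "%d%n", value, &used) != 1)
        return -1;
    return used;    /* includes any leading whitespace that %d skipped */
}
```

For the input "  123abc" the result is 5: two spaces plus three digits.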

The %s format will read a string, up to the next whitespace character, and copy the string, terminated by a \0, to the corresponding argument, which must be a char *. The caller must ensure (perhaps by using an explicit width) that there is enough space to hold the received characters.
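
A minimal sketch of %s with an explicit width, assuming an 8-character buffer. The helper name first_word is hypothetical. The width must be one less than the buffer size, to leave room for the \0.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first whitespace-delimited word of text
 * into out, truncating it to 7 characters. Returns the sscanf result. */
int first_word(const char *text, char out[8])
{
    /* %7s stores at most 7 characters plus the terminating '\0'. */
    return sscanf(text, "%7s", out);
}
```

A word longer than 7 characters is cut short rather than overflowing the buffer; the rest of it stays unread.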

scanf has a special format specifier %[...], which matches any string composed of characters specified in the []. For example, %[abc] would match any string composed of a's, b's, and c's. The corresponding argument is a char *; the matched string is written to the location pointed to, followed by a \0. The caller must ensure (perhaps by using an explicit width) that there is enough space to hold the received characters. A second form, %[^...], matches a string of characters not found in the set. For example, scanf("(%[^)])", s) reads, into the string s, a string of characters (possibly including whitespace) from an input in which the string appears enclosed in parentheses. It may also be possible to specify ranges of characters (e.g. %[a-z], %[0-9], etc.), but these are not as portable.
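
The parenthesized example can be sketched with a width added for safety. The helper name inside_parens and the 32-character buffer size are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copy the text found between '(' and ')' into out,
 * which must have room for at least 32 chars. Returns 1 on success. */
int inside_parens(const char *text, char out[32])
{
    /* %31[^)] matches up to 31 characters that are not ')'.
     * It must match at least one character, so "()" fails. */
    return sscanf(text, "(%31[^)])", out) == 1;
}
```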

With the exception of %c, %n, and %[, all of the conversion specifiers skip any leading whitespace (spaces, tabs, or newlines) which might precede the value or string converted. Also, any whitespace character in the format string matches any number of whitespace characters in the input. Therefore, the format "%d %d" would match the input "12 34" or "12  34" or "12\t34". However, the format "%d%d" would match all of these inputs as well, since the second %d first scans past any whitespace preceding the 34.
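
The equivalence of the two formats can be checked with a small sketch. The helper name formats_agree is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if "%d %d" and "%d%d" both convert two
 * integers from text and produce the same values, 0 otherwise. */
int formats_agree(const char *text)
{
    int a1 = 0, b1 = 0, a2 = 0, b2 = 0;
    int n1 = sscanf(text, "%d %d", &a1, &b1);
    int n2 = sscanf(text, "%d%d", &a2, &b2);   /* second %d skips whitespace itself */

    return n1 == 2 && n2 == 2 && a1 == a2 && b1 == b2;
}
```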

scanf returns the number of items it successfully converts and stores. It will return a number less than expected (less than the number of format specifiers not containing *, or less than the number of corresponding pointer arguments) if the conversion fails at any point, and it will leave any unrecognized characters (i.e. the ones that caused the last match to fail) waiting in the input for next time. scanf returns EOF if it encounters end-of-file before converting anything.
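
The possible return values can be sketched as follows, for a format with two conversions. The helper name describe_scan and its message strings are hypothetical; with sscanf, reaching the end of the string plays the role of end-of-file.

```c
#include <stdio.h>

/* Hypothetical helper: try to read two integers from text and describe
 * the outcome according to the return value. */
const char *describe_scan(const char *text)
{
    int a, b;
    int n = sscanf(text, "%d %d", &a, &b);

    if (n == EOF)
        return "end of input";           /* nothing there to convert */
    if (n == 2)
        return "both converted";
    if (n == 1)
        return "only first converted";   /* a is valid, b is not */
    return "nothing converted";          /* neither a nor b is valid */
}
```

Only the variables counted by the return value have been assigned; the others should not be used.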

If you want to read characters from an arbitrary stream, you can use fscanf, which takes an initial FILE * argument.

You can scan and convert characters from a string (rather than from a stream) using sscanf. For example,

	int x, y;
	sscanf("12 34", "%d %d", &x, &y);
would place 12 in x and 34 in y.

scanf and fscanf are seductively useful, but they have a number of drawbacks in practice. They seem to make it very easy to, say, prompt the user for a number:

	int x;
	printf("Type a number:\n");
	scanf("%d", &x);
But what happens if the user fumbles, and types something other than a number? Even if the code checks scanf's return value, and prompts the user again if scanf returns 0, the non-numeric input remains on the input, and will be encountered by the next call to scanf unless some other steps are taken. (That is, scanf will rediscover the user's old, bad input before it gets to any new input.) It's also easy to write things like
	scanf("%d\n", &x);
but this code does not work as intended. The \n in the format string is a whitespace character, which asks scanf to discard any number of whitespace characters, so it keeps reading until it finds something that is not whitespace. It won't consume that eventual non-whitespace character, but while looking for it, it will seem to jam your program, since the call to scanf won't return right after the user types a number.

Therefore, it's much better to read interactive user input a line at a time, and then use functions like atoi (or perhaps sscanf) to interpret the line that the user typed.
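
A minimal sketch of the line-at-a-time approach, using fgets and sscanf. The helper name read_int_line and the 100-character line buffer are assumptions made for illustration; a longer input line would be split across successive reads.

```c
#include <stdio.h>

/* Hypothetical helper: read lines from fp until one begins with an
 * integer. Returns 1 and stores the number on success, or 0 if
 * end-of-file is reached first. */
int read_int_line(FILE *fp, int *out)
{
    char line[100];    /* assumed maximum line length */

    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%d", out) == 1)
            return 1;
        /* The bad line has already been consumed by fgets, so it
         * cannot be rediscovered by the next read. */
    }
    return 0;
}
```

For interactive use this could be called as read_int_line(stdin, &x), with the caller printing a fresh prompt before each attempt if desired.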



This page by Steve Summit // Copyright 1996-1999 // mail feedback