File I/O Using C Part II




		Character I/O from files
		Character-at-a-time input and output is simple and straightforward. The getchar function reads the next character from the standard input; getc(fp) reads the next character from the stream fp. Both return the next character or, if the next character can't be read, the non-character constant EOF, which is defined in <stdio.h>. (Usually the reason that the next character can't be read is that the input stream has reached end-of-file, but it's also possible that there's been some I/O error.) Since the value EOF is distinct from all character values, it's important that the return value from getc and getchar be assigned to a variable of type int, not char. Don't declare the variable to hold getc's or getchar's return value as a char; don't try to read characters directly into a character array with code like

		while(i < max && (a[i] = getc(fp)) != EOF) /* WRONG, for char a[] */

		The code may seem to work at first, but some day it will get confused when it reads a real character with a value which seems to equal that which results when the non-char value EOF is crammed into a char.
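		A corrected version of that loop might look like the sketch below. The function name read_chars and its parameters are invented for illustration; the only point is that getc's return value lands in an int and is compared with EOF before anything is stored in the char array.

```c
#include <stdio.h>

/* Read up to max characters from fp into a[]; return how many were stored.
   read_chars is a hypothetical helper, not a library function. */
size_t read_chars(FILE *fp, char a[], size_t max)
{
    size_t i = 0;
    int c;                      /* int, so that EOF stays distinguishable */

    while (i < max && (c = getc(fp)) != EOF)
        a[i++] = (char)c;       /* store only after the EOF test */
    return i;
}
```

		Note that this helper does not add a terminating \0; a caller that wants a string would reserve one extra element and add it.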

		One more reminder about getchar: although it returns and therefore seems to read one character at a time, it typically delivers characters from internal buffers which may hold more characters which will be delivered later. For example, most command-line-based operating systems let you type an entire line of input, and wait for you to type the RETURN or ENTER key before making any of those characters available to a program (even if the program thought it was doing character-at-a-time input with calls to getchar). There are, of course, ways to read characters immediately (without waiting for the RETURN key), but they differ from operating system to operating system.

		Writing single characters is just as easy as reading: putchar(c) writes the character c to standard output; putc(c, fp) writes the character c to the stream fp. (The character c must be a real character. If you want to ``send'' an end-of-file condition to a stream, that is, cause the program reading the stream to ``get'' end-of-file, you do that by closing the stream, not by trying to write EOF to it.)
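		Putting getc and putc together, a character-at-a-time copy can be sketched as follows. The name copy_stream is made up for this example, and the use of ferror to tell a read error from a genuine end-of-file is an addition not discussed above.

```c
#include <stdio.h>

/* Copy everything from in to out, one character at a time.
   Returns the number of characters copied, or -1 on a read or write error.
   copy_stream is a hypothetical helper. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;          /* write error */
        n++;
    }
    return ferror(in) ? -1 : n; /* EOF can also mean a read error */
}
```

		Calling copy_stream(stdin, stdout) would turn a program into a simple filter that echoes its input.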

		When reading characters, you occasionally find that you've read a bit too far. For example, if one part of your code is supposed to read a number--a string of digits--from a file, leaving the characters after the digits on the input stream for some other part of the program to read, the digit-reading part of the program won't know that it has read all the digits until it has read a non-digit, at which point it's too late. (The situation recalls Dave Barry's recipe for ``food heated up'': ``Put the food in a pot on the stove on medium heat until just before the kitchen fills with black smoke.'') When reading characters with the standard I/O library, at least, we have an escape: the ungetc function ``un-reads'' one character, pushing it back on the input stream for a later call to getc (or some other input function) to read. The prototype for ungetc is

		int ungetc(int c, FILE *fp)

		where c is the character which is to be pushed back onto the stream fp. For example, here is a code scrap that reads digits from a stream (and converts them to the corresponding integer), stopping at the first non-digit character and leaving it on the input stream:

		#include <ctype.h>

		int c, n = 0;
		while((c = getchar()) != EOF && isdigit(c))
			n = 10 * n + (c - '0');
		if(c != EOF)
			ungetc(c, stdin);

		It's only guaranteed that you can push one character back, but that's usually all you need.

		Closing a file
		When you are done with a file, you should close it by calling the function fclose().

		To finish our example, we'd want to close our input and output files:

		fclose(ifp);
		fclose(ofp);

		Closing a file is very important, especially with output files. The reason is that output is often buffered. This means that when you tell C to write something out, e.g.,

		fprintf(ofp, "Whatever!\n");

		it doesn't necessarily get written to disk right away, but may end up in a buffer in memory.

		When the buffer fills up (or when the file is closed), the data is finally written to disk. So, if you forget to close an output file then whatever is still in the buffer may not be written out.
		Note: There are other kinds of buffering than the one we describe here.
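		Because buffered data may only reach the disk when the file is closed, a write error can go unnoticed until then. The following is a minimal sketch of a routine that checks for this; save_text and its parameters are hypothetical names, not part of the earlier example.

```c
#include <stdio.h>

/* Write text to the file called path; return 0 on success, -1 on failure.
   save_text is a hypothetical helper. */
int save_text(const char *path, const char *text)
{
    FILE *ofp = fopen(path, "w");
    int failed;

    if (ofp == NULL)
        return -1;
    failed = (fputs(text, ofp) == EOF);
    /* fclose flushes the buffer, so a late write error shows up here */
    if (fclose(ofp) == EOF)
        failed = 1;
    return failed ? -1 : 0;
}
```

		If you need buffered output to be written out while the file stays open, the standard function fflush(ofp) does that without closing the stream.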

		stdin, stdout and stderr
		These three streams (each of type FILE *) are opened automatically when a program starts and, by default, provide access to the keyboard and the screen.

		stdin
		By default stdin reads from the keyboard. Functions that always read from stdin include

		gets getchar scanf

		The following functions read from any stream, including stdin when it is passed as the stream argument:

		fgets fgetc fscanf

		stdout
		By default stdout writes to the screen. Functions that always write to stdout include

		printf puts putchar

		stderr
		By default stderr also writes to the screen. On a Unix-based system, the data sent to stdout and stderr can be separated and sent to different places. Functions that can write to stderr, when it is passed as the stream argument, include

		fprintf fputs
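		As a small illustration of keeping the two output streams apart, the sketch below sends its result to stdout and its complaint to stderr. The function print_ratio is invented for this example.

```c
#include <stdio.h>

/* Print num/den on stdout, or a complaint on stderr.
   Returns 0 on success, -1 on error. print_ratio is a hypothetical helper. */
int print_ratio(int num, int den)
{
    if (den == 0) {
        fprintf(stderr, "error: division by zero\n");
        return -1;
    }
    printf("%d\n", num / den);  /* same as fprintf(stdout, ...) */
    return 0;
}
```

		In a Bourne-style Unix shell, running a program (here called prog, a placeholder name) as prog > out.txt 2> err.txt would send the results to one file and the error messages to another.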




		Line I/O with files
		The function

		char *gets(char *line)

		reads the next line of text (i.e. up to the next \n) from the standard input and places the characters (except for the \n) in the character array pointed to by line. It returns a pointer to the line (that is, it returns the same pointer value you gave it), unless it reaches end-of-file, in which case it returns a null pointer. It is assumed that line points to enough memory to hold all of the characters read, plus a terminating \0 (so that the line will be usable as a string). Since there's usually no way for anyone to guarantee that the array is big enough, and no way for gets to check it, gets is actually a useless function, and no serious program should call it. (The C11 standard removed gets from the library altogether.)

		The function

		char *fgets(char *line, int max, FILE *fp)

		is somewhat more useful. It reads the next line of input from the stream fp and places the characters, including the \n, in the character array pointed to by line. The second argument, max, gives the maximum number of characters to be written to the array, and is usually the size of the array. Like gets, fgets returns a pointer to the line it reads, or a null pointer if it reaches end-of-file before reading any characters (or if a read error occurs). Unlike gets, fgets does include the \n in the string it copies to the array. Therefore, the number of characters in the line, plus the \n, plus the \0, will always be less than or equal to max. (If fgets reads max-1 characters without finding a \n, it stops reading there, appends a \0, and leaves the rest of the line to be read next time.) Since fgets does let you guarantee that the line being read won't go off the end of the array, you should always use fgets instead of gets. (If you want to read a line from standard input, you can just pass stdin as the third argument.)

		If you'd rather not have the \n retained in the input line, you can either remove it right after calling fgets (perhaps by calling strchr and overwriting the \n with a \0), or maybe call the getline or fgetline function we've been using instead. (See chapters 6 and 12; these functions are also handy in that they return the length of the line read. They differ from fgets in their treatment of overlong lines, though.)
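		The strchr approach to dropping the \n could be sketched like this. The helper read_line is a made-up name for this example; it is not the getline or fgetline function from the other chapters, and it simply leaves the remainder of an overlong line for the next call.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into line[] (capacity max), dropping the '\n'.
   Returns the length of the stored string, or -1 at end-of-file or error.
   read_line is a hypothetical helper. */
int read_line(char line[], int max, FILE *fp)
{
    char *nl;

    if (fgets(line, max, fp) == NULL)
        return -1;
    nl = strchr(line, '\n');
    if (nl != NULL)
        *nl = '\0';             /* overwrite the newline */
    /* no '\n' found: the line was too long, or the last line lacked one */
    return (int)strlen(line);
}
```

		With an 8-element array, an input line such as "second line" would come back in two pieces, "second " and then "line", which shows how fgets leaves the rest of a long line for the next call.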

		The function

		int puts(const char *line)

		writes the string pointed to by line to the standard output, and writes a \n to terminate it. It returns a nonnegative value (we don't really care what the value is) unless there's some kind of a write error, in which case it returns EOF.

		Finally, the function

		int fputs(const char *line, FILE *fp)

		writes the string pointed to by line to the stream fp. Like puts, fputs returns a nonnegative value or EOF on error. Unlike puts, fputs does not automatically append a \n.

		Formatted I/O with files
		There are a number of related functions used for formatted I/O, each one determining the format of the I/O from a format string. For output, the format string consists of plain text, which is output unchanged, and embedded format specifications which call for some special processing of one of the remaining arguments to the function. On input, the plain text must match what is seen in the input stream; the format specifications again specify what the meaning of remaining arguments is.

		Each format specification is introduced by a % character, followed by the rest of the specification.

		Output: the printf family

		For those functions performing output, the format specification takes the following form, with optional parts enclosed in angle brackets:

		%<flags><field width><precision><length>conversion

		The meanings of flags, field width, precision, length, and conversion are given below, although tersely. For more detail, it is worth looking at what the Standard says.

		flags

		Zero or more of the following:

		- : Left justify the conversion within its field.
		+ : A signed conversion will always start with a plus or minus sign.
		space : If the first character of a signed conversion is not a sign, insert a space. Overridden by + if present.
		# : Forces an alternative form of output. The first digit of an octal conversion will always be a 0; inserts 0x (or 0X for the X conversion) in front of a non-zero hexadecimal conversion; forces a decimal point in all floating point conversions even if one is not necessary; does not remove trailing zeros from g and G conversions.
		0 : Pad d, i, o, u, x, X, e, E, f, F, g and G conversions on the left with zeros up to the field width. Overridden by the - flag. If a precision is specified for the d, i, o, u, x or X conversions, the flag is ignored. The behaviour is undefined for other conversions.
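		To see the flags in action, here is a small sketch that formats one sample per flag into a table of strings using snprintf (a C99 function). The names flag_demo, NFLAGS and WIDTH are invented for this example, and the 5 in two of the formats is a field width.

```c
#include <stdio.h>

#define NFLAGS 6
#define WIDTH 16

/* Fill out[] with one formatted sample per flag.
   flag_demo, NFLAGS and WIDTH are names invented for this sketch. */
void flag_demo(char out[NFLAGS][WIDTH])
{
    snprintf(out[0], WIDTH, "[%-5d]", 42);  /* "[42   ]"  left justified  */
    snprintf(out[1], WIDTH, "%+d", 42);     /* "+42"      forced sign     */
    snprintf(out[2], WIDTH, "% d", 42);     /* " 42"      space for sign  */
    snprintf(out[3], WIDTH, "%#o", 8u);     /* "010"      leading 0       */
    snprintf(out[4], WIDTH, "%#x", 255u);   /* "0xff"     leading 0x      */
    snprintf(out[5], WIDTH, "%05d", 42);    /* "00042"    zero padding    */
}
```

		The same format strings behave identically with printf and fprintf; snprintf is used here only so that the results can be inspected as strings.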

		Random access using fseek()
		Normally, files and streams (that is, anything accessed via a FILE *) are read and written sequentially. However, it's also possible to jump to a certain position in a file.

		To jump to a position, it's generally necessary to have ``been there'' once already. First, you use the function ftell to find out what your position in the file is; then, later, you can use the function fseek to get back to a saved position.

		File positions are stored as long ints. To record a position, you would use code like

		long int pos;
		pos = ftell(fp);

		Later, you could ``seek'' back to that position with

		fseek(fp, pos, SEEK_SET);

		The third argument to fseek is a code telling it (in this case) to set the position with respect to the beginning of the file; this is the mode of operation you need when you're seeking to a position returned by ftell.
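		The save-then-seek-back pattern can be packaged as a routine that looks at the next line without consuming it. This is only a sketch: peek_line is a hypothetical name, and it assumes fp refers to a seekable file rather than, say, a terminal.

```c
#include <stdio.h>

/* Read the next line into line[] without consuming it: remember the
   position, read, then seek back. Returns 0 on success, -1 on failure.
   peek_line is a hypothetical helper. */
int peek_line(char line[], int max, FILE *fp)
{
    long int pos = ftell(fp);

    if (pos == -1L)
        return -1;              /* ftell failed, e.g. stream not seekable */
    if (fgets(line, max, fp) == NULL)
        return -1;
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return 0;
}
```

		Both ftell and fseek report failure through their return values (-1L and nonzero respectively), which is why the sketch checks them.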

		As an example, suppose we were writing a file, and one of the lines in it contained the words ``This file is n lines long'', where n was supposed to be replaced by the actual number of lines in the file. At the time when we wrote that line, we might not know how many lines we'd eventually write. We could resolve the difficulty by writing a placeholder line, remembering where it was, and then going back and filling in the right number later. The first part of the code might look like this:

		long int nlinespos = ftell(fp);
		fprintf(fp, "This file is %4d lines long\n", 0);

		Later, when we'd written the last line to the file, we could seek back and rewrite the ``number-of-lines'' line like this:

		fseek(fp, nlinespos, SEEK_SET);
		fprintf(fp, "This file is %4d lines long\n", nlines);

		There's no way to insert or delete characters in a file after the fact, so we have to make sure that if we overwrite part of a file in this way, the overwritten text is exactly the same length as the previous text. That's why we used %4d, so that the number would always be printed in a field 4 characters wide. (However, since the field width in a printf format specifier is a minimum width, with this choice of width, the code would fail if a file ever had more than 9999 lines in it.)
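		Assembled into one routine, the placeholder technique might look like the sketch below. The function write_counted and the body text "line N" are invented for illustration; the count written into the header includes the header line itself, and the routine refuses counts that would not fit in the 4-character field.

```c
#include <stdio.h>

/* Write a header line holding the total line count, then nbody body
   lines, then go back and patch the real count into the header.
   Returns 0 on success, -1 on failure. write_counted is a hypothetical
   helper; fp must be seekable and open for writing. */
int write_counted(FILE *fp, int nbody)
{
    long int nlinespos, endpos;
    int i, nlines = nbody + 1;          /* body lines plus the header */

    if (nbody < 0 || nlines > 9999)
        return -1;                      /* would not fit in the %4d field */
    if ((nlinespos = ftell(fp)) == -1L)
        return -1;
    fprintf(fp, "This file is %4d lines long\n", 0);    /* placeholder */
    for (i = 1; i <= nbody; i++)
        fprintf(fp, "line %d\n", i);

    if ((endpos = ftell(fp)) == -1L)    /* remember where the end is */
        return -1;
    if (fseek(fp, nlinespos, SEEK_SET) != 0)
        return -1;
    fprintf(fp, "This file is %4d lines long\n", nlines);
    return fseek(fp, endpos, SEEK_SET) == 0 ? 0 : -1;   /* back to the end */
}
```

		Seeking back to the saved end position afterwards means that any further output continues after the last body line instead of overwriting it.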

		Three other file-positioning functions are rewind, which rewinds a file to its beginning, and fgetpos and fsetpos, which are like ftell and fseek except that they record positions in a special type, fpos_t, which may be able to record positions in huge files for which even a long int might not be sufficient.
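		For comparison, here is the same save-and-restore idea written with fgetpos and fsetpos. The sketch counts the lines remaining in a file and then returns to where it started; lines_remaining is a hypothetical name.

```c
#include <stdio.h>

/* Count the '\n' characters between the current position and end-of-file,
   then restore the position. Returns the count, or -1 on failure.
   lines_remaining is a hypothetical helper. */
long lines_remaining(FILE *fp)
{
    fpos_t pos;
    long n = 0;
    int c;

    if (fgetpos(fp, &pos) != 0)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    if (fsetpos(fp, &pos) != 0)     /* also clears the end-of-file state */
        return -1;
    return n;
}
```

		Unlike ftell, fgetpos stores the position through a pointer argument and returns only a success indication (zero) or failure (nonzero).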

		If you're ever using one of the ``read/write'' modes ("r+" or "w+"), you must use a call to a file-positioning function (fseek, rewind, or fsetpos) before switching from reading to writing or vice versa. (You can also call fflush while writing and then switch to reading, or reach end-of-file while reading and then switch back to writing.)

		In binary ("b") mode, the file positions returned by ftell and used by fseek are byte offsets, such that it's possible to compute an fseek target without having to have it returned by an earlier call to ftell. On many systems (including Unix, the Macintosh, and to some extent MS-DOS), file positioning works this way in text mode as well. Code that relies on this isn't as portable, though, so it's not a good idea to treat ftell/fseek positions in text files as byte offsets unless you really have to.
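		Here is a sketch that relies on byte offsets in binary mode and also observes the rule about positioning between reads and writes: it converts lower-case letters to upper case in place. The name upcase_stream is invented, and the code uses SEEK_CUR, a standard fseek mode not covered above, which makes the offset relative to the current position.

```c
#include <ctype.h>
#include <stdio.h>

/* Convert lower-case letters to upper case in place.
   fp must be open in a binary update mode such as "r+b".
   Returns 0 on success, -1 on failure. upcase_stream is hypothetical. */
int upcase_stream(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF) {
        if (!islower(c))
            continue;
        if (fseek(fp, -1L, SEEK_CUR) != 0)  /* back up; reading -> writing */
            return -1;
        if (putc(toupper(c), fp) == EOF)
            return -1;
        if (fseek(fp, 0L, SEEK_CUR) != 0)   /* no move; writing -> reading */
            return -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

		The second fseek does not move anywhere; it is there only because a positioning call is required before switching from writing back to reading.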

		File remove, rename

		You can delete a file by calling

		int remove(const char *filename)

		You can rename a file by calling

		int rename(const char *oldname, const char *newname)

		Both of these functions return zero if they succeed and a nonzero value if they fail.
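		Since both functions report failure only through their return value, it is worth checking it. A minimal sketch, with move_file as a made-up wrapper name:

```c
#include <stdio.h>

/* Rename oldname to newname, reporting failure on stderr.
   Returns 0 on success, -1 on failure. move_file is a hypothetical helper. */
int move_file(const char *oldname, const char *newname)
{
    if (rename(oldname, newname) != 0) {
        perror(oldname);    /* prints a system error message, if available */
        return -1;
    }
    return 0;
}
```

		What happens when a file called newname already exists is implementation-defined, so portable code should not assume that rename either replaces it or fails.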

		There are no standard C functions for dealing with directories (e.g. listing or creating them). On many systems, you will find functions mkdir for making directories and rmdir for removing them, and a suite of functions opendir, readdir, and closedir for listing them. Since these functions aren't standard, however, we won't talk about them here. (They exist on most Unix systems, but they're not standard under MS-DOS or Macintosh compilers, although you can find implementations on the net.)

Do you have any comments? Mail me at: deepak@asic-world.com