Non-volatile storage, completely external to a process.

  • File changes persist beyond the lifetime of a process and the lifetime of a computer session (i.e. they survive your PC being powered off).
  • Files can be accessed by multiple processes.
  • Files are stored on completely different hardware than memory (e.g. a disk or SSD rather than RAM).

File descriptors (FDs)

A process-specific unique ID that can be used to refer to a file when invoking system calls. It is of type int. In UNIX-based systems an FD does not have to refer to a literal file on disk; it can refer to anything file-like:

  • Terminals
  • Network connections
  • Pipes

Because these are all considered files, we can use the standard read() and write() API to interact with them.
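As a minimal sketch of that idea, the same read() and write() calls can be pointed at a pipe instead of a file on disk. The helper name pipe_round_trip is hypothetical, and the sketch assumes the message is short enough to fit in the pipe's kernel buffer so that write() does not block.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: sends msg through a pipe and reads it back into out
 * (at most cap - 1 bytes, NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 * Assumes msg is small enough to fit in the pipe's kernel buffer. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t cap) {
    int fds[2];                         /* fds[0] = read end, fds[1] = write end */
    if (cap == 0 || pipe(fds) == -1) return -1;

    ssize_t written = write(fds[1], msg, strlen(msg));  /* same call as for a file */
    close(fds[1]);                      /* done writing */
    if (written == -1) { close(fds[0]); return -1; }

    ssize_t n = read(fds[0], out, cap - 1);             /* same call as for a file */
    close(fds[0]);
    if (n >= 0) out[n] = '\0';
    return n;
}
```

Passing STDOUT_FILENO to write() in the same way would send the bytes to the terminal instead.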

File descriptor table

Each process has its own file descriptor table managed by the OS. The table maintains info about files that the process has references to.

The FD is literally an index number in a process’ FD table. Note that tables are specific to each process, so the same number can refer to different files across different processes.

System-wide file table

The FD is an index, but into what? Each slot in a process’ FD table holds a pointer to an entry in a system-wide file table. The kernel manages this table, and each entry is a struct that contains info like:

  • cursor: the offset in the file where the next read or write will happen.
  • ref_count: the number of references to that entry in the file table.
  • file_name: the “name” of the corresponding file
  • Some other fields we don’t care about right now

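Several FDs can point to the same entry. When an FD is duplicated (e.g. with dup2()) or inherited by a child process after fork(), the OS simply increments ref_count and stores a pointer to the same struct in the FD table, so those FDs share one cursor. A separate open() call on the same file gets a fresh entry with its own cursor.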

Redirecting file descriptors

int dup2(int oldfd, int newfd);

After the call, newfd refers to the same open file as oldfd (the same entry in the system-wide file table, so the two share a cursor). If newfd was already open, it is silently closed first. Returns newfd on success and -1 on failure.
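A common use of dup2() is sending a program's printed output to a file. The following is a rough sketch rather than a definitive recipe: the helper name print_to_file is hypothetical, and it uses dup() to save the original stdout so it can be restored afterwards.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prints msg into the file at path by temporarily
 * pointing STDOUT_FILENO at that file. Returns 0 on success, -1 on error. */
int print_to_file(const char *path, const char *msg) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    fflush(stdout);                        /* flush text meant for the old stdout */
    int saved = dup(STDOUT_FILENO);        /* remember where stdout used to go */
    if (saved == -1) { close(fd); return -1; }

    if (dup2(fd, STDOUT_FILENO) == -1) {   /* FD 1 now refers to the file */
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                             /* FD 1 keeps the file open */

    printf("%s", msg);
    fflush(stdout);                        /* push the text out before switching back */

    dup2(saved, STDOUT_FILENO);            /* restore the original stdout */
    close(saved);
    return 0;
}
```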

stdout, stdin, stderr

It turns out that these are all considered files too! They’re opened by default when a program starts, and their FD numbers are available as constants in unistd.h (STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO):

  • 0: stdin
  • 1: stdout
  • 2: stderr

printf() formats the text into a buffer and eventually calls write() on STDOUT_FILENO, so calling it is roughly analogous to calling write() on stdout ourselves.
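To make the comparison concrete, here is a small sketch that sends the same text to stdout twice, once through printf() and once through write(). The helper name say_twice is hypothetical; the fflush() call is there because stdio may hold the text in its own buffer before handing it to the kernel.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: prints msg to stdout twice, first through stdio and
 * then through the raw system call. Returns 0 on success, -1 on error. */
int say_twice(const char *msg) {
    size_t len = strlen(msg);

    printf("%s", msg);     /* lands in stdio's buffer first */
    fflush(stdout);        /* makes stdio hand the bytes to FD 1 now */

    ssize_t n = write(STDOUT_FILENO, msg, len);   /* straight to FD 1, no buffer */
    return n == (ssize_t)len ? 0 : -1;
}
```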

Closing them??

The FD numbers for these three are the same in every process, and open() always returns the lowest-numbered FD that is not currently in use. So if we close STDOUT_FILENO and open a file immediately afterwards, the new file gets FD 1 (as long as FD 0 is still open), and anything we print ends up in that file instead.
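The trick can be sketched in a few lines. The helper name replace_stdout_with is hypothetical, and the sketch assumes FD 0 is still open, so that FD 1 is the lowest free number when open() runs.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: closes stdout and opens path in its place.
 * Returns the FD number that open() handed back, or -1 on error.
 * Assumes FD 0 is still open, so the lowest free number is 1. */
int replace_stdout_with(const char *path) {
    fflush(stdout);            /* don't let old buffered text leak into the file */
    close(STDOUT_FILENO);      /* FD 1 is now a free slot */
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```

If the call returns STDOUT_FILENO, every later printf() in the process writes to the file at path rather than the terminal.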

Opening and closing files

int open(const char* pathname, int flags, /* mode_t perm */);

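Returns a valid file descriptor on success, or -1 on failure (with errno set). flags must include one access mode (O_RDONLY, O_WRONLY or O_RDWR) and can OR in options such as O_CREAT, O_TRUNC and O_APPEND. The last parameter, perm, is optional and only matters when the call may create a new file (O_CREAT), in which case it sets the new file's permissions, e.g. 0644.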
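A short sketch of both forms of the call, with and without perm. The helper names create_and_write and read_start are hypothetical, and 0644 (owner read/write, everyone else read-only) is just one reasonable choice of permissions.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) path with permissions 0644 and
 * writes len bytes of text. Returns bytes written, or -1 on error. */
ssize_t create_and_write(const char *path, const char *text, size_t len) {
    /* O_CREAT may create the file, so the perm argument is required here. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;              /* errno says why */
    ssize_t n = write(fd, text, len);
    close(fd);
    return n;
}

/* Hypothetical helper: reads up to cap bytes from the start of an existing
 * file. Returns bytes read, or -1 on error. */
ssize_t read_start(const char *path, char *buf, size_t cap) {
    int fd = open(path, O_RDONLY);        /* no O_CREAT, so no perm argument */
    if (fd == -1) return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```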

int close(int fd);

Closes the file descriptor: the number no longer refers to that file and becomes available for reuse by later open() calls. Returns 0 on success and -1 on failure.
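Both effects can be checked with a small sketch. The helper names are hypothetical, /dev/null is used only because it always exists on UNIX-like systems, and the reuse check assumes no other thread opens or closes files in between.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a closed FD number is handed out again
 * by the next open(), 0 if not, -1 on error.
 * Assumes nothing else in the process opens or closes files in between. */
int fd_number_is_reused(void) {
    int first = open("/dev/null", O_RDONLY);
    if (first == -1) return -1;
    close(first);                          /* the number is free again */

    int second = open("/dev/null", O_RDONLY);
    if (second == -1) return -1;
    close(second);
    return first == second;
}

/* Hypothetical helper: returns 1 if using an FD after close() fails with
 * EBADF ("bad file descriptor"), 0 if not, -1 on error. */
int closed_fd_is_invalid(void) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd == -1) return -1;
    close(fd);
    return write(fd, "x", 1) == -1 && errno == EBADF;
}
```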