Most File systems are hierarchical, which means that it’s made up of Directories that can contain not only files, but other directories. There is a single root directory that contains everything.

When thinking about a file system, this includes the algorithms and data structures that are required for storing, organizing, and retrieving data. Which ones are chosen and how they’re implemented is what differentiates file systems.

An operating system implements such a file system and needs to support at least one (Windows only uses NTFS1), but can also support multiple (Linux can use ext2/ext3/ext4, Btrfs, XFS, etc.)

Block

Consider a SSD. To the OS, that storage can be treated as one big array. Literally:

int the_filesystem[REALLY_REALLY_BIG];

Here, we can technically access each individual int. But OSes (filesystems) don’t do this for various reasons. Instead, the base unit size of elements in file systems are called blocks.

  • Depending on the implementation and hardware, block sizes can be anything. Common sizes include 512 to 4096 bytes, or even larger.
    • This is a fixed number of contiguous bytes.
    • Whenever a read or write is performed, this is done on the entire block, even if we’re only changing a single byte.
  • The smallest space a file takes up on disk is 1 block.
  • Storage mediums like SSDs can be thought of as a giant collection of blocks.
  • The file system (aka OS) has to organize these blocks.

Abstracting the filesystem for the user

As a user, a lot of APIs treat files as a continuous stream of bytes. However, there are two abstractions at play:

  1. A file’s data is split up and grouped into multiple physical blocks
  2. These physical blocks are usually not contiguous on the physical disk. However, the user will see “logical” blocks that are sequential.

Both of these truths are hidden away from the user. We are free to fseek() as we please without realizing that we may be jumping across multiple physical locations on the disk. The OS is maintaining this abstraction to the user.

Sort of similar to virtual versus physical address translation.

Bitmap block and root directory block

We can store information about blocks, files, and directories within the filesystem itself, organized into special blocks with hardcoded locations.

  • Bitmap block: can be located at block 0. It keeps track of which blocks are free and which ones are not. If there are blocks, then we need bits.
  • Root directory block: can be located at block 1. Since the root holds everything, we can always start here and traverse the hierarchy to get to our file.

This allows us to answer two important questions at all times:

  • Where is a file located? Begin at the root directory (“/”) and go from there.
  • Which blocks are free? Maintain a bitmap of block numbers to indicate whether they’re free or not.

Note that in practice, the bitmap (which is replaced by the FAT in a FAT file system) and root directory span more than one block, because they’re really big.

Block caching

We would like to minimize disk I/O because it’s pretty slow relative to memory. We can store a data structure in memory that keeps track of the most and least recently used blocks.

  • Use a doubly linked list to store blocks s.t. the most recently used block is at the head, and least recently used block is at the tail.
    • Enables removal of LRU block
  • Update next and previous pointers as necessary when we access blocks
  • Also use a hashmap to enable lookups and moves to the front or back of the list (this usually takes but is now faster due to faster search)

Files that take up more than one block

What if our file is too big for one single block size? Spoiler: this is very, very very common e.g. a song would probably meet this criteria.

The following two solutions are naive and by themselves don’t offer the best performance. But they’re useful stepping stones to building a better system.

Contiguous allocation

One easy solution is to expand a file into the next block on disk.

To find a particular block for a file, we would only need 1 block read!2 We access the root directory block and traverse it to find the location of the file’s first block, offset by whatever is necessary (which we’re allowed to do because everything is contiguous), and that gives us the correct block number to “index” into.

Contiguous allocation makes the big assumption that the next block is free; if not, then we would have to shift everything to the right to make room. This takes time and increases fragmentation (random empty blocks that we can’t use unless we defrag), so it’s not very efficient.

Implicit linked list allocation

We could have each block reserve some bits at the end for a pointer to the next block in the file (or a special value to indicate it’s EOF). This is not memory pointer but rather a disk pointer, e.g. a block location on the disk.

  • Directory entries still hold the location of the first block.
  • To grow a file into another block, we scan the bitmap block for free blocks, allocate it, and set up the right pointers.

While the issue of fragmentation is resolved, seeking blocks is much, much slower, because we’re accessing so many places on disk (compared to the 1 block read above).

Also see

Footnotes

  1. Technically it can also use ReFS but it’s kinda niche (mostly useful for servers) and completely optional (I don’t think NTFS is optional).

  2. This assumes we only need 1 block to store the bitmap and root directory blocks, which is usually not true. This also assumes the root directory doesn’t contain subdirectories, because they would be contained in their own block, requiring another read.