We store only the data of the Processes that are currently running in physical memory. Furthermore, we don’t need all of their data to be in physical memory, just the parts that are currently being used.

Where can the other data go? The Filesystem on the disk. This is the swap file that haunts every new Linux user who's trying to figure out installations.

Addressable memory

The total amount of memory that can theoretically be accessed. The answer depends on two factors:

  • Address space: the number of addresses available. On 64-bit systems, we have 2^64 available addresses.
  • Addressability: how many bytes each address corresponds to. All modern OSes are byte addressable, meaning every byte of memory is assigned a unique address.
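The two factors above multiply together. A quick sketch of the arithmetic (the variable names here are just for illustration):

```python
# Addressable memory = (number of addresses) x (bytes per address).
address_bits = 64         # 64-bit address space
bytes_per_address = 1     # byte addressable: every byte gets its own address

num_addresses = 2 ** address_bits
total_bytes = num_addresses * bytes_per_address
# 2^64 bytes, i.e. 16 EiB of theoretically addressable memory
```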

Page

In virtual memory, we split up memory into chunks called pages. They’re of a fixed size, with a very common size being 4 KiB, or 4096 bytes.

Parallels can be drawn to blocks on the filesystem. Virtual memory abstracts things on the level of pages.

Both the process's virtual address space and our real physical memory (RAM) are split into these pages. Given a page from a process, it can exist in 1 of 3 states:

  • Currently in use. The page is mapped from the virtual address space to a real location currently stored in RAM. The fixed-size slots of physical memory that hold these pages are referred to as page frames.
  • Previously in use. The page is mapped from the virtual address space to a location currently stored on disk.
  • Never used. Unused pages do not have any mappings (yet).

Page offset

While it’s cool and all that a page is usually 4096 bytes, that doesn’t help us access individual addresses. Since most machines are byte addressable, this means two things:

  • There are 4096 distinct, unique addresses that can correspond to the same page.
  • We need to dedicate a minimum of 12 bits to calculating the location (offset) of an address within a page. This is because 2^12 = 4096.
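The split falls directly out of bit manipulation: the low 12 bits are the offset, and everything above them is the virtual page number. A minimal sketch (the function name is just for illustration):

```python
PAGE_SIZE = 4096                           # 4 KiB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # log2(4096) = 12

def split_address(addr):
    """Split an address into (virtual page number, page offset)."""
    offset = addr & (PAGE_SIZE - 1)  # low 12 bits
    vpn = addr >> OFFSET_BITS        # remaining high bits
    return vpn, offset
```

For example, address 0x1234 lands in virtual page 0x1 at offset 0x234.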

Page table

A page table allows us to convert virtual addresses to physical addresses. There is 1 page table per process to enforce memory isolation from other processes. An example:

| Virtual page number | Valid? | Physical page (frame) number | Reference | Dirty | Permissions |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | NULL | n/a | n/a | n/a |
| 1 | 1 | 01 | 1 | 1 | Read/write |
| 2 | 1 | 10 | 1 | 0 | Read/execute |
| 3 | 0 | On disk | n/a | n/a | n/a |

Some fields that a page table entry has:

  • Virtual page and physical frame numbers (if it exists)
  • Valid bit: whether the page is currently loaded in physical memory. If not, accessing it triggers a page fault.
  • Reference: for eviction purposes
  • Dirty bit: tracks whether we’ve written to this page. If we’re about to evict this page from RAM, then we have to first flush the contents to disk so that we don’t lose data.
  • Permissions: at least 3 bits to determine whether the page can be used for reading, writing, or executing
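The fields above can be sketched as a toy in-memory page table. This is a hypothetical illustration (the class name, `lookup` helper, and example mappings are made up; real entries are packed bits, not objects):

```python
class PageTableEntry:
    def __init__(self, valid=False, frame=None, referenced=0, dirty=0, perms=""):
        self.valid = valid            # is the page currently in RAM?
        self.frame = frame            # physical frame number (None if unmapped)
        self.referenced = referenced  # set on access, used for eviction
        self.dirty = dirty            # set on write; dirty pages must be flushed
        self.perms = perms            # e.g. "rw-", "r-x"

# One page table per process; the virtual page number is the index.
page_table = [
    PageTableEntry(),                         # VPN 0: never used
    PageTableEntry(True, 0b01, 1, 1, "rw-"),  # VPN 1: in RAM
    PageTableEntry(True, 0b10, 1, 0, "r-x"),  # VPN 2: in RAM
    PageTableEntry(False, "disk"),            # VPN 3: paged out to disk
]

def lookup(vpn):
    entry = page_table[vpn]
    if not entry.valid:
        raise RuntimeError("page fault")  # kernel would load the page here
    entry.referenced = 1                  # hardware sets this on access
    return entry.frame
```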

The size of a single-level page table is static. How many entries (aka page table entries) must exist? On a 64-bit system, 12 bits are dedicated to the page offset in the Virtual address, leaving 52 bits for the virtual page number, so there are 2^52 possible entries.

Optimizations

By default, the above “big array” view of page tables takes up a lot of space because we need to allocate memory for 2^52 possible entries. Here are some optimizations that real page tables perform.

  • Remove the virtual page number field because it’s implicitly the index into the page table
  • Entries are simply bit sequences (like a Data type) where specific bits encode different field information.
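Both optimizations can be sketched with plain bit manipulation. The exact bit layout below is hypothetical (real layouts are architecture-specific); note there is no virtual page number field anywhere, since it's implicitly the array index:

```python
# Hypothetical 64-bit entry layout: bit 0 = valid, bit 1 = referenced,
# bit 2 = dirty, bits 3-5 = read/write/execute, bits 12+ = frame number.
VALID, REF, DIRTY = 1 << 0, 1 << 1, 1 << 2
PERM_R, PERM_W, PERM_X = 1 << 3, 1 << 4, 1 << 5

def make_entry(frame, flags):
    return (frame << 12) | flags

def frame_of(entry):
    return entry >> 12

# Encode the "VPN 2" row from the example table: valid, referenced,
# clean, read/execute, frame 0b10.
entry = make_entry(0b10, VALID | REF | PERM_R | PERM_X)
```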

Multi-level page tables: Linux implementation

The biggest issue is that we still have 2^52 entries to allocate. What if we instead store page table entries in a tree-like structure with multiple levels?

In practice, Linux on x86-64 only translates the low 48 bits of the virtual address. After the 12-bit page offset, that leaves 36 bits for the virtual page number, which we split into 4 groups of 9 bits.

Each group is an index into a smaller table that has 2^9 = 512 entries. Much better.

Each intermediary table entry points to the start of the next level’s table, which we index into using the next group of bits. Keep doing this until we reach the last level, whose entries are the actual page table entries.
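Extracting the four indices is a matter of shifting and masking. A sketch, assuming x86-64-style 4-level paging with 48-bit virtual addresses (the function name is just for illustration):

```python
def walk_indices(vaddr):
    """Split a 48-bit virtual address into four 9-bit table indices
    (top level first) plus the 12-bit page offset."""
    offset = vaddr & 0xFFF              # low 12 bits
    indices = []
    for level in range(4):              # top level uses the highest bits
        shift = 12 + 9 * (3 - level)
        indices.append((vaddr >> shift) & 0x1FF)  # 9 bits per level
    return indices, offset
```

Each returned index selects an entry in one level's table; that entry points at the next table down, until the last index selects the real page table entry.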

Benefits

We can take advantage of the Principle of locality. Pages near each other in memory will share the same upper-level tables in the tree.

Most of the virtual pages theoretically available to a process go unused. Multi-level page tables take advantage of this because most pointers in the tables are NULL. Also, we can lazily allocate page table entries for pages as they’re needed.

If each page table entry itself is at most 8 bytes, and each level’s table has 2^9 = 512 entries, then each level’s table is 512 × 8 = 4096 bytes.

  • This means each level’s table fits exactly into the size of a page!
  • Extremely convenient, since the page table itself lives in memory and is now aligned to page boundaries.

Page fault exception

In general, exceptions are a transfer of control to the OS kernel in response to some event that just occurred.

Page fault exceptions occur when we attempt to access a memory location that is not currently in physical memory.

Thrashing

It is extremely important that we minimize page faults! Constantly having to interrupt process execution just to fetch some memory location is very expensive.

When the physical memory of a computer is overcommitted,1 this can cause thrashing to occur.

Page replacement and eviction

When physical memory is full and we need to access a memory location in a page not currently in RAM, we have to evict another page from physical memory. The goal is to minimize the number of times we have to do this.

Only pages with content are moved to disk. Why waste time writing an empty page to disk? This can occur if a process was allocated the page but never wrote to it.

The optimal replacement algorithm would always evict the page that is “furthest” away from being used again in the future. Unfortunately we can’t predict the future, but we can make best guesses.

First in first out (FIFO)

Aka a queue. If we need to evict a page, let’s remove the page that has been in memory the longest.

Interestingly, FIFO strategies may suffer from Bélády’s anomaly, which is when increasing the number of page frames in the queue can increase the number of page faults. This is counterintuitive because you’d think having more space to store pages would decrease faults.

However, increasing the number of page frames can ruin the replacement order, which messes up the timing of who gets evicted.
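A short simulation makes the anomaly concrete. The reference string below is the classic example used to demonstrate Bélády's anomaly; with 3 frames it causes 9 faults, but with 4 frames it causes 10:

```python
from collections import deque

def fifo_faults(accesses, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()
    faults = 0
    for page in accesses:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()   # evict the page resident the longest
            frames.append(page)
    return faults

# Classic reference string exhibiting Bélády's anomaly.
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
```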

Least recently used (LRU)

We assume that if a page is used recently, it is likely to be used again in the future (referencing temporal locality). Therefore, we always evict the page that has had the longest time since it was last accessed.

The order of pages in an LRU data structure is always a subsequence of the page access history, read from least to most recently used.

One potential issue that happens with LRU is increased Thrashing. If we access the same pages in a cyclical loop that is larger than the number of spaces in the LRU, we will keep evicting a page that will soon be needed.
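The cyclic worst case is easy to demonstrate. In this sketch (using `OrderedDict` as the LRU structure), looping over 4 pages with only 3 frames makes every single access fault, while 4 frames would fault only on the first pass:

```python
from collections import OrderedDict

def lru_faults(accesses, num_frames):
    """Count page faults under LRU replacement."""
    frames = OrderedDict()   # ordered from least to most recently used
    faults = 0
    for page in accesses:
        if page in frames:
            frames.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.popitem(last=False)  # evict least recently used
            frames[page] = True
    return faults

# Cyclic loop over 4 pages: one page larger than a 3-frame LRU.
cycle = [1, 2, 3, 4] * 3
```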

Implementation: reference bit and clock interval aging

We want a fast, space-saving implementation. Timestamping each memory access and sorting them in a list would take too much time. Priority queues would be overkill.

Instead, we can store a reference bit in the page table entry itself to somehow indicate it was accessed recently. This can automatically be done by hardware when accessing memory, and flipping bits is much faster.

  • Maintain an 8-bit “counter” for each page
  • Within each clock interval, if the page was accessed, the hardware sets the reference bit to 1.
  • On each clock tick, we shift the counter to the right by 1, write the current reference bit value into the MSB of the counter, and then reset the reference bit back to 0.

We can then read the counter as an unsigned integer (e.g. uint8_t). A larger value means that the page was accessed more recently.
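The tick operation is one shift and one OR. A minimal sketch of the aging step (the function name is just for illustration):

```python
def age(counter, referenced):
    """One clock tick of aging: shift the 8-bit counter right,
    then write the reference bit into its most significant bit."""
    counter >>= 1
    if referenced:
        counter |= 0b1000_0000   # set the MSB
    return counter & 0xFF        # keep it 8 bits wide

c = 0
c = age(c, 1)   # accessed this interval -> 0b1000_0000
c = age(c, 0)   # idle                   -> 0b0100_0000
c = age(c, 1)   # accessed again         -> 0b1010_0000
```

Reading the counters as unsigned integers, a page accessed in more recent intervals ends up with a larger value, so the page with the smallest counter is the LRU eviction candidate.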

Second chance algorithm

Extremely similar to FIFO, but we utilize the reference bit from the page table entry.

In particular, when we prepare to evict the first page (i.e. longest in memory), we check if the reference bit is 1, indicating that it was used in the last time interval.

  • If so, we move it to the end of the queue and reset the reference bit to 0.
  • Repeat the process until we find a page that doesn’t have its reference bit set.

Optimization: implement the queue using a circular/ring buffer so that the cost of moving something to the “end” of the queue is minimal.
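The loop can be sketched with a `deque` standing in for the ring buffer, where `rotate(-1)` is the cheap "move front to back" operation (the function name and list-of-pairs representation are just for illustration):

```python
from collections import deque

def second_chance_evict(pages):
    """pages: deque of [page_id, ref_bit] pairs, oldest first.
    Pages with the reference bit set get a second chance (bit cleared,
    moved to the back); the first page without it set is evicted."""
    while True:
        page_id, ref = pages[0]
        if ref:
            pages[0][1] = 0    # clear the reference bit
            pages.rotate(-1)   # cheap "move to end" in a ring buffer
        else:
            pages.popleft()
            return page_id
```

For example, with queue A(ref=1), B(ref=0), C(ref=1), page A is spared and B is evicted.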

Linux two-list clock algorithm

The idea is that there is an inactive list and an active list. When a page needs to be evicted, we choose a page from the inactive list first. These two lists are further broken down into two groups, one where the reference bit is set to 0, and one where it’s set to 1.

  • If a page in the inactive list has a reference bit of 1 and is referenced again, it’s promoted to the active list and has its reference bit reset to 0.
  • A decay mechanism demotes a page down the groups if it hasn’t been accessed after a certain period.

Virtual address

The only addresses that our programs/processes can see. When we print a pointer address to standard output, the value we see is actually a virtual address. It’s also these addresses which get manipulated if we perform pointer arithmetic.

A virtual address is composed of two parts relevant for translation:

  • Bits required to represent the virtual page number (how many virtual pages there are in the process, determined by the size of Addressable memory)
  • Bits required to represent the page offset (determined by the number of bytes in a page)

A virtual address in one process will refer to a different physical location than another process with the same virtual address. This is because each page table does its own mapping. It’s also good for process isolation!

Translation to physical address

The virtual address is manipulated to derive the physical address. These are the general steps the MMU has to take to find the physical address location given a virtual memory address:

  1. Derive the virtual page number from the virtual address we’re accessing
  2. Check if the virtual page exists in the TLB. If it does, fetch the mapped physical page frame and skip the page table lookup (step 3 below) entirely.
  3. Lookup the virtual page number in the process page table.
    • If the page is currently in RAM (i.e. valid), we return the physical page number.
    • If the page has previously been accessed, we fetch it from disk and load it into RAM, returning the newly assigned physical page number.
    • If the page has never been used before, we instead allocate a fresh (zeroed) frame; there is nothing to fetch from disk.
    • During this process, we may need to evict another page frame if physical memory is full.
  4. If the MMU originally missed the TLB, then we insert the found physical page number into the TLB now.2
  5. Construct the physical address using the physical page number and the page offset.3
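The steps above can be sketched end to end. This is a simplified illustration (dict-based TLB and page table, hypothetical mappings; disk fetch and eviction are collapsed into a raised fault):

```python
PAGE_SIZE = 4096

tlb = {}                    # virtual page number -> physical frame number
page_table = {0: 7, 1: 3}   # hypothetical mappings already in RAM

def translate(vaddr):
    """Translate a virtual address to a physical address."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # step 1: split the address
    if vpn in tlb:                           # step 2: TLB hit skips the walk
        frame = tlb[vpn]
    else:
        if vpn not in page_table:            # step 3: page table lookup
            raise RuntimeError("page fault") # kernel loads page, maybe evicts
        frame = page_table[vpn]
        tlb[vpn] = frame                     # step 4: refill the TLB
    return frame * PAGE_SIZE + offset        # step 5: frame + same offset
```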


Footnotes

  1. Can happen in two ways: too many processes are running, or a few processes require too much memory.

  2. Actually, the MMU always goes through the TLB. This means that even if we missed the TLB originally, after we insert the newly found physical frame translation, we then fetch it again from the TLB, with this attempt being 100% successful.

  3. And the page offset is the same for both virtual and physical address because the page size doesn’t change.