Given an IR program (e.g. LLVM) that uses an unbounded number of temporaries (e.g. UIDs), we want to find a mapping of temporaries to machine registers such that

  • program behavior is the same, obviously
  • register use is maximized
  • stack space and moves between registers are minimized
  • calling conventions and ISA requirements are followed

When there are registers available and live temporaries at a given execution point, we spill excess temporaries onto the stack. This is called stack spilling.

Greedy linear-scan algorithm

  1. Compute liveness information for each UID. That is, compute a live(x) that returns the set of UIDs that are live on entry to UID x‘s definition
  2. Let pal be the set of usable registers. We reserve rax and rcx for the spill code e.g. keeping track of which stack slot we’re currently on.
  3. Initialize a map uid_loc that maps UIDs to a location. A location can be a register or a stack slot, starting with n=0.
  4. Linearly iterate through every instruction in the program. For each instruction that defines a UID x, we find the set used of currently used locations from live(x). For each UID y in live(x), find the location r it is using with uid_loc(y).
  5. Calculate available = pal - used, the set of available locations. If available is empty, assign uid_loc(x) := slot n, and increment n. Otherwise, pick a location r from available and assign uid_loc(x) := r.

Using an interference graph

An interference graph is constructed by creating one node for each UID, and an edge exists between two nodes if UID x and y were live at the same time at any point during program execution.

Then, we can treat register allocation as a graph coloring problem: we attempt to color the interference graph, where each color represents a different register/stack slot.

Each pair of nodes connected by an edge must have a different color. Equivalently, two UIDs that will be live at the same time at some point will be allocated to different registers.

Basic process

  1. Compute the liveness information for each temporary.
  2. Create the interference graph for the program.
  3. Try to color the graph. If we fail, then increase the number of locations we use, e.g. spill a register to the stack. We can do this by incrementing the number of stack slots we use. Try again. (In practice we don’t actually do this. Read below.)
  4. Rewrite the program to use the given locations.

However, there are some issues we need to solve. Finding the -coloring of a graph is NP-complete. We still need to assign registers to the colors we created. Also, how do we effectively assign stack slots once the register spills over.

Kempe’s algorithm

Efficient approximation of -coloring a graph. It doesn’t completely work, but that is actually fine for our use case.

  1. Find a node with edges and cut it out and its edges out of the graph. This is called simplifying the graph.
  2. Recursively -color with the remaining subgraph.
  3. When the remaining graph is colored, there must be at least one free color available for the deleted node (because we specifically picked a node with ). Pick this color.
  4. Assign colors in the reverse direction of the nodes we iteratively removed from the graph.

When the algorithm fails

If the simplified graph has all nodes with at least neighbors, this algorithm breaks down (we cannot simplify any further). This means we can’t -color the graph.

  • This means we need to store a UID on the stack.
  • In the execution of the algorithm, if we get stuck at the subgraph, we simply pick one of the nodes and mark it as spilled. Then we remove it from the graph, and recurse onwards.

However, sometimes we are actually able to color the node that we marked for spilling.

Here, x can actually be colored yellow after it was marked for spilling, after it was removed from the graph. So while we run the algorithm, mark the node for spilling, but don’t perform spilling yet.

Precoloring nodes

Some variables must be pre-assigned to registers, so we treat registers as actual nodes in interference graph with assigned colors, and run the algorithm the same. We also set the degree to essentially infinity, so it is never removed.

While we could assign any color to the node and have it be technically correct, we can also choose colors more carefully to improve performance. For example, movq r1 r2 essentially moves a value between two temporaries. If r1 and r2 are colored the same color, then this instruction is useless and can be eliminated. So we should aim to do that.

To do this, we can also connect nodes in the graph with a phantom “move-related” edge if two nodes are involved in an instruction like movq. Then when it comes time to assign colors, we should aim to assign the two nodes the same color.

Accessing spilled registers

To do this, we need to generate code to move spilled temporaries in and out of the stack whenever we want to use it. We can reserve two registers (for us, rax and rcx) to use whenever we need to perform the operation.

On x86-64, we have enough registers to pull this off. On 32-bit, however, this severely limits the number of registers for actual allocation and would be a worse idea.