The previous Linux Scheduler for a long time. Replaced very recently by the EEVDF scheduler in the 6.6 kernel, which released late 2023.

CFS tries to implement fairness. This means giving each task a fair share of the CPU. Within some interval of time, if we have tasks running, then each task should receive of the interval.

However, CFS also considers the reality of scheduling, which includes:

  • Time to context switch
  • Time for the scheduler to run
  • Time spent running other, kernel things that don’t really belong to any user threads

Implementation

CFS maintains a current count of how long a task has run called vruntime. The virtual runtimes of all tasks is stored by the scheduler.

  • Stored inside a red-black tree, a self-balancing binary tree
  • Sorted based on the vruntime for each task. Therefore the smallest runtime task is also the leftmost node.
  • A pointer min_vruntime is stored and updated to always be the leftmost node so that lookups are .

A task is ran until there is something with a lower vruntime. To avoid constantly switching back and forth between two tasks with similar runtimes, there is also a configurable granularity (each tasks runs at least 2-3 milliseconds).

New tasks

If new tasks started with a vruntime of 0 then they would simply hog all the CPU runtime until they caught up. Instead, CFS starts new tasks with their runtime equal to min_vruntime. This way fairness is maintained between newer and older tasks.

Good for I/O bound tasks

CFS naturally handles I/O bound (e.g. user input) tasks very well.

This is because when a job is sleeping or blocked, it won’t be scheduled. Therefore they consistently have a low vruntime, so when it does need to run it will be given one of the highest priorities.

Niceness

CFS sets priority by assigning each process a “nice” value. It starts with a nice value of 0 and is bounded between . The higher your nice score, the “nicer” you are, meaning you’ll let other tasks run instead of yourself.

The nice value isn’t automatically updated by the scheduler. A program or (usually) the user adjusts it instead.

A higher nice value means lower priority. A lower nice value means higher priority.

Explaining the v in vruntime and factoring in priority

vruntime stands for “virtual” runtime. It doesn’t literally track the real runtime, but also factors in other attributes like nice values. Consider this pseudocode:

curr_task->vruntime += time_running * some_weight;

This is how priority is factored into runtime: the higher your priority, the lower the overall weight. This results in a lower vruntime increase than a lower priority task with the same actual running time, thus maintaining the higher priority status.

Other attributes that CFS tracks also contribute to the overall weight.