There are three types of memory allocation that a program can do.

  • We can statically allocate memory by initializing variables in the global scope. Memory is allocated when the program starts, and deallocated when it exits.
  • Stack allocation occurs when we initialize local variables, like inside a function. These are pushed to the stack, and deallocated when the function returns.
  • Dynamic allocation occurs at runtime on the heap. See below.

Dynamic memory allocation in C

Occurs at runtime during program execution, and in C requires us to manually manage the memory that we request (by freeing it when we’re done).

The following functions are not syscalls. They’re part of the C standard library, and internally invoke system calls to expand/shrink the heap.1

Requesting memory

void* malloc(size_t size);

Searches for a large enough unused block of memory, marks that memory as allocated, and returns a pointer to the beginning of that memory.
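If no suitable block can be found, malloc() returns NULL, so callers should check the result before using it. A minimal sketch of this defensive pattern (make_squares is a hypothetical helper, not part of the standard library):

```c
#include <stdlib.h>

// Allocate and fill an array of n squares.
// Returns NULL if malloc() fails, instead of crashing later.
int* make_squares(size_t n) {
    int* nums = malloc(n * sizeof(int));
    if (nums == NULL)
        return NULL; // propagate the allocation failure to the caller
    for (size_t i = 0; i < n; i++)
        nums[i] = (int)(i * i);
    return nums;
}
```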

Freeing memory

void free(void* ptr);

Deallocates the memory pointed to by ptr. The pointer must point to the first byte of heap-allocated memory (i.e. the pointer returned by malloc()).

Defensive programming

The actual value of the pointer we just freed is not modified after calling free(). Therefore, we can be a bit defensive and manually set the pointer to NULL after freeing it.

float* arr = malloc(10 * sizeof(float));
free(arr);
 
// Manually set `arr` to NULL because it still holds the value of an address,
// except it's now invalid to use it for anything
arr = NULL;

Fragmentation

Occurs when storage (like memory in this case) is used inefficiently, which can hurt performance and prevent us from allocating memory that is technically “unused”.

  • External fragmentation occurs when free memory is spread out over small portions that can’t be combined into a bigger block.
  • Internal fragmentation occurs when more space is allocated than actually used. This can happen intentionally or not.
    • Intentionally, we may allocate 4096 bytes but only use one.
    • Unintentionally, malloc() may allocate more bytes than requested so that addresses stay memory aligned, e.g. allocating 8 bytes when 7 were requested.

Implementing dynamic memory allocation

We need to track which parts of the heap we’ve allocated and which parts are available. Furthermore, we have to guarantee that the data is contiguous within a single allocation. There’s a variety of data structures and algorithms to do this.

Memory allocation is not a free operation.2 Avoid unnecessary allocations.

Free lists

Maintain an implicit list of space available and space allocated. In addition, we store some metadata before each chunk of allocated and free memory:

// Much simplified
typedef struct alloc_info {
    struct alloc_info* prev;
    struct alloc_info* next;
    bool allocated;
    size_t size;
} alloc_info;

The actual “free list” is then simply:

// Always points to the first free chunk
alloc_info* free_list;

Whenever an allocation is requested, we split the list into “chunks” that are the size that was requested, plus additional memory for the metadata. We then manipulate the prev and next pointers to connect the list.

  • malloc() would simply return the first address after the metadata of the newly allocated chunk.
  • When a pointer is freed, we simply set allocated to false so that we can reuse it in the future.
  • To perform additional allocations, we walk free_list and stop at the first chunk that is both unallocated and large enough.
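That walk can be sketched as a small search function over the header struct above (repeated here so the sketch stands alone; find_first_fit is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>

// Same simplified header struct as above, repeated so this
// sketch is self-contained.
typedef struct alloc_info {
    struct alloc_info* prev;
    struct alloc_info* next;
    bool allocated;
    size_t size;
} alloc_info;

// First fit: walk the list and return the first chunk that is
// both free and large enough, or NULL if none exists.
alloc_info* find_first_fit(alloc_info* free_list, size_t size) {
    for (alloc_info* cur = free_list; cur != NULL; cur = cur->next) {
        if (!cur->allocated && cur->size >= size)
            return cur;
    }
    return NULL;
}
```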

Coalescing

Once a block has been freed, we can try “coalescing” it with its neighbors (combining previous/next pointers and headers).
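A minimal sketch of one merge step, using the same header struct as above (coalesce_with_next is a hypothetical name; a full allocator would also merge with the previous neighbor):

```c
#include <stdbool.h>
#include <stddef.h>

// Same simplified header struct as above, repeated so this
// sketch is self-contained.
typedef struct alloc_info {
    struct alloc_info* prev;
    struct alloc_info* next;
    bool allocated;
    size_t size;
} alloc_info;

// If this chunk and its next neighbor are both free, absorb the
// neighbor's header and data into this chunk and unlink it.
void coalesce_with_next(alloc_info* chunk) {
    alloc_info* next = chunk->next;
    if (next != NULL && !chunk->allocated && !next->allocated) {
        chunk->size += sizeof(alloc_info) + next->size;
        chunk->next = next->next;
        if (next->next != NULL)
            next->next->prev = chunk;
    }
}
```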

However, free lists often end up with external fragmentation that can’t be combined together anymore.

Allocation policies: first, best, worst fit

We don’t necessarily have to pick the first available block that is large enough (this is called first fit).

  • Best fit: search for the portion of memory with the “best” or “tightest” fit, e.g. if requesting 4 bytes, look for the smallest free block that is ≥ 4 bytes.
  • Worst fit: same as best fit, but search for the largest free block that is ≥ 4 bytes. It might be unintuitive, but this may minimize external fragmentation over the long run.
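Best fit can be sketched as a variation of the list walk, again over the same header struct (find_best_fit is a hypothetical name; worst fit would simply flip the size comparison to keep the largest candidate):

```c
#include <stdbool.h>
#include <stddef.h>

// Same simplified header struct as above, repeated so this
// sketch is self-contained.
typedef struct alloc_info {
    struct alloc_info* prev;
    struct alloc_info* next;
    bool allocated;
    size_t size;
} alloc_info;

// Best fit: remember the smallest free chunk that still
// satisfies the request, scanning the whole list.
alloc_info* find_best_fit(alloc_info* free_list, size_t size) {
    alloc_info* best = NULL;
    for (alloc_info* cur = free_list; cur != NULL; cur = cur->next) {
        if (cur->allocated || cur->size < size)
            continue;
        if (best == NULL || cur->size < best->size)
            best = cur;
    }
    return best;
}
```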

Arena allocation

Useful when we want to allocate many items and don’t mind keeping them within the same arena/region/pool3, all sharing the same scope.

  • Very basically, store the total size allocated, a start_ptr, and an end_ptr
  • For each allocation request, return the current value of end_ptr for the user to use and simply move end_ptr to where the next free address resides
  • Once we’re done with our temporaries, we can “free” (clear) all allocations by simply resetting end_ptr to equal start_ptr again
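The three steps above can be sketched roughly as follows (arena, arena_init, arena_alloc, and arena_reset are hypothetical names; a production version would also round each allocation up for alignment):

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical bump allocator: hands out memory from one
// preallocated region and frees everything at once.
typedef struct {
    char*  start_ptr; // beginning of the region
    char*  end_ptr;   // next free address (the "bump" pointer)
    size_t capacity;  // total size of the region
} arena;

int arena_init(arena* a, size_t capacity) {
    a->start_ptr = malloc(capacity);
    if (a->start_ptr == NULL)
        return 0;
    a->end_ptr = a->start_ptr;
    a->capacity = capacity;
    return 1;
}

// Return the current end_ptr and bump it past the allocation.
void* arena_alloc(arena* a, size_t size) {
    if ((size_t)(a->end_ptr - a->start_ptr) + size > a->capacity)
        return NULL; // region exhausted
    void* p = a->end_ptr;
    a->end_ptr += size;
    return p;
}

// "Free" every allocation at once by resetting the bump pointer.
void arena_reset(arena* a) {
    a->end_ptr = a->start_ptr;
}
```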

Instead of just being scoped to a function, arena allocators can also be scoped to an “object” or the lifetime of a task.

Typically, all the memory that will be used is allocated up front at initialization and not grown again, because we have a good idea of how much memory our current task’s “scope” requires.

Buddy algorithm

Starts from some “maximum” amount of memory and repeatedly divides it into partitions that are powers of 2.

Modified versions of this algorithm power Linux physical page allocations and a famous implementation of malloc() called jemalloc.

  • Powers of 2 allow for compact allocation tracking and make coalescing memory faster.
  • Usually the smallest “unit” is 1 page, or 4096 bytes.

On an allocation request, we split our memory into equal halves until we reach a size that fits the request. If a split chunk that fits the request already exists, we skip the work and use that instead.

  • Allocation requests that aren’t powers of 2 are rounded up (e.g. requesting 3 pages will find, mark, and allocate 4 pages instead).
  • Each chunk has a “buddy” which is the “twin” that was created when we split a chunk in half. If both itself and its buddy are free, they will be combined.
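A nice consequence of power-of-2 sizing: a chunk’s buddy can be computed with a single XOR of the chunk’s offset and its size. A sketch with hypothetical helpers buddy_offset and round_up_pow2:

```c
#include <stddef.h>

// For a block at byte offset `off` (measured from the start of
// the managed region) with power-of-two size `size`, the buddy
// is the twin half: its offset differs only in the bit that
// corresponds to `size`.
size_t buddy_offset(size_t off, size_t size) {
    return off ^ size;
}

// Round a request up to the next power of two: the smallest
// chunk size the buddy allocator can hand out for it.
size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```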

External fragmentation is usually minimal. However, a significant downside is that internal fragmentation can be rampant, because every allocation is rounded up to a power of 2. Many small allocations are also usually pretty expensive, so this allocator is not well suited for them.

Implementation

We can use a binary search tree to maintain allocation information. Each node tracks whether it’s been split, allocated, or free. Furthermore, the BST can be implemented using an array or bitmap.
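Storing the tree in a flat array works because a node’s children and parent can be computed from its index alone, with no pointers (hypothetical helpers; this is the standard array-heap layout):

```c
#include <stddef.h>

// Array-backed binary tree layout: the root lives at index 0,
// and each node's relatives are computed arithmetically.
size_t left_child(size_t i)  { return 2 * i + 1; }
size_t right_child(size_t i) { return 2 * i + 2; }
size_t parent(size_t i)      { return (i - 1) / 2; }
```

Each array slot only needs a couple of bits (split / allocated / free), which is why a bitmap is enough to track the whole tree.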

Footnotes

  1. Internally they use brk(), sbrk(), mmap(), and munmap() to manipulate the process’ memory. These are Linux syscalls.

  2. Apart from the obvious reasons, something that I never thought about is the fact that we’ll also need mutexes in multi-threaded code. Just to call malloc()!

  3. Which is what this allocator is often named instead. Also called a bump allocator, because we “bump” end_ptr further along as more stuff is allocated.