Threads are contained within Processes.
- A thread is a sequential execution stream within a process: the actual code executing, with the process serving as a “container” for it.
- Every process has at least one thread running in it.
- In a modern OS like Linux, threads are the unit of scheduling. The Scheduler manages tasks in terms of threads, not processes.
Each process has its own address space, OS resources, and security attributes. Meanwhile, each thread has its own stack, stack pointer, program counter, and registers. However, threads share the heap and globals provided by their process.
Threads behaving as “lightweight” processes
This sorta makes sense. Just like processes, threads can execute concurrently, and can run simultaneously on multiple CPU cores.
Analogy given in class: the kitchen is the process; threads are the chefs doing the work within the kitchen. Sometimes “adding a chef” is easier than building a new kitchen; sometimes two kitchens are needed when isolation is required.
Communication across threads
Threads can communicate with each other through variables and memory. This is only possible because they live within the same process.
Recall that up to this point, the only way we’ve learned to achieve inter-process communication is via Pipes, which require opening entirely new FDs and allocating buffer space. This further supports the idea of threads being a “lightweight” process.
However, we have to make sure two threads aren’t concurrently reading and writing to the same data. This can be done with Mutexes.
Furthermore, if two threads are dependent on one another to perform some task, then we’ll need other methods of making sure they’re playing nicely with each other. See Thread concurrency.
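As a minimal sketch (using the pthreads calls covered below, with a made-up shared counter), two threads can communicate through a shared global, with a Mutex guarding the concurrent updates:

```c
#include <pthread.h>
#include <stdio.h>

// Shared state: lives in the process's globals, visible to every thread.
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void* increment(void* arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&counter_lock);   // only one thread in here at a time
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);      // 200000: no lost updates
    return 0;
}
```

Without the mutex, the two threads would race on counter++ and the final count would usually come up short.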
POSIX threads API
Creating a thread
Compared to creating new processes, when we create a thread we don’t leave the current process context; instead we allocate new stack space and registers within the process’ address space.
```c
int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*start_routine)(void*), void* arg);
```
- Creates a new thread into `*thread`. `pthread_t` is similar to a file descriptor, but for threads.
- Returns `0` on success and an error number otherwise.

The new thread immediately starts running `start_routine(arg)`, where `arg` is the single pointer argument passed into the function. Just like processes, we leave it to the Scheduler to determine which thread runs next.
Faster than forking
Notably, creating a new thread is faster than creating a new process because we don’t need to allocate an entirely new address space.
Waiting for a thread to terminate
Thread equivalent of `waitpid()`. When the parent calls this, it will block and wait for the child thread to exit.

```c
int pthread_join(pthread_t thread, void** retval);
```

- `*retval` is filled with the exit status of the terminated thread, i.e. the value its start routine returned (or passed to `pthread_exit()`).
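Putting `pthread_create()` and `pthread_join()` together: a sketch where the worker function and its argument are made up for illustration, and the worker's return value comes back through `retval`:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical worker: doubles the int it's handed and returns the result.
static void* doubler(void* arg) {
    int n = *(int*)arg;
    int* result = malloc(sizeof(int));
    *result = n * 2;
    return result;                      // becomes *retval in pthread_join
}

int main(void) {
    pthread_t tid;
    int input = 21;

    int err = pthread_create(&tid, NULL, doubler, &input);
    if (err != 0) {                     // error number, not errno
        fprintf(stderr, "pthread_create failed: %d\n", err);
        return 1;
    }

    void* retval;
    pthread_join(tid, &retval);         // blocks until doubler returns
    printf("result = %d\n", *(int*)retval);   // 42
    free(retval);
    return 0;
}
```

Note the worker returns heap-allocated memory: its stack disappears when it exits, so returning a pointer to a local would be a bug.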
Thread liveness
A set of properties that ensure threads execute in a timely manner, despite e.g. any contention on shared resources. A liveness failure is something that prevents threads from executing. Examples include:
- Deadlock
- Thread starvation
- Forgetting to release a Mutex after being done with a resource
- Mutex recursion: when a thread tries to reacquire a lock that it has already acquired.
- By default, the thread won’t recognize that it already holds the lock, so it blocks waiting for a release that can only come from itself: a self-deadlock. See the sketch after this list.
- Recursive locks do exist, but are an extreme code smell: avoid using them.
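A minimal sketch of that self-deadlock, assuming a default (non-recursive) pthread mutex:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  // default: non-recursive

void helper(void) {
    pthread_mutex_lock(&lock);    // second acquire: blocks forever, since
    /* ... */                     // only this thread could release the lock
    pthread_mutex_unlock(&lock);
}

void do_work(void) {
    pthread_mutex_lock(&lock);    // first acquire: succeeds
    helper();                     // deadlocks here
    pthread_mutex_unlock(&lock);  // never reached
}
```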
Parallelism
Threads are the perfect candidate for parallelism because they can access and update shared memory. We simply ask each thread to do a small part of the overall work, and run them on separate CPU cores.
Asking processes to do the same thing is a bit more difficult, with IPC and all.
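As a sketch (the array contents, chunk layout, and thread count are made up), splitting a sum across threads might look like this:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NUM_THREADS 4

static int data[N];                     // shared read-only input
static long partial[NUM_THREADS];       // one slot per thread: no locking needed

static void* sum_chunk(void* arg) {
    long id = (long)arg;
    long lo = id * (N / NUM_THREADS);
    long hi = lo + (N / NUM_THREADS);
    long sum = 0;
    for (long i = lo; i < hi; i++)
        sum += data[i];
    partial[id] = sum;                  // each thread writes only its own slot
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) data[i] = 1;

    pthread_t tids[NUM_THREADS];
    for (long id = 0; id < NUM_THREADS; id++)
        pthread_create(&tids[id], NULL, sum_chunk, (void*)id);

    long total = 0;
    for (long id = 0; id < NUM_THREADS; id++) {
        pthread_join(tids[id], NULL);
        total += partial[id];           // combine the per-thread results
    }
    printf("total = %ld\n", total);     // 1000000
    return 0;
}
```

Because each thread writes only its own slot of partial, the threads share the input without needing a mutex; only the final combine step touches all the results, and it happens after the joins.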
Versus concurrency
Importantly, parallelism occurs when two or more threads are running at the same instant in time.
Two threads might be concurrent but not parallel: both are alive (started but not yet terminated), yet only one of them is running at any particular moment in time.
Amdahl’s law
States that the speedup we can gain by parallelizing an algorithm is largely restricted by what percentage of it must remain sequentially executed.
If a thread can perform 1 unit of work, and $p$ represents the fraction of the work that we can parallelize, then we can split that 1 unit of work between parallel execution and sequential execution:

$$T(1) = (1 - p) + p = 1$$

Given $n$ is the number of threads we can use, and they each equally do a share of the parallel work, we get:

$$T(n) = (1 - p) + \frac{p}{n}$$

We can calculate the speed-up multiplier from running on $n$ threads by taking the ratio between them. This is the final formula:

$$S(n) = \frac{T(1)}{T(n)} = \frac{1}{(1 - p) + \frac{p}{n}}$$
For example (illustrative numbers), if $p = 0.95$ and we’re running $n = 8$ threads, we get $S(8) = 1 / (0.05 + 0.95/8) \approx 5.9\times$ speedup. However, if $p = 0.5$, we only get $\approx 1.8\times$.
Takeaway: adding more threads gives diminishing returns; the speedup we get is largely determined by the fraction of work we can parallelize, not the number of threads we throw at it.
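A quick numeric check of those diminishing returns (assuming $p = 0.95$):

```c
#include <stdio.h>

// Amdahl's law: S(n) = 1 / ((1 - p) + p / n)
static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    double p = 0.95;                         // assumed parallelizable fraction
    int threads[] = {1, 2, 4, 8, 16, 64, 1024};
    for (size_t i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
        printf("n = %4d -> %.2fx\n", threads[i], speedup(p, threads[i]));
    // Even at 1024 threads, the speedup approaches but never exceeds
    // 1 / (1 - p) = 20x: the sequential 5% dominates.
    return 0;
}
```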