C++

Introduction

Why C++?

Runs very fast and remains one of the more scalable modern programming languages
Programmer has fine control over memory

Why not C++?

Easy to shoot yourself in the foot with poor memory management
Because you have so much control, it’s easy to mess up and not realize it
Supports many older legacy features that don’t belong in modern programming
Compiler errors can be cryptic sometimes

Basic data types

Also see Data type.

Remember that a byte is defined as 8 bits. (On normal computers that have normal architectures that normal people use.)

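int: signed integer, typically 4 bytes (so 32 bits) on mainstream platforms, and unsigned int: only zero and positive numbers, also typically 4 bytes. The standard itself only guarantees that int has at least 16 bits.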
- Can use other, larger integer and unsigned integer types such as int64_t (8 bytes, signed)
float: 4 bytes, double: 8 bytes (so 64 bits) to represent decimal numbers
bool: 1 byte. Can represent a bool with either a 0 or 1
- The smallest addressable object in C++ is 1 byte, so a bool still occupies a full byte and the upper 7 bits are unused.
char: ASCII character, 1 byte.
- While the main usage of char is to represent characters, at the end of the day a char is just a 1-byte number. For example, the expression 'a' + 6 is legal, because 'a' is actually represented by an ASCII code (a number).
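The size claims above can be checked by the compiler itself. This is a minimal sketch: shift_letter is a hypothetical helper name, and the 'a' + 6 result assumes an ASCII-compatible character set.

```cpp
#include <climits>
#include <cstdint>

// sizeof(char) is 1 by definition; the other sizes listed above are only typical.
static_assert(sizeof(char) == 1, "char is always exactly one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits wide");

// Hypothetical helper: chars are small integers, so `letter + offset`
// is ordinary integer arithmetic on the character code.
char shift_letter(char letter, int offset) {
    return static_cast<char>(letter + offset);
}
```

With ASCII, shift_letter('a', 6) gives 'g', since 'a' is 97 and 'g' is 103.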

There are many different compilers for C++. The three main ones are

MSVC: for Microsoft Visual Studio, one of the more common C++ IDEs, Windows only
GNU Compiler Collection (GCC): works on all UNIX-based systems
- MinGW is the Windows port of GCC
clang works for UNIX and Windows

Compilation

Top-down compilation: the compiler reads each file from top to bottom, so a name must be declared earlier in the code in order to be used.

Forward declarations

// forward declaration
int twice(int);
 
int main() {
  int i = 5;
  i = twice(i);
}
 
// otherwise, wouldn't compile
int twice(int x) {
  return x * 2;
}

The forward declaration promises the compiler that an implementation of twice() will be provided later. This lets the call in main() compile; the program as a whole still only builds because twice() is actually implemented later.

Header files

Collections of forward declarations. Declarations for classes, functions, constants, etc.

Use #include "name.h" to refer to them. Effectively you’re inserting entire contents of the .h file into your .cpp file when you #include it

Separating the declaration and implementation

The declaration and implementations can be located in separate files. You can have a forward declaration in header.h, implementation in impl.cpp, and then call the function in main.cpp.

Usually call the header and implementation the same filename, e.g. wow.h and wow.cpp.
This works because each .cpp file is compiled separately, and the linker then searches the other compiled files for the implementation. It only reports an error (an unresolved symbol) if no file provides one.
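The split between header, implementation, and caller can be sketched in a single file, with comments marking which part would live where. The names wow_twice and call_wow are hypothetical; in a real project each section would be its own file.

```cpp
// --- wow.h: declaration only (what other files would #include) ---
int wow_twice(int x);

// --- main.cpp: sees only the declaration above, not the body ---
int call_wow() {
    return wow_twice(21);  // compiles because the declaration is visible
}

// --- wow.cpp: the definition that the linker matches the call against ---
int wow_twice(int x) {
    return x * 2;
}
```

If the definition at the bottom were removed, call_wow() would still compile, but linking would fail with an unresolved symbol.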

Building a project

Each .cpp file is treated as its “own” thing, called a compilation unit or translation unit. By “own” I mean it doesn’t know about any other .cpp files.

This is why we #include "header.h" in each source file. If we want to use a function from another source file, we first forward declare it here.

Each source file is compiled to an intermediary object file, signaled by a .obj file extension with MSVC or .o with GCC and clang. The object files are then linked together, and that’s when the linker checks whether a forward declaration used in one source file is actually implemented in some object file.

These object files are linked together into one final singular executable.
This is also why we don’t include entire source files in other ones, e.g. #include "source.cpp" instead of #include "header.h". If source.cpp is also compiled on its own, its functions end up defined in two object files and the linker rejects the duplicate definitions. Even when that is avoided, every file that includes the implementation has to recompile it, which takes more time than processing a simple forward declaration.

Preprocessor directives

Include guards

Include guards prevent the contents of a header from being included more than once in the same translation unit. There are two ways to accomplish this. #pragma once is the simplest way.

#pragma once
 
float add(float a, float b);

An older method (but guaranteed to be available) is using #ifndef.

#ifndef MATHS_H
#define MATHS_H
 
float add(float a, float b);
 
#endif

What’s happening here is that the first time the preprocessor sees this header file, it will define the macro MATHS_H, because #ifndef MATHS_H is true (the macro has never been defined before because it’s never been processed before).

A macro stays defined for the rest of the translation unit unless we explicitly #undef it (which we don’t). The next time the preprocessor meets this header while processing the same source file (through another header, or hell, a second direct #include), the #ifndef MATHS_H test will fail, because the macro was defined the first time. This essentially comments out the whole file up to the #endif line, which should be the last line in the file. This avoids including it again.

We should always use include guards because the compiler throws an error if it finds a class, an inline function body, etc. defined twice in the same translation unit. (Repeating a plain function declaration is allowed; repeating a definition is not.)
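To see the guard at work, the sketch below pastes the same guarded header text twice into one file, which is roughly what the preprocessor produces when a header is included twice. The Vec2 struct is a hypothetical addition so that there is something that would fail if it were defined twice.

```cpp
// First "inclusion": MATHS_H is not defined yet, so this text is kept.
#ifndef MATHS_H
#define MATHS_H

struct Vec2 { float x; float y; };
inline float add(float a, float b) { return a + b; }

#endif

// Second "inclusion": MATHS_H is now defined, so everything up to the
// #endif is skipped. Without the guard, Vec2 would be redefined here
// and the compiler would reject the file.
#ifndef MATHS_H
#define MATHS_H

struct Vec2 { float x; float y; };
inline float add(float a, float b) { return a + b; }

#endif
```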

Pass by copy

By default, all data passed into a scope is copied in. This means in a function call, the data is copied into the stack frame, to be used by the callee.
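A small sketch of what copying means in practice; both function names are hypothetical. The callee changes its own copy, and the caller’s variable stays untouched.

```cpp
// Hypothetical helper: `a` is a copy living in this function's stack frame.
float doubled_copy(float a) {
    a = a * 2.f;  // modifies only the local copy
    return a;
}

// Hypothetical helper: shows that the caller's variable is unchanged.
float caller_value_after_call() {
    float x = 10.f;
    doubled_copy(x);  // the return value is ignored on purpose
    return x;         // still 10.f
}
```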

Pass by reference

The ampersand & symbol allows a function to access memory outside of its scope.

#include <iostream>

void foo(float &a) {
  a = a * 2.f;
}
 
int main() {
  float x = 10.f;
  std::cout << x << std::endl; // 10
  
  foo(x);
  std::cout << x << std::endl; // 20
}

In this example, x first prints out 10, and then 20, because x was modified by foo() since it was passed in by reference. Inside foo(), a is a reference to the variable x, not a copy of it. Both names refer to the same value.

Readonly access with `const`

If we combine a reference with a const, this allows us to get readonly access to an external variable without having to copy it.

vec2 operator+(const vec2& v);

This allows us to save memory and also not have to worry about whether the function we’re calling modifies the value in some unexpected side effect.
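The notes never define vec2, so the struct below is a hypothetical minimal version, just enough to give the operator+ declaration a body. The trailing const on the operator is an addition: it promises that the left-hand operand is not modified either.

```cpp
// Hypothetical two-component vector type.
struct vec2 {
    float x;
    float y;

    // `v` is read-only and is not copied when the operator is called.
    vec2 operator+(const vec2& v) const {
        // v.x = 0.f;  // would not compile: v is const
        return vec2{x + v.x, y + v.y};
    }
};
```

Adding vec2{1.f, 2.f} and vec2{3.f, 4.f} yields a new vec2 with x == 4.f and y == 6.f, while both operands keep their values.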

Typically has less memory usage

If you pass variables containing large data structures into functions as a reference, you save a ton of stack space, and you don’t waste time copying the data into a new object.

The amount of memory used by a reference is constant, because compilers typically implement it as a memory address. That’s 64 bits on a 64-bit system, and 32 bits on a 32-bit system. If the object we’re referencing is much, much larger, then this saves memory.

Of course, simple types commonly take up less space than 64 bits (for example, integers), so it would be wasting space to use a reference; copying is cheaper here.
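A rough way to see the difference, assuming a hypothetical BigData struct. sizeof applied to a reference reports the size of the referenced type, so the sketch compares against a pointer, which is how references are typically implemented.

```cpp
// Hypothetical large type: 1024 doubles, about 8 KiB where double is 8 bytes.
struct BigData {
    double samples[1024];
};

// Passing the address is far cheaper than copying the whole struct.
static_assert(sizeof(BigData) > sizeof(BigData*),
              "the struct is much larger than an address");

// Reads two elements without copying the array into the callee's frame.
double first_plus_last(const BigData& data) {
    return data.samples[0] + data.samples[1023];
}
```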

References are a general concept

References are not just limited to function headers.

int main() {
  int a = 6;
  
  int &ref = a;
  ref = 10;
 
  std::cout << a; // 10
}

Here, we directly take a reference of another variable in the same function scope. We then modify it, which also changes the value of the original variable, a.

Memory model

The stack

Stores temporary variables. It’s managed automatically by compiler-generated code (with hardware support such as a stack pointer register) and has fast read/write. This is where local variables are allocated by default in C++.

int main() {
  // allocates memory on the stack for `a`
  vec3 a = vec3(1.f, 3.f, 2.f);
}

Variables are pushed onto the stack in a top-down fashion. When a function creates a variable, it gets pushed onto the stack. When the function exits scope, all variables created from that function are popped from the stack. Also called unwinding the stack.

Memory on the stack is relatively small, so deep recursion or large local arrays can exhaust it (a stack overflow).
Trying to maintain many different variables and keeping them alive in the same scope can become difficult, especially with many function calls.
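The push-and-pop behavior can be observed with a small counting type. Tracker and live_inside_function are hypothetical names: the constructor runs when a local variable is created, and the destructor runs when the function’s scope ends.

```cpp
// Hypothetical type that counts how many instances are currently alive.
struct Tracker {
    static int live;
    Tracker() { ++live; }
    ~Tracker() { --live; }
};
int Tracker::live = 0;

int live_inside_function() {
    Tracker first;         // lives on the stack
    Tracker second;        // lives on the stack
    return Tracker::live;  // evaluated while both locals still exist
}   // both locals are destroyed here, in reverse order of creation
```

Inside the function the counter reads 2; once the call has returned it is back to 0.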

Passing values across scopes

When a function scope ends, the local variables are popped off the stack, and the memory is freed. So how do we get “out” the value that we’ve calculated within functions?

Returning it

Seems obvious, but a return statement places the value in a very specific location defined by the platform’s calling convention (on x86, typically a register), which the function caller can then read from after the callee has finished.

// regular way
float sum(float a, float b) {
    return a + b;
}

Pointer or reference parameters

Since these two concepts are basically a “window” into an outside scope, we can actually use them to store the return value. For me this was confusing at first, because I’ve always thought of function parameters as a “one-way” door into the function, but that’s not true.

// Pass in a reference to a float from the outside, store the calculated
// value inside of it. Not really recommended... not intuitive
void sum(float a, float b, float &out) {
    out = a + b;
}
 
int main() {
  float result = 0.f;
  sum(5.f, 7.f, result);
  std::cout << result; // 12
}
 
// This is better because we have to explicitly dereference the pointer
// to use it or set any values. It helps to differentiate that this is where
// the return value is stored. Also, when we call the function, we have to
// use the special `&` symbol to get a pointer, which also helps with clarity.
void sum(float a, float b, float *out) {
    *out = a + b;
}
 
int main() {
  float result;
  sum(5.f, 7.f, &result);
}

The heap

Much, much larger than the stack. Allocating and deallocating heap memory is usually slower than using the stack, and in C++ it is not automatic.

How to leak memory

It’s fun!

void leak_memory() {
    float *pi = new float(3.1416);
}
 
int main() {
    // An int can never reach 10e100 (it would overflow first), so use a count it can hold.
    for (int i = 0; i < 1000000; ++i) {
        leak_memory();
    }
    
    return 0;
}

While the pi variable is removed from the stack every time the function call to leak_memory() ends, the new keyword allocated memory on the heap, which is never deallocated. This causes a memory leak because the pointer (via pi) to that memory has now gone out of scope, meaning we’re never able to access it again.
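For contrast, here is a sketch of the same allocation without the leak. no_leak is a hypothetical name. The fix is simply a delete before the pointer goes out of scope.

```cpp
float no_leak() {
    float* pi = new float(3.1416f);  // heap allocation, same as before
    float value = *pi;               // copy the value out while we still can
    delete pi;                       // hand the heap memory back
    return value;
}
```

This works, but it relies on remembering the delete on every path out of the function.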

Heap allocation

There are many ways to allocate memory on the heap. Older methods like the new operator or straight up malloc() are heavily discouraged. See Dynamic memory allocation in C.

Modern C++ relies on RAII and smart pointers.

Resource acquisition is initialization (RAII)

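Coined by Bjarne Stroustrup in the mid 80s. This is the idea that a resource’s lifetime is tied to an object’s lifetime: the object acquires the resource (for example, heap memory) in its constructor and frees it in its destructor when the time comes.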
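A minimal sketch of the idea, using a hypothetical FloatBuffer class. The live_buffers counter exists only to make the cleanup observable. Copying is disabled to keep the example short; a real class would need to define what copying means.

```cpp
#include <cstddef>

// Hypothetical RAII wrapper around a heap-allocated float array.
class FloatBuffer {
public:
    static int live_buffers;  // for illustration: counts buffers currently alive

    // Acquire the resource in the constructor...
    explicit FloatBuffer(std::size_t count)
        : data_(new float[count]()), size_(count) { ++live_buffers; }

    // ...and release it in the destructor, whenever the object dies.
    ~FloatBuffer() { delete[] data_; --live_buffers; }

    // Two owners would both try to delete[] the same memory.
    FloatBuffer(const FloatBuffer&) = delete;
    FloatBuffer& operator=(const FloatBuffer&) = delete;

    float& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    float* data_;
    std::size_t size_;
};
int FloatBuffer::live_buffers = 0;

float fill_and_sum(std::size_t count) {
    FloatBuffer buffer(count);  // heap memory acquired here
    float total = 0.f;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        buffer[i] = 1.f;
        total += buffer[i];
    }
    return total;
}   // destructor runs here and frees the memory, with no explicit delete
```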

Smart pointers

C++ provides standard library classes to help us access and manage heap memory, mainly std::unique_ptr and std::shared_ptr. Smart pointers automatically deallocate the heap memory so we don’t have to remember.

`std::unique_ptr`

A unique_ptr owns an object on the heap. It’s usually created with std::make_unique, which allocates the heap memory for the object and wraps it in one step. When a unique_ptr is destroyed, it also frees that same heap memory, automatically.

Most C++ classes get a copy constructor by default. However, std::unique_ptr explicitly disables its copy constructor. This makes sense, because only one object is allowed to own and control this memory on the heap. Ownership can still be handed over to another unique_ptr with std::move.
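A short sketch of single ownership, with hypothetical function names. std::make_unique requires C++14 or later.

```cpp
#include <memory>
#include <utility>

// Allocates a float on the heap and returns its sole owner.
std::unique_ptr<float> make_pi() {
    return std::make_unique<float>(3.1416f);
}

// Takes ownership by value; the float is freed when `owned` goes out of scope.
float take_ownership(std::unique_ptr<float> owned) {
    return *owned;
}

bool moved_from_is_empty() {
    std::unique_ptr<float> pi = make_pi();
    // std::unique_ptr<float> copy = pi;  // would not compile: copying is disabled
    take_ownership(std::move(pi));        // ownership is transferred explicitly
    return pi == nullptr;                 // a moved-from unique_ptr holds nothing
}
```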

Accessing the underlying pointer

It’s trivial to get a pointer to the object that the unique_ptr is managing by calling get(). This allows you to use the object in functions or APIs that may not expect the smart pointer wrapper.

Note that C++ does not stop you from manually tampering with it. It is valid C++ code to call delete on this raw pointer, which will deallocate the heap memory. However, the unique_ptr does not know this happened, and when its lifetime ends, it will try to delete again. Double deletion is undefined behavior and will probably crash the program.
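A sketch of the intended use of get(), where legacy_scale stands in for a hypothetical older API that only accepts raw pointers. The raw pointer is borrowed for the duration of the call and never deleted by the callee.

```cpp
#include <memory>

// Hypothetical legacy function that knows nothing about smart pointers.
void legacy_scale(float* value, float factor) {
    *value *= factor;
}

float scale_through_raw_pointer(float start, float factor) {
    std::unique_ptr<float> owned = std::make_unique<float>(start);
    legacy_scale(owned.get(), factor);  // lend the raw pointer, keep ownership
    return *owned;
}   // `owned` frees the float here, exactly once
```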

`const` all the things

Besides making references immutable, the const keyword can be used anywhere to make a variable immutable, and to make functions not be allowed to modify class member variables.
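The three uses mentioned above (an immutable variable, a const reference, and a member function that may not modify its object) can be sketched with a hypothetical Counter class:

```cpp
// Hypothetical class used only for illustration.
class Counter {
public:
    int value() const { return count_; }  // const: may not modify members
    void increment() { ++count_; }        // non-const: modifies the object

private:
    int count_ = 0;
};

int read_only(const Counter& counter) {
    // counter.increment();  // would not compile: counter is const here
    return counter.value();  // allowed, because value() is a const function
}

int count_to(int limit) {
    const int step_limit = limit;  // immutable local variable
    Counter counter;
    for (int i = 0; i < step_limit; ++i) {
        counter.increment();
    }
    return read_only(counter);
}
```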

Except for object data members

Marking data members const implicitly causes the compiler to delete the copy and move assignment operators, and it effectively disables move semantics.

Assignment operators being deleted is obvious because you can’t reassign const fields.
Moving is effectively disabled because a move has to modify the object being moved from, which const fields don’t allow. The move constructor falls back to copying those fields.

Furthermore, it forces you to initialize members during construction in the member initialization list or with default member initializers (assigning values in the constructor body doesn’t work because const fields can’t be assigned after they are initialized).

struct SomeData {
    const int x;
    const double y;
 
    /// Can only be initialized in the member init list.
    SomeData(int x, double y) : x(x), y(y) {
        // Not even allowed to do this! Compiler error!
        // x = 3;
    }
};

You might want to do this if you’re simply grouping a bunch of data together in an object but want to prevent the receiver from modifying the fields. However, this is the wrong approach because a lot of the C++ Standard Library uses assignment and move semantics internally.

#include <algorithm> // std::ranges::sort lives here
#include <utility>
#include <vector>
 
void do_work() {
    SomeData s1(1, 2.0);
    SomeData s2(3, 4.0);
 
    // Fails to compile! std::swap uses assignment *and* move semantics!
    std::swap(s1, s2);
 
    // Also fails to compile!
    std::vector<SomeData> much_data_wow;
    std::ranges::sort(much_data_wow);
}
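One common alternative, sketched here under the assumption that the goal is only to stop receivers from modifying the fields, is to keep the members non-const but private and expose them through const accessors. SomeDataReadOnly is a hypothetical name for this variant.

```cpp
#include <utility>

// Hypothetical variant of SomeData: fields are private instead of const.
class SomeDataReadOnly {
public:
    SomeDataReadOnly(int x, double y) : x_(x), y_(y) {}

    // Read access only; there are no setters.
    int x() const { return x_; }
    double y() const { return y_; }

private:
    int x_;     // non-const, so assignment and move still work
    double y_;
};

int first_x_after_swap() {
    SomeDataReadOnly s1(1, 2.0);
    SomeDataReadOnly s2(3, 4.0);
    std::swap(s1, s2);  // compiles: assignment operators were not deleted
    return s1.x();
}
```

Outside code still cannot change an individual field, but whole objects remain assignable, which is what std::swap and the sorting algorithms rely on.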

Examples come from the article “The implication of const or reference member variables in C++”.

`constexpr` and `consteval`

Builds on the idea of immutability and makes the values known at compile time instead of runtime. Intuition: why calculate a value every time we run the program when it is possible to calculate it once when we are compiling the program?

constexpr int square(int n) {
    return n * n;
}
 
int main() {
    constexpr int x = 20;
 
    // We have a choice of marking `wow` as `constexpr` or not.
    // If not, then the value (which may or may not be calculated at
    // runtime) is loaded during runtime.
    //
    // If it's `constexpr` like here, then the function is forced to run
    // during compile time, and this variable's value is known as well.
    constexpr int wow = square(x);
}

constexpr functions can run either at compile time or at runtime. They run at runtime when their arguments aren’t known at compile time, and may do so when the result is not stored in a constexpr variable. consteval (C++20) forces the function to run only during compile time; it might as well not exist during runtime.
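The two modes of a constexpr function can be sketched as follows. square_at_runtime is a hypothetical wrapper whose argument is only known when the program runs. The sketch sticks to constexpr so that it does not depend on C++20.

```cpp
constexpr int square(int n) {
    return n * n;
}

// Compile-time use: static_assert needs a constant expression, so the
// compiler has to evaluate square(20) itself.
static_assert(square(20) == 400, "evaluated by the compiler");

// Runtime use: `n` is an ordinary parameter, so the very same function
// is called like any other function.
int square_at_runtime(int n) {
    return square(n);
}
```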

C++ constexpr makes compile-time programming a breeze: good overview of what’s possible.

Operator overloading

A lot of operators can be overloaded and a lot of characters are considered operators.

Comma operator (`,`)

The comma character , is an operator and can be overloaded. By default, the operator evaluates the expressions on both sides and returns the right-hand side. In code:¹

// This...
auto result = (left->right, right->left);
 
// ...is equivalent to:
left->right;
auto result = right->left;
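The same behavior with plain integers, as a small sketch with a hypothetical function name. The left operand runs first for its side effect, and the expression takes the value of the right operand.

```cpp
int comma_demo() {
    int calls = 0;
    // Left side: increments `calls` to 1, result discarded.
    // Right side: calls + 10, which becomes the value of the expression.
    int result = (++calls, calls + 10);
    return result;
}
```

For the built-in operator the left side is guaranteed to finish before the right side starts, so comma_demo() returns 11.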

¹ Taken from this Reddit post.
