Introduction
Why C++?
- Runs very fast and remains one of the more scalable modern programming languages
- Programmer has fine control over memory
Why not C++?
- Easy to shoot yourself in the foot with poor memory management
- Because you have so much control, it’s easy to mess up and not realize it
- Supports many older legacy features that don’t belong in modern programming
- Compiler errors can be cryptic sometimes
Basic datatypes
Remember that a byte is defined as 8 bits.
- int: signed integer, typically 4 bytes (so 32 bits)
- unsigned int: only zero and positive numbers, also 4 bytes
  - Can use other, larger integer and unsigned integer types such as int64_t (8 bytes, signed)
- float: 4 bytes, and double: 8 bytes (so 64 bits), to represent decimal numbers
- bool: 1 byte. Can represent a bool with either a 0 or 1
  - C++ doesn’t support data sizes smaller than 1 byte. So, the upper 7 bits of bool are unused.
- char: ASCII character, 1 byte.
  - While the main usage of char is to represent characters, at the end of the day they just specify the size of the datatype, and are actually numbers, e.g. the expression 'a' + 6 is legal because 'a' is actually represented by an ASCII code (a number).
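As a quick sanity check of the sizes listed above, here is a small sketch. The sizes of int, float, double and bool are what mainstream desktop compilers use rather than a guarantee from the language, and shift_letter is a made-up helper name used only for illustration.

```cpp
#include <cstdint>

// These hold on mainstream 32/64-bit desktop platforms; the standard itself
// only guarantees minimum ranges for int, float and double.
static_assert(sizeof(int) == 4, "int is 4 bytes here");
static_assert(sizeof(unsigned int) == 4, "unsigned int is 4 bytes here");
static_assert(sizeof(std::int64_t) == 8, "int64_t is exactly 64 bits");
static_assert(sizeof(float) == 4, "float is 4 bytes here");
static_assert(sizeof(double) == 8, "double is 8 bytes here");
static_assert(sizeof(bool) == 1, "bool is 1 byte here");
static_assert(sizeof(char) == 1, "char is 1 byte by definition");

// A char is just a small number, so arithmetic on it is legal.
// shift_letter is a hypothetical helper, not a standard function.
char shift_letter(char c, int offset) {
    return static_cast<char>(c + offset);
}
```

With ASCII, shift_letter('a', 6) gives 'g', which is the 'a' + 6 expression from the list above.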
There are many different compilers for C++. The three main ones are
- MSVC: for Microsoft Visual Studio, one of the more common C++ IDEs, Windows only
- GNU Compiler Collection (GCC): works on all UNIX-based systems
  - MinGW is the Windows port of GCC
- clang: works for UNIX and Windows
Compilation
Top-down compilation: a name (function, class, variable, etc.) must be declared earlier in the code in order to be used.
Forward declarations
A forward declaration promises the compiler that an implementation of a function, here twice(), will be provided later. This allows the code to compile (but only because twice() is actually implemented later).
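A minimal sketch of such a forward declaration, assuming twice() simply doubles an int. The quadruple() caller is a hypothetical name added for illustration.

```cpp
// Forward declaration: promises the compiler that twice() exists somewhere.
int twice(int x);

// quadruple() can call twice() even though its body has not been seen yet.
int quadruple(int x) {
    return twice(twice(x));
}

// The promised implementation. Without it, the build would fail at link time.
int twice(int x) {
    return 2 * x;
}
```

Removing the first line makes the call inside quadruple() a compile error, because the compiler reads the file top-down.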
Header files
Collections of forward declarations. Declarations for classes, functions, constants, etc.
- Use #include "name.h" to refer to them. Effectively you’re inserting the entire contents of the .h file into your .cpp file when you #include it.
Separating the declaration and implementation
The declaration and implementation can be located in separate files. You can have a forward declaration in header.h, the implementation in impl.cpp, and then call the function in main.cpp.
- Usually the header and implementation share the same filename, e.g. wow.h and wow.cpp.
- This works because the linker looks through the other compiled .cpp files if the implementation isn’t in main.cpp, and only throws an error if none of them provides it.
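The three-file layout could look roughly like this. It is shown as one listing so it can be read top to bottom, with the file boundaries and #include lines written as comments; the function twice() and the helper run() are placeholder names, not part of any real project.

```cpp
// ---- wow.h: declaration only ----
int twice(int x);

// ---- wow.cpp: implementation ----
// #include "wow.h"
int twice(int x) {
    return 2 * x;
}

// ---- main.cpp: caller ----
// #include "wow.h"
// main.cpp only ever sees the declaration from wow.h, never the body.
int run() {
    return twice(21);
}
```

In a real project each section would live in its own file, and the commented-out #include lines would be active.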
Building a project
Each .cpp file is treated as its “own” thing, called a compilation unit or translation unit. By “own” I mean it doesn’t know about any other .cpp files.
- This is why we #include "header.h" in each source file. If we want to use a function from another source file, we first forward declare it here.
Each source file is compiled to an intermediary object file, signaled by a .obj file extension (.o with GCC and clang). The object files are then linked together, and that’s when we check if the forward declaration in a certain source file is actually implemented somewhere in another object file.
- These object files are linked together into one final singular executable.
- I assume this is also why we don’t include entire source files in other ones, e.g. #include "source.cpp" instead of #include "header.h". They all get linked in the end anyway, so inserting the implementations for every include directive probably takes more time to process than a simple forward declaration. It would also define the same functions in more than one object file, which the linker rejects as duplicate definitions.
Preprocessor directives
Include guards
Avoids compiling the same header more than once. There are two ways to accomplish this. #pragma once is the simplest way.
An older method (but guaranteed to be available) is using #ifndef.
With an #ifndef guard, the first time the preprocessor sees the header file (say maths.h), it will define the macro MATHS_H, because #ifndef MATHS_H is true (the macro has never been defined before because the header has never been processed before).
These macros stay defined unless we explicitly #undef them (which we don’t). The next time we meet this header file again while compiling the same source file (through another header, or hell, the same one), the #ifndef MATHS_H test will fail, because the macro was defined the first time around. This essentially comments out the whole file up to the #endif line, which should be the last line in the file. This avoids including it again.
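To make the mechanism concrete, here is a sketch that pastes the contents of a guarded header twice into one file, which is what two #include lines would do. The contents of maths.h (Vec2 and dot) are invented for illustration.

```cpp
// ---- first #include "maths.h": MATHS_H is not defined yet ----
#ifndef MATHS_H
#define MATHS_H

struct Vec2 {
    double x;
    double y;
};

inline double dot(Vec2 a, Vec2 b) {
    return a.x * b.x + a.y * b.y;
}

#endif // MATHS_H

// ---- second #include "maths.h": MATHS_H is already defined ----
#ifndef MATHS_H
#define MATHS_H

struct Vec2 {  // skipped by the preprocessor; otherwise a redefinition error
    double x;
    double y;
};

inline double dot(Vec2 a, Vec2 b) {
    return a.x * b.x + a.y * b.y;
}

#endif // MATHS_H
```

Deleting the guard lines from the second copy makes the compiler complain that Vec2 is defined twice. With #pragma once, the three guard lines would be replaced by that single line at the top of the header.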
We should always use include guards because the compiler throws an error if it sees the same class, function body, etc. defined twice in the same translation unit.
Pass by copy
By default, all data passed into a scope is copied in. This means in a function call, the data is copied into the stack frame, to be used by the callee.
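A small sketch of what copying means in practice. Both function names are made up for illustration.

```cpp
// n is a copy of the caller's value, so changes to it stay inside the function.
int add_one_to_copy(int n) {
    n += 1;
    return n;
}

// Calls add_one_to_copy() and reports what the caller's variable holds afterwards.
int original_after_call() {
    int value = 5;
    add_one_to_copy(value); // works on a copy; the result is ignored on purpose
    return value;           // still 5
}
```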
Pass by reference
The ampersand & symbol allows a function to access memory outside of its scope.
Suppose x starts at 10 and is passed to foo(), whose parameter is declared as &a. Printing x gives 10.000000 before the call and 20.000000 after it, because x was modified by foo() since it was passed in by reference. Inside foo(), a is a reference to the value stored in x, not a copy of it. Both names refer to the same value.
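A sketch consistent with that description. The body of foo() is an assumption (it simply assigns 20), and show_reference_effect() is a hypothetical driver that plays the role of the caller.

```cpp
#include <cstdio>

// a is a reference to the caller's variable, not a copy of it.
void foo(double& a) {
    a = 20.0; // assumption: any modification of a would show the same effect
}

// Hypothetical driver: prints x before and after the call and returns it.
double show_reference_effect() {
    double x = 10.0;
    std::printf("%f\n", x); // prints 10.000000
    foo(x);
    std::printf("%f\n", x); // prints 20.000000
    return x;
}
```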
Readonly access with const
If we combine a reference with const, this allows us to get readonly access to an external variable without having to copy it.
This allows us to save memory and also not have to worry about whether the function we’re calling modifies the value in some unexpected side effect.
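For instance, a function that only needs to read a vector could take it by const reference. The function total() below is a made-up example.

```cpp
#include <vector>

// The vector is not copied, and the compiler rejects any attempt to modify it.
double total(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    // values.push_back(1.0); // would not compile: values is const
    return sum;
}
```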
Typically has less memory usage
If you pass variables containing large data structures into functions as a reference, you save a ton of stack space, and you don’t waste time copying the data into a new object.
The amount of memory used by a reference is constant, because under the hood it is typically passed as a memory address: 64 bits on a 64-bit system, and 32 bits on a 32-bit system. If the object we’re referencing is much, much larger than that, passing it by reference saves memory.
Of course, simple types commonly take up less space than 64 bits (for example, integers), so it would be wasting space to use a reference; copying is cheaper here.
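To put rough numbers on this, the sketch below compares the size of a large struct with the size of an address. Big is an invented type, and the size of a pointer is used as a stand-in for the cost of a reference, since sizeof applied to a reference reports the size of the referenced object instead.

```cpp
#include <cstddef>

// Hypothetical large object: 1024 doubles, so at least 8 KiB on typical platforms.
struct Big {
    double samples[1024];
};

// Bytes copied when a Big is passed by value.
std::size_t bytes_passed_by_copy() {
    return sizeof(Big);
}

// Bytes needed for an address, which is how references are typically implemented.
std::size_t bytes_passed_by_reference() {
    return sizeof(Big*);
}
```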
References are a general concept
References are not just limited to function headers.
We can also directly take a reference to another variable in the same function scope. Modifying the reference also changes the value of the original variable, a.
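A minimal sketch of a local reference. The name b and the function around it are illustrative.

```cpp
// Returns the value of a after it has been changed through a reference.
int modify_through_reference() {
    int a = 1;
    int& b = a; // b is another name for a, not a new variable
    b = 42;     // writes to a
    return a;   // a is now 42
}
```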
Memory model
The stack
Stores temporary variables. It’s managed by the CPU and has fast read/write. This is where local variables are allocated by default in C++.
Variables are pushed onto the stack in a top-down fashion. When a function creates a variable, it gets pushed onto the stack. When the function exits scope, all variables created from that function are popped from the stack. This is also called unwinding the stack.
- Memory on the stack is relatively small. We’ll eventually run out of it.
- Trying to maintain many different variables and keeping them alive in the same scope can become difficult, especially with many function calls.
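One way to watch the push and pop behaviour is to give a type a constructor and destructor that log what happens. Tracer, events and work() are hypothetical names for this sketch.

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> events; // simple log of what happened, in order

// Logs when an instance is created and when it is destroyed.
struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) {
        events.push_back("push " + name);
    }
    ~Tracer() {
        events.push_back("pop " + name);
    }
};

void work() {
    Tracer first("first");
    Tracer second("second");
} // leaving the scope destroys second, then first (reverse order of creation)
```

After one call to work(), events holds "push first", "push second", "pop second", "pop first".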
Passing values across scopes
When a function scope ends, the local variables are popped off the stack, and the memory is freed. So how do we get “out” the value that we’ve calculated within functions?
Returning it
Seems obvious, but a return statement places the value in a very specific location defined by the platform’s calling convention (a register or a reserved slot on the stack), which the function caller can then read from after the callee has finished.
Pointer or reference parameters
Since these two concepts are basically a “window” into an outside scope, we can actually use them to store the return value. For me this was confusing at first, because I’ve always thought of function parameters as a “one-way” door into the function, but that’s not true.
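Both approaches side by side, as a sketch. The functions square() and divide() are invented for illustration.

```cpp
// Option 1: hand the result back with a return statement.
int square(int x) {
    return x * x;
}

// Option 2: write the results through reference parameters owned by the caller.
// Handy when there is more than one value to get out of the function.
void divide(int dividend, int divisor, int& quotient, int& remainder) {
    quotient = dividend / divisor;
    remainder = dividend % divisor;
}
```

A caller declares quotient and remainder in its own scope, passes them in, and finds them filled in once divide() returns.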
The heap
Much, much larger than the stack. Usually slower to read/write, and in C++ allocating and deallocating memory on the heap is not automatic.
How to leak memory
It’s fun!
While the pi variable is removed from the stack every time the function call to leak_memory() ends, the new keyword allocated memory on the heap, which is never deallocated. This causes a memory leak because the pointer (via pi) to that memory has now gone out of scope, meaning we’re never able to access it again.
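A sketch of what such a leak_memory() might look like, assuming pi points to a heap-allocated double. The no_leak() variant is added for comparison and is not part of the description above.

```cpp
// Each call allocates a double on the heap and then loses the only pointer to it.
void leak_memory() {
    double* pi = new double(3.14159);
    (void)pi; // silence "unused variable" warnings
} // pi (the pointer) is popped off the stack; the heap double is never freed

// The same allocation without the leak.
double no_leak() {
    double* pi = new double(3.14159);
    double value = *pi;
    delete pi; // hand the heap memory back before the pointer disappears
    return value;
}
```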
Heap allocation
There are many ways to allocate memory on the heap. Older methods like the new operator or straight up malloc() are heavily discouraged. Modern C++ relies on RAII and smart pointers.
Resource acquisition is initialization (RAII)
Coined by Bjarne Stroustrup in the mid 80s. This is the idea that the object that instantiates some memory should free that memory when the time comes.
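A hand-rolled sketch of the idea: the constructor acquires the memory and the destructor releases it. Buffer and use_buffer() are hypothetical names, and in real code a standard container or smart pointer would usually do this job.

```cpp
#include <cstddef>

// Hypothetical RAII wrapper around a heap-allocated array of ints.
class Buffer {
public:
    explicit Buffer(std::size_t size) : data_(new int[size]()), size_(size) {}
    ~Buffer() { delete[] data_; }

    // Copying would make two objects free the same memory, so forbid it.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    int* data_;
    std::size_t size_;
};

int use_buffer() {
    Buffer buffer(4); // heap memory acquired here
    buffer[0] = 7;
    return buffer[0];
}                     // destructor frees the memory here, even on an early return
```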
Smart pointers
C++ provides two standard library classes to help us access and manage heap memory. Smart pointers help automatically deallocate the heap memory so we don’t have to remember.
std::unique_ptr
When a unique_ptr is created (usually through std::make_unique), it also instantiates some heap memory for the object it wraps. When a unique_ptr is destroyed, it also frees that same heap memory, automatically.
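The sketch below counts live objects to show the automatic cleanup. Widget and live_widgets are invented for illustration, and std::make_unique needs C++14 or later.

```cpp
#include <memory>

int live_widgets = 0; // counts Widget objects currently alive (illustrative only)

struct Widget {
    Widget() { ++live_widgets; }
    ~Widget() { --live_widgets; }
};

// Returns how many Widgets exist while the unique_ptr is still in scope.
int widgets_alive_inside_scope() {
    std::unique_ptr<Widget> w = std::make_unique<Widget>(); // heap allocation
    return live_widgets;
} // w is destroyed here, which deletes the Widget it owns
```

The function returns 1, and live_widgets is back to 0 once it has returned, without any explicit delete.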
Every C++ class has a copy constructor by default. However, std::unique_ptr explicitly disables its copy constructor. This makes sense, because only one object is allowed to own and control this memory on the heap.
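A short sketch of what this means in code. Copying is rejected at compile time, while ownership can still be handed over with std::move, which is not covered above but is the standard way to transfer a unique_ptr. The function name is made up.

```cpp
#include <memory>
#include <utility>

// Ownership can be moved, never copied.
bool transfer_ownership() {
    std::unique_ptr<int> first = std::make_unique<int>(5);
    // std::unique_ptr<int> copy = first;           // would not compile: copy is deleted
    std::unique_ptr<int> second = std::move(first); // ownership moves to second
    return first == nullptr && *second == 5;        // first no longer owns anything
}
```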
Accessing the underlying pointer
It’s trivial to get a pointer to the object that the unique_ptr is managing by calling get(). This allows you to use the object in functions or APIs that may not expect the smart pointer wrapper.
Note that C++ does not stop you from manually tampering with it. It is valid C++ code to call delete on this raw pointer, which will deallocate the heap memory. However, the unique_ptr does not know this happened, and when its lifetime ends, it will try to delete again. Double deletion is undefined behavior and will probably crash the program.
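A sketch of borrowing the raw pointer safely. legacy_increment() stands in for an API that only accepts raw pointers and is purely hypothetical.

```cpp
#include <memory>

// Stand-in for an API that only understands raw pointers.
void legacy_increment(int* value) {
    ++(*value);
}

int call_legacy_api() {
    std::unique_ptr<int> counter = std::make_unique<int>(41);
    legacy_increment(counter.get()); // borrow the raw pointer; ownership stays put
    // delete counter.get();         // never do this: counter would delete again later
    return *counter;
}
```

The borrowed pointer is only valid while counter is alive, so the callee should not store it for later.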