A sequence of bits is meaningless unless we assign it some meaning by interpreting it through some set of rules. We use data types to specify exactly how a binary sequence translates to a specific representation of an object.
We say a particular representation is a data type if there are operations in the computer that can operate on the information encoded in that representation.
Each ISA has its own set of data types, and its own set of instructions that can operate on those data types.
Size
For each data type we need to know its maximum size or length. This is important for many reasons, like knowing when one object ends and another starts.
Another reason is padding. Certain data types rely on the bit value at the “front” or “end” of the binary sequence to tell us something. If we didn’t know the size, any extra padding bits would completely change how the sequence is interpreted.
Integers
If we only had to deal with non-negative numbers then life would be easy: any binary sequence can simply be interpreted as a number by converting from base 2 to base 10.
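As a small illustration (plain Python, not tied to any particular ISA; the helper name is just for this sketch), here is that base 2 to base 10 reading spelled out:

```python
# A minimal sketch: reading a bit string as a non-negative (unsigned) integer.
def unsigned_value(bits: str) -> int:
    """Convert a base-2 string to base 10 by accumulating powers of two."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value

print(unsigned_value("1011"))  # 8 + 0 + 2 + 1 = 11
```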
Issues to solve:
- We also need to represent negative numbers. Mathematical notation tells us to just add a “−” in front. But we’re working with computers: everything’s bits.
- Enable easy implementations of circuits for arithmetic operations.
- Consider the maximum representable number and handle integer overflow.
In modern systems, 2’s complement is the de facto integer representation.
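To make the “same bits, different data type” point concrete, here is a hedged sketch of how one bit pattern reads under 2’s complement versus unsigned; the helper name and the 8-bit width are choices made just for this example:

```python
# Sketch: the same 8-bit pattern interpreted as 2's complement vs. unsigned.
def twos_complement_value(bits: str) -> int:
    """In 2's complement the leftmost bit carries weight -2^(n-1);
    the remaining bits are ordinary positive powers of two."""
    n = len(bits)
    value = int(bits, 2)       # plain unsigned reading
    if bits[0] == "1":         # leftmost bit set -> subtract 2^n to make it negative
        value -= 2 ** n
    return value

print(twos_complement_value("11111011"))  # -5
print(int("11111011", 2))                 # 251: identical bits, read as unsigned
```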
Signed magnitude
Idea: use the leftmost bit to indicate sign. A 0 means a positive number, a 1 a negative one. It’s conceptually simple but there are some big issues.
- A system with n-bit integers can only use n - 1 bits for the magnitude, so it represents at most 2^(n-1) different magnitudes. That is not ideal because in theory n bits could support twice as many values (2^n).
- Implementing the arithmetic operations in circuits is difficult. For each number, we would need dedicated logic to read the leftmost bit, check whether the number is positive or negative, and branch from there.
- There exist two representations of zero.
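A minimal sketch of signed magnitude, assuming a fixed 4-bit width chosen just for the example:

```python
# Sketch: 4-bit signed magnitude (1 sign bit + 3 magnitude bits).
N = 4

def sm_encode(x: int) -> str:
    sign = "1" if x < 0 else "0"                 # leftmost bit stores the sign
    return sign + format(abs(x), f"0{N - 1}b")   # remaining bits store |x| in base 2

def sm_decode(bits: str) -> int:
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

print(sm_encode(5), sm_encode(-5))          # 0101 1101
print(sm_decode("0000"), sm_decode("1000")) # 0 0: two different patterns for zero
```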
1’s complement
Predecessor to 2’s complement. Similar to signed magnitude, let all positive numbers have a leftmost bit value of 0, and negative numbers a leftmost bit value of 1.
- Positive integers are interpreted as base 2. So +5 = 0101.
- This also implies our imaginary computer only supports 4-bit integers!
- To obtain -5, we flip all the bits of the sequence that represents positive 5. Therefore -5 = 1010.
Given n-bit integers, we’re splitting our 2^n possible bit patterns evenly between positive and negative values. This is a much better result than signed magnitude.
The other two problems remain however: designing circuits for it is hard, and there are two representations for the number zero: given n-bit integers, we have 00…0 and its “negative” counterpart 11…1.
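And a matching sketch for 1’s complement, again at a 4-bit width so it lines up with the +5 example above:

```python
# Sketch: 4-bit 1's complement, where negation means flipping every bit.
N = 4

def flip(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)

def ones_encode(x: int) -> str:
    positive = format(abs(x), f"0{N}b")   # e.g. +5 -> 0101
    return positive if x >= 0 else flip(positive)

def ones_decode(bits: str) -> int:
    return int(bits, 2) if bits[0] == "0" else -int(flip(bits), 2)

print(ones_encode(5), ones_encode(-5))           # 0101 1010
print(ones_decode("1010"))                       # -5
print(ones_decode("0000"), ones_decode("1111"))  # 0 0: the two zeros again
```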
Decimals
In modern systems, floating point is the de facto standard for decimals.
Fixed point
We represent decimals in two parts, an integer part and a fractional part. They’re separated by a binary point.
Example
Let’s say in an 8-bit fixed point number, we dedicate 3 fractional bits. Then, for example, 00010110 is read as 00010.110 = 2.75.
If the result looks confusing, remember that everything to the right of the binary point is still scaling with the powers of two because we’re using base 2. So the first fractional bit is worth 2^(-1) = 0.5, the next 2^(-2) = 0.25, and .110 = 0.5 + 0.25 = 0.75.
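A quick sketch of that conversion; the bit pattern and the helper name are just for illustration:

```python
# Sketch: reading an 8-bit fixed point value with 3 fractional bits.
FRACTIONAL_BITS = 3

def fixed_point_value(bits: str) -> float:
    # Read the whole pattern as an integer, then move the binary point
    # 3 places to the left, i.e. divide by 2^3 = 8.
    return int(bits, 2) / (2 ** FRACTIONAL_BITS)

print(fixed_point_value("00010110"))  # 00010.110 -> 2 + 0.5 + 0.25 = 2.75
```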
Pros are that the representation is similar to that for integers, except for the binary point, and that the addition and subtraction algorithms are similar to those for integers.
Cons are that it has limited range (bits dedicated to the fraction are unavailable for the integer part) and limited precision (a single fractional bit only gives precision to within 0.5).
Characters
Encoding the 26 letters of the alphabet requires at least 5 bits (2^5 = 32 ≥ 26).
ASCII
Every character is encoded using an 8-bit value, or a byte. But actually, we only use 7 of those bits, for a total of 2^7 = 128 characters.
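In Python the built-ins ord() and chr() expose these codes directly, which makes the 7-bit limit easy to check:

```python
# ASCII codes in practice: ord() gives a character's code, chr() goes the other way.
print(ord("A"), ord("a"), ord("0"))   # 65 97 48
print(chr(65 + 2))                    # C
print(max(ord(c) for c in "Hello!"))  # 111 -- every code fits comfortably below 128
```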
Unicode
Unfortunately, not every country in the world uses the Latin alphabet, so we need support for more than 128 characters.
Unicode uses 32-bit values. This equals 2^32, or roughly 4.3 billion, possible symbols. It also encapsulates ASCII as an 8-bit subset to maintain backwards compatibility.
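The same built-ins show the backwards compatibility: Latin characters keep their ASCII values, while other scripts and symbols get much larger code points:

```python
# Unicode code points: ASCII characters keep their old values.
print(ord("A"))        # 65, identical to its ASCII code
print(ord("é"))        # 233
print(ord("猫"))       # 29483
print(hex(ord("🙂")))  # 0x1f642 -- well beyond what 8 or even 16 bits can hold
```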