Bit, Bytes and Pixels

Intro

I've been thinking about writing this post for a while and some recent work optimizing an e-ink library reminded me to get started. A common theme I see in a lot of the (other people's) code that I optimize is that simple bit/byte/pixel operations are turned into overly complicated code with lots of compares and branches. The authors tend to tie themselves in knots to write working code, but the common theme is that they don't really understand how computers do their work. They approach the topic of bit manipulation as if each bit has a life of its own and is not somehow related to the other bits in the variable.

Computers these days are super complicated machines, but fundamental parts of the design haven't changed from the earliest 4-bit digital computer to today's multi-core supercomputers. The part I'm talking about is the concept that all digital data is defined as different sized groups of bits; a bit is just an on (high) or off (low) signal on a single wire. If every bit has 2 possible states and all digital data is just groupings of these bits, then that naturally means computers are binary math (base 2) machines. This is the most likely point where the disconnect happens - if computers are binary math machines, how can they work with integers and floating point numbers in decimal notation?

Value vs Representation

I guess it's time to discuss number bases. Humans normally work with base 10 numbers (10 unique symbols - 0 through 9). This makes logical sense because our hands have a total of 10 fingers. There's nothing special about base 10 and if it weren't for our physical form, it would be a completely arbitrary choice. The same can be said for base 2 numbers; there's nothing inherently valid or invalid about using a different number base. The thing that tends to confuse people is that the same value can be represented in multiple ways. A binary value of 10100001 in base 10 (decimal) is 161 or in base 16 (hexadecimal) is 0xA1. Those three representations of the same number look quite different. There's a good reason that these three number bases are commonly used in programming. The base 2 representation is a logical way of representing a collection of bits (wires carrying digital signals). These have logical groupings (e.g. byte = 8 bits, short = 16 bits, int = 32 bits). A shorthand for programmers to avoid having to type so many binary digits is to use hexadecimal (aka hex). Hex digits are really just groups of 4 bits. Since there are 16 possible values of 4 bits, the hexadecimal symbols use numbers and letters (0123456789ABCDEF).

Visualizing Data

In order to work effectively with bits in computer software, it's necessary to visualize how they are arranged. There is a standard way of describing the order of bits and bytes.

Bit order

The bit order is normally depicted with the most significant bit (highest place value) on the left and the LSB on the right. This gives meaning to "logical shift left" or "logical shift right" operations. In several computer programming languages, these operations are done using the symbols '<< ' for left shift and '>>' for right shift operators. A shift left moves bits to the left and inserts 0's in the vacated spots. The effect is to double the value for each bit shifted left. Here's an example:

00010001 = 17 = 0x11 when shifted left 1 place becomes 00100010 = 34 = 0x22

The same effect happens in any number base. A shift left multiplies by the base value. In base 10 if we have the number 321 and shift it left one place (3210), we've multiplied it by 10. From this idea it would logically follow that shifting bits right will divide the value by 2 for each bit shifted. At this point it's necessary to give you some homework. Working effectively with bits, bytes etc requires you to have the first few powers of 2 memorized so that you can recognize when bits line up nicely or can be masked or shifted with ease. It also helps to know what they look like in hex. Here's what I mean:

2^0 = 1 = 0x0001 2^8 = 256 = 0x0100

2^1 = 2 = 0x0002 2^9 = 512 = 0x0200

2^2 = 4 = 0x0004 2^10 = 1024 = 0x0400

2^3 = 8 = 0x0008 2^11 = 2048 = 0x0800

2^4 = 16 = 0x0010 2^12 = 4096 = 0x1000

2^5 = 32 = 0x0020 2^13 = 8192 = 0x2000

2^6 = 64 = 0x0040 2^14 = 16384 = 0x4000

2^7 = 128 = 0x0080 2^15 = 32768 = 0x8000

Byte order / Endianness

In addition to knowing how bits are ordered in bytes/shorts/etc, it's also necessary to visualize how bytes are ordered on your particular CPU. There are two ways that bytes are arranged in memory, big and little endian. The terms indicate if the data starts at the "big" end or "little" end, meaning the most significant byte or least significant byte. The image below shows how the bytes are arranged in each case for the value 0x0A0B0C0D. In the big endian arrangement, the most significant byte (0x0A) is stored at the lowest address and in the little endian arrangement it's stored at the highest.

The vast majority of CPUs in the world use the little endian order (e.g. x86, ARM, AVR). It's important to be able to visualize this when writing software. For example, let's say you're reading the FourCC value from a video file. In a hex editor, you see that the value should be: 0x4D 0x4A 0x50 0x47 (the letters MJPG for Motion-JPEG). If you read the 4 bytes into a variable like so:

fread(&u32, 1, sizeof(uint32_t), my_file_handle);

The value in u32 will be 0x47504a4d (GPJM) since on a little endian computer, the least significant byte is stored first. It's easy enough to reverse the bytes, but knowing how they're stored in memory can save you a lot of grief in situations like that. Most modern CPUs have native instructions to reverse the order in a single step. Instead of doing this:

u32 = (u32 >> 24) | ((u32 >> 8) & 0xff00) | ((u32 << 8) & 0xff0000) | (u32 << 24);

You can use this:

u32 = __builtin_bswap32(u32);

On ARM, X86 and many other CPUs this turns into a single instruction.

Practical Example

Here is some code to demonstrate the hard versus easy way to accomplish some bit manipulation. Knowing what I've described above can save you a lot of head scratching and inefficient code. I discovered this bit of code while working on software for e-paper displays. The original library (shown below) was provided by the vendor of the display.

The code above is splitting off the even and odd bits of each byte so that they can be written separately to the e-paper controller. The clue that indicates that the author isn't visualizing the bits properly is that each pair of bits (correctly masked with 0xC0) from the source byte is being compared for its 4 possible values instead of just testing the bit in question with a single AND. The odd (0x80) and even (0x40) bits being masked off (0xc0) tell the whole story. There is a more efficient way to do this which avoids looping (a few ANDs and ORs), but if we just improve the logic above, we would get better results. If we look carefully at these lines:

temp2 = temp1 & 0xc0;
if (temp2 == 0xc0)
temp3 |= 1;
else if (temp2 == 0x00)
temp3 |= 0;
else if (temp2 == 0x80)
temp3 |= 1;
else // 0x40
temp3 |= 0;

Forgetting for a moment about the lines which don't actually do anything (temp3 |= 0), we can see from the comparison that if bit 7 (0x80) is set, then the variable temp3 should get it's LSB set. A simpler way of writing all of the code above is simply:

if (temp1 & 0x80)
temp3 |= 1;

This is what I was referring to when I said that some people tie themselves in knots to manipulate bits and they would have a much easier time if they could visualize the data correctly. How did our specific knowledge of bits help us? Well, we need to split each byte into its even and odd bits. We can directly test for an odd bit being set instead of comparing all 4 possible values of the even/odd bit pair. I'm always going to advocate for more efficient code, but the point here is really about the simplicity of it. Now... to accomplish the same task without any compares nor branches, there's much more efficient way to split off even and odd bits. This code below accomplishes the same job as the code shown above.

Pixels

There is a lot of ground to cover when talking about computer graphics. My aim for this section is to just plant the idea that when you're looking at a graphics display, you're looking at bits (memory). I'm going to use the ubiquitous SSD1306 128x64 OLED display to make my point. The display has 1024 bytes of internal RAM which are used to generate the 128x64 pixels. Each pixel is either on or off (1 bit), so 128x64 = 8192 pixels/bits = 1024 bytes * 8 bits per byte (have you memorized the powers of 2 yet? 😏). You're literally looking directly at RAM when you look at that display. Each dot is a single bit of memory.

The memory is arranged in 8 rows of 128 bytes. The bytes are oriented vertically so that the LSB (least significant bit) is the top line of each row and the MSB is the bottom line.

In the image above, each pixel (bit) is a yellow dot and the 'page' represents a row of 128 bytes (128 x 8 pixels). In order to light up the top, left dot, you would write the value 1 into byte index 0. Here's a way to express that in code - this figures out which bit to set based on the line (Y value):

ucBit = (1 << (y & 7));

And this calculates the byte offset (assume that top left is offset 0 and bottom right is 1023)

iOffset = (y >> 3) * 128 + x;

Let's unpack the code.

The Y value is masked with 7 (y & 7) to get the bit number (0-7) within the row. The reason we can use logical AND is because the number of bits in a byte (8) is a nice power of 2. If the value of y is 9, then it will represent the second bit of the second row (9 & 7 = 1). We shift 1 left (1 <<) the number of bits specified by this masked value.

To calculate the byte offset (address), we can take the y value and shift it right 3 places (y >> 3). This divides y by 8 which tells us which byte row the pixel lands on. The row is then multiplied by 128 (bytes per row) and added to the column (x) since each column is a whole byte away from the previous. If we keep a copy of the display in our own memory, then a SetPixel() function can look like this:

void SetPixel(uint8_t *pRAM, int x, int y)
{
int index = (y >> 3) * 128 + x;
uint8_t ucBit = (1 << (y & 7));

pRAM[index] |= ucBit; // logical OR the bit mask to set the pixel
}

Hopefully this post hits the right level of not too easy and not too geeky for you to absorb the info. As always, if you have any questions please leave them in the comments so everyone can benefit from the conversation.

Search This Blog

Follow me down the optimization rabbit hole