Information Representation

It’s important to go back to the basics from time to time. You almost certainly already know that computers use binary digits (bits) to represent everything they operate on, so you might be wondering why I consider this topic “advanced” and “mysterious”. Unfortunately, even a simple topic has obscure corners that are hard to see and therefore easy to skip. These gaps in knowledge later surface as bugs and security vulnerabilities that are difficult to detect, understand, and fix, and that are very expensive in terms of time.

Let’s start at the beginning. Any finite sequence of bits, say 10111001, can be represented and manipulated by a computer. However, the sequence by itself does not represent information because it has no specific meaning. It could represent an integer, a real number, a character, a machine-level instruction, a character string, or anything else. That’s what we call data: a sequence of symbols with no specific meaning attached. Even if we know that the two bit sequences 1010 and 1000 represent numbers, we still cannot add them, because we don’t know how to interpret these bits to construct a number.
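To make this concrete, here is a minimal Python sketch that reads the same bit strings under three different interpretations. The helper names (`as_unsigned`, `as_twos_complement`, `as_sign_magnitude`, `as_latin1_char`) are my own and exist only for illustration, and the three number formats are just common examples, not an exhaustive list.

```python
def as_unsigned(bits: str) -> int:
    """Read the bits as a plain base-2 non-negative integer."""
    return int(bits, 2)


def as_twos_complement(bits: str) -> int:
    """Read the bits as a two's-complement signed integer of len(bits) width."""
    value = int(bits, 2)
    if bits[0] == "1":            # leading 1 means the value is negative
        value -= 1 << len(bits)   # subtract 2**width
    return value


def as_sign_magnitude(bits: str) -> int:
    """Read the first bit as the sign and the remaining bits as the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def as_latin1_char(bits: str) -> str:
    """Read 8 bits as one character in the ISO 8859-1 (Latin-1) encoding."""
    return bytes([int(bits, 2)]).decode("latin-1")


# One bit pattern, several meanings.
pattern = "10111001"
print(as_unsigned(pattern))         # 185
print(as_twos_complement(pattern))  # -71
print(as_sign_magnitude(pattern))   # -57
print(as_latin1_char(pattern))      # the superscript-one character, U+00B9

# "Adding" 1010 and 1000 gives a different answer under each interpretation.
a, b = "1010", "1000"
sums = {
    "unsigned": as_unsigned(a) + as_unsigned(b),
    "twos_complement": as_twos_complement(a) + as_twos_complement(b),
    "sign_magnitude": as_sign_magnitude(a) + as_sign_magnitude(b),
}
print(sums)
```

The bits never change between calls; only the rule used to read them does. That is why the sum of 1010 and 1000 is undefined until we pick an interpretation.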

This leads to the conclusion that it’s not enough to build a machine that can represent sequences of bits: it will not be able to operate on them correctly unless we build into it a way to interpret them. From this, we can define digital information as bits + interpretation. For the processor to execute an instruction, it must first get the operands from memory and then execute the instruction according to the built-in interpretation. To get the operands, the processor must have a way to “address” them so that the memory module can locate them. I will focus the rest of the article on addressing and talk about interpretations in later articles.
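As a rough illustration of "bits + interpretation" together with addressing, the sketch below models memory as a small byte array and an instruction as a function that fetches its operands by address. Everything here is a toy assumption of mine: the 16-byte `memory`, the little-endian 32-bit layout, and the names `store_i32`, `load_i32`, `load_f32`, and `add_i32` do not correspond to any real processor.

```python
import struct

# Toy memory: 16 bytes, addressed by index 0..15 (an assumption for this sketch).
memory = bytearray(16)


def store_i32(address: int, value: int) -> None:
    """Write a signed 32-bit little-endian integer starting at `address`."""
    struct.pack_into("<i", memory, address, value)


def load_i32(address: int) -> int:
    """Interpret the 4 bytes at `address` as a signed 32-bit integer."""
    return struct.unpack_from("<i", memory, address)[0]


def load_f32(address: int) -> float:
    """Interpret the same kind of 4-byte cell as an IEEE 754 single-precision float."""
    return struct.unpack_from("<f", memory, address)[0]


def add_i32(dst: int, src_a: int, src_b: int) -> None:
    """A toy 'add' instruction: fetch two operands by address, add them as
    signed integers (the built-in interpretation), store the result at `dst`."""
    store_i32(dst, load_i32(src_a) + load_i32(src_b))


store_i32(0, 10)       # operand at address 0
store_i32(4, 8)        # operand at address 4
add_i32(8, 0, 4)       # result goes to address 8
print(load_i32(8))     # 18

# The same four bytes mean different things under different interpretations.
store_i32(12, 0x3F800000)
print(load_i32(12))    # 1065353216
print(load_f32(12))    # 1.0
```

The addresses tell the toy machine where the bits are, and the choice of `load_i32` versus `load_f32` tells it what they mean. Neither piece is enough on its own.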