
Numerical representation and information coding

by A. Calderone 
[Beginner Level]

We are used to representing numbers as a sequence of symbols, or digits. Each digit has a different "weight" depending on the position it occupies in the number. For example, the number 123 is different from the number 321 even though the digits used to represent both numbers are the same. There is an equivalent way to write the two numbers that highlights the weight factor, as shown in the following expressions:

123 = 1 x 10^2 + 2 x 10^1 + 3 x 10^0
321 = 3 x 10^2 + 2 x 10^1 + 1 x 10^0


The second notation is clearly more cumbersome than the first one; in fact, we normally write numbers in the first way, while keeping in mind the effective weight of each digit.
Looking at the two expressions we can easily deduce that each digit has weight 10^n, where 10 is the base and n is the position of the digit in the number, counted from right to left (the first position corresponds to n=0).
This number system is known as the decimal system, or base ten, because each number can be represented by combining the ten symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.


Binary system

The adoption of the binary system for the representation of numbers comes from the intimate nature of the mechanisms that allow computers to perform arithmetic and logic calculations.

The binary system uses only two digits to represent numbers, namely 1 and 0.
A sequence of these digits allows us to represent any number.

Consider, for example, the binary number "101011".
Proceeding as in the first example, we assign to each place occupied by a digit a weight, which in this case is 2^n, where n is the position of each bit, counted from right to left.

The binary number 101011 is therefore equivalent to the decimal expression
1 x 2^5 + 0 x 2^4 + 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = 32+8+2+1 = 43 in base 10.

The reverse conversion can also be done in a simple way. One method is to write down the decimal number and repeatedly divide it by two, each division giving a quotient and a remainder of either "1" or "0", until the quotient equals zero:

43 / 2 =21 r=1
21 / 2 =10 r=1
10 / 2 = 5 r=0
5 / 2 = 2 r=1
2 / 2 = 1 r=0
1 / 2 = 0 r=1

Now, reading the remainders of the divisions from the bottom upwards, we get the binary number 101011.
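
This procedure translates directly into a short program. In the C sketch below (the function name print_binary is just illustrative), the remainders of the successive divisions by two are collected and printed in reverse order:

    #include <stdio.h>

    /* Print n in binary by repeated division by 2. The remainders
       come out in reverse order, so the buffer is filled backwards. */
    static void print_binary(unsigned n)
    {
        char buf[sizeof(unsigned) * 8 + 1];
        int i = (int)sizeof buf - 1;
        buf[i] = '\0';
        do {
            buf[--i] = (char)('0' + n % 2);  /* remainder: 0 or 1 */
            n /= 2;                          /* quotient feeds the next step */
        } while (n > 0);
        printf("%s\n", &buf[i]);
    }

    int main(void)
    {
        print_binary(43);  /* prints 101011 */
        return 0;
    }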


Nibble, Byte, Word ...

A computer internally represents numbers using the binary system.
Binary numbers are managed in specific formats, or rather
packed into sequences of bits (depending also on the computing capabilities of the machine).

These packets of bits are generally grouped and named as follows:

  • 1 nibble = 4 bits
  • 1 byte = 2 nibbles = 8 bits
  • 1 word = 2 bytes = 4 nibbles = 16 bits
  • 1 double word = 2 words = 4 bytes = 8 nibbles = 32 bits
  • 1 quadword = 2 double words = 4 words = 8 bytes = 16 nibbles = 64 bits

Note that the number of bits of each format is always a multiple of 4.
Among the formats listed, the most widely used is probably the byte.
The term "packed" derives from the fact that a number represented with a byte always consists of 8 bits. For example, zero packed in a single byte is written as a sequence of 8 bits set to 0: 00000000.

The binary number 101011 seen in the example above is packed in a byte as 00101011, while adopting the word format the number becomes 0000000000101011. It would not make sense, instead, to pack it in a nibble.

If you decide to use a byte to represent a subset of the natural numbers, you can only represent 256 (2^8) numbers, i.e. from 00000000 (0d) to 11111111 (255d), because a byte consists of 8 bits, and therefore the largest number obtainable by combining the two digits 0 and 1 in the eight available positions is 2^8 - 1 = 255.

Using the same criterion we can represent 65,536 (2^16) different numbers with a word and 4,294,967,296 (2^32) numbers with a double word, and just 16 (2^4) with a nibble.
The general rule is the following: using an n-bit format you can represent 2^n numbers.

Two other frequently occurring terms in computer terminology are MSB and LSB.

MSB (Most Significant Bit) is the term that refers to the first (leftmost) bit of a number represented in one of the formats listed above; likewise, the last (rightmost) bit is called the LSB (Least Significant Bit). For example, in the number 00101011, MSB = 0 and LSB = 1.
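
As a small illustration in C (a sketch of mine, not from any standard), the MSB and the LSB of a byte can be extracted with simple shift and mask operations:

    #include <stdio.h>

    int main(void)
    {
        unsigned char b = 0x2B;     /* 00101011 */
        int msb = (b >> 7) & 1;     /* leftmost bit  */
        int lsb = b & 1;            /* rightmost bit */
        printf("MSB=%d LSB=%d\n", msb, lsb);  /* prints MSB=0 LSB=1 */
        return 0;
    }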


Hexadecimal System

Although computers deal with bits, humans cannot always do so easily. The most natural way of working, for us, is to treat numbers in decimal format.

However, it would be handy to have a number system that ensures a direct correspondence between a certain number of bits and a single digit.

For example, referring to the nibble, let's look at this table:

Binary  Decimal    Binary  Decimal
0000    0          1000    8
0001    1          1001    9
0010    2          1010    10
0011    3          1011    11
0100    4          1100    12
0101    5          1101    13
0110    6          1110    14
0111    7          1111    15

Note that from 0 to 9 the 4 bits correspond to a single decimal digit, while from 10 to 15 they require two decimal digits.
Now let's rewrite the above table as follows.

Binary  Digit    Binary  Digit
0000    0        1000    8
0001    1        1001    9
0010    2        1010    A (10)
0011    3        1011    B (11)
0100    4        1100    C (12)
0101    5        1101    D (13)
0110    6        1110    E (14)
0111    7        1111    F (15)

In the second table, the decimal numbers from 10 to 15 have been replaced with the first 6 letters of the alphabet. These are the extra digits (compared to the decimal system) of the hexadecimal system.

The hexadecimal system offers an alternative way of representing binary numbers packed into nibbles, but also into bytes (2 hexadecimal digits), into words (4 hexadecimal digits) and so on. For example, take the binary number 00101011 and convert it to hexadecimal: the first 4 bits correspond to the digit 2 and the other 4 bits correspond to B. So the binary number 0010 1011 = 2B in hexadecimal format (2Bh).

Let us now convert this number to decimal: 2 x 16^1 + B x 16^0 = 32 + 11 = 43.

The conversion takes into account that the letters A to F are equivalent to the decimal numbers 10 through 15.

The reverse conversion can be done with the method of division:

43 / 16 = 2 r=11 = 0Bh
02 / 16 = 0 r=02 = 02h

Now, reading the remainders of the divisions from the bottom upwards, we get the number 2Bh.
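
The same division scheme can be coded for base 16. In this minimal C sketch (print_hex is an illustrative name), the remainders from 10 to 15 are mapped to the letters A to F:

    #include <stdio.h>

    static void print_hex(unsigned n)
    {
        const char digits[] = "0123456789ABCDEF";
        char buf[sizeof(unsigned) * 2 + 1];
        int i = (int)sizeof buf - 1;
        buf[i] = '\0';
        do {
            buf[--i] = digits[n % 16];  /* remainder selects the hex digit */
            n /= 16;
        } while (n > 0);
        printf("%sh\n", &buf[i]);
    }

    int main(void)
    {
        print_hex(43);  /* prints 2Bh */
        return 0;
    }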

The hexadecimal system is not the only one that ensures a correspondence between digits and bits: other number systems could be adopted. One of these is the octal system (base 8). Its drawback is that each octal digit corresponds to 3 bits, which conflicts with the criterion, generally adopted by modern computers, of packing binary digits in the formats seen before.


Two's complement and negative numbers

We have previously mentioned the possibility of representing any number in binary. To do this, you might think of adopting, as in the decimal system, additional symbols such as the comma or the minus sign. However, we already know that computers, internally, can handle only two symbols: 0 and 1. Any other symbol must be formed by a combination of these two.

At this point we need a trick that allows us to represent signed numbers using only the binary digits. The trick is to adopt a method known as two's complement. To complement a binary number you invert all of its bits (Ref. 1).
In two's complement, besides the normal inversion of the bits, 1 is also added to the complemented number.

Examples:

  • one's complement of 00101011b (2Bh) is 11010100b (D4h)
  • two's complement of 2Bh is:

    11010100 + (one's complement of 2Bh)
    00000001 =
    ---------
    11010101   (two's complement of 2Bh, D5h)

The two's complement of 2Bh (43d) can then be used to represent the negative number -2Bh (-43d).

To prove the claim, you can try to add the two numbers together. The result of the sum must then be 0.

Let's try, therefore, to add 2Bh and its two's complement:  

  00101011 + (2Bh)
  11010101 = (two's complement of 2Bh)
  ----------
1 00000000

If we do not consider the most significant bit of the result, the remaining 8 bits are all zero.

In fact we are "forced" to overlook the MSB of the result, because there is no place for it in our byte.
That ignored bit is called the carry bit. If the carry bit is not taken into account, the result of the sum is zero.
In this way -2Bh is obtained by two's complementing 2Bh, and likewise any other negative number can be obtained by two's complementing the equivalent positive number, ignoring the sign.
Observe that the MSB of a byte used to represent a negative number is always 1.
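
These steps are easy to verify with a few lines of C: on the 8-bit uint8_t type the carry is discarded automatically by unsigned wrap-around (a minimal sketch for illustration):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t x    = 0x2B;                 /* 43 decimal                    */
        uint8_t ones = (uint8_t)~x;          /* one's complement: D4h         */
        uint8_t twos = (uint8_t)(ones + 1);  /* two's complement: D5h         */
        uint8_t sum  = (uint8_t)(x + twos);  /* the ninth bit (carry) is lost */
        printf("~x=%02X  -x=%02X  x+(-x)=%02X\n", ones, twos, sum);
        /* prints: ~x=D4  -x=D5  x+(-x)=00 */
        return 0;
    }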

Let's take another example and compute the difference between 34h (00110100b) and 24h (00100100b). We expect the result to be 10h (00010000b).

As seen, subtracting 24h from 34h is equivalent to adding 34h and the two's complement of 24h:


  0010 0100   (24h)
  1101 1011 + (one's complement of 24h = DBh)
  0000 0001 =
  -----------
  1101 1100   (two's complement of 24h = DCh)

Adding DCh to 34h we obtain 10h:

  1101 1100 + (DCh)
  0011 0100 = (34h)
  -----------
1 0001 0000   (10h)

Also in this case we do not take into account the carry bit, because it falls outside the byte format.
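
The same subtraction can be replayed in this illustrative C sketch, where 34h - 24h is computed as the sum of 34h and the two's complement of 24h:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t a = 0x34, b = 0x24;
        uint8_t neg_b = (uint8_t)(~b + 1);    /* two's complement of 24h = DCh */
        uint8_t diff  = (uint8_t)(a + neg_b); /* 34h + DCh, carry discarded    */
        printf("34h - 24h = %02Xh\n", diff);  /* prints: 34h - 24h = 10h */
        return 0;
    }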

The representation of negative numbers can be obtained in other ways. The advantage of the two's complement method is that the difference between two numbers is automatically derived from the operation of sum. However, two's complement can be exploited only for integers. In the case of real numbers the mechanism is a bit more complex: the representation is not automatic but follows a set of rules that allow you to manipulate real numbers, more properly called floating point numbers.


Floating point numbers

Floating point is also known as scientific notation. For example, the number -1,234,000 can be written in scientific notation as -1.234 x 10^6. The number is thus decomposed into three parts: sign (-), mantissa (1.234) and exponent (6). Of course, -1,234,000 can be rewritten in scientific notation in countless other ways, for example as -0.1234 x 10^7, where all the significant digits of the mantissa are after the decimal point and the first digit is implicitly zero. It is convenient, however, to settle on one of the possible representations: the chosen one is called the normalized representation, and likewise we will call the other possible representations "denormalized".
Normalization serves to assign a precise size to the three fields of the floating point number, so that it can be represented in an appropriate format.

It is obvious that the mantissa and exponent fields will be represented in binary, while it is not obvious, though it will become clear later on, that the exponent should indicate a power of 2.

Figure 1 shows the format of floating point numbers handled by the Intel math unit (IEEE 754). 

Each of the formats (see Figure 1) is divided into the three fields already mentioned: sign, mantissa and exponent. The order in which the fields appear follows the rule that the most significant field is placed in the more "privileged" position with respect to the less significant ones.

The sign bit is definitely the most important field, followed by the exponent and finally the mantissa. In fact, while a change in a bit of the mantissa changes the number but not its order of magnitude, a change in a bit of the exponent changes the order of magnitude of the number itself, and a change in the sign bit upsets its value altogether.

Now, let us analyze in detail the individual fields of a floating-point number.

Sign

The sign field consists of a single bit, which takes the value 0 to indicate that the floating point number is positive, and 1 to indicate that the number is negative.

If the floating point number is zero, which occurs when the mantissa and exponent fields are all set to 0, the sign field can take either value. Although there is a positive zero and a negative zero (+0 and -0), the two numbers represent the same quantity, namely zero.

Mantissa

The mantissa of a floating point number has the form X1.X2X3X4...Xn, where each Xi (i = 1, 2, 3, ..., n) is a binary digit. For the best accuracy, i.e. to maximize the number of significant bits of the mantissa, it is necessary to normalize the floating point number, adjusting the exponent so that no precious bits are wasted filling the beginning of the mantissa with zeros. Thanks to the binary representation and the adoption of powers of 2, it is always possible, acting on the exponent, to make the first bit of the mantissa X1 = 1 instead of 0, obtaining higher precision with the same number of bits. For example, the binary number 0.0001101111..., decreasing the exponent by 4, can be represented as 1.101111... rather than as 0.1101111..., thus gaining an invaluable bit in the mantissa.
Normalizing each floating point number in this way, the first bit of the mantissa can be omitted, since it is implicitly assumed equal to 1; the mantissa can thus be rewritten in the form
1.X2X3X4...Xn, where only the Xi occupy places in the mantissa field.

However, there are exceptions to this rule; one in particular is the representation of 0, which is made, as already said, by setting to zero the bits of the mantissa field and those of the exponent field.

Exponent

The exponent field specifies the power of 2 that, multiplied by the mantissa, determines the absolute value of the floating point number. Some computers use powers of 16 instead of 2 (such as the IBM 360/370). The advantage of using powers of 2 lies mainly in the fact that, with the same hardware complexity, computers that adopt powers of two are faster than those which adopt powers of 16. Moreover, using powers of two, you can use the "trick" of setting the MSB of the mantissa to 1 rather than to zero.

To make negative exponents possible, the exponent field contains the sum of the actual exponent and a positive constant called bias, which is taken as the neutral exponent (0).

The bias is 127 for single precision, 1023 for double precision and 16383 for extended precision.

For example, if the exponent of a number in single precision is 5, the value stored in the exponent field is 127+5=132. If the exponent is -5, the exponent field contains 127-5=122.

The following table shows the internal layout of the different floating point formats:

Format               Total    Sign    Exponent   Mantissa
Single precision     32 bits  1 bit   8 bits     23 bits
Double precision     64 bits  1 bit   11 bits    52 bits
Extended precision   80 bits  1 bit   15 bits    64 bits

Conversion formulas

The conversion of a floating point number in the formats discussed above can be carried out taking into account the rules established for the representation of the various fields. Formalizing them, we can use the following expressions for the conversion of numbers to decimal representation:
    (-1)^sign x (1.X1X2...X23) x 2^(exponent-127)    for single precision
    (-1)^sign x (1.X1X2...X52) x 2^(exponent-1023)   for double precision
    (-1)^sign x (1.X1X2...X64) x 2^(exponent-16383)  for extended precision

Example.

Let us convert the single precision number represented as

1 01111110 11000000000000000000000

where sign = 1, exponent = 126, X1 = 1 and X2 = 1 (the Xi with index i > 2 are all 0). Therefore, using the first formula, we have

(-1)^1 x (1.11b) x 2^(126-127) = -1 x 1.75d x 2^-1 = -0.875

It should be noted that the binary digits after the decimal point have weight 2^-n, where n is the bit position counted from left to right. So the first two bits of the mantissa correspond to the decimal number 0.75, obtained as

2^-1 + 2^-2 = 1/2 + 1/4 = 0.75.

Then, adding the MSB (which does not appear in the number, but we know it is 1), you get 1.75 in base 10, equivalent to the number 1.11 in base 2.
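
The decoding can be verified with a short C program. Grouped in hex, the bit pattern above is BF600000h; the sketch below assumes, as on virtually all modern machines, that the C type float uses exactly this IEEE 754 single precision layout:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t bits = 0xBF600000u;  /* 1 01111110 11000000000000000000000 */
        float f;
        memcpy(&f, &bits, sizeof f);  /* reinterpret the 32 bits as a float */

        unsigned sign     = (bits >> 31) & 1u;
        unsigned exponent = (bits >> 23) & 0xFFu;  /* biased exponent */
        unsigned mantissa = bits & 0x7FFFFFu;      /* 23 stored bits  */

        printf("sign=%u exponent=%u mantissa=%06X value=%f\n",
               sign, exponent, mantissa, f);
        /* prints: sign=1 exponent=126 mantissa=600000 value=-0.875000 */
        return 0;
    }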


BCD format

Since computers think in binary, all the arithmetic operations discussed so far concerned binary numbers. But our world is decimal.
The fact that computers work with only two voltage levels does not mean that they must represent their numbers in binary notation. Indeed, sequences of the bits 0 and 1 can be used to encode each decimal digit of a number separately.
For example, instead of representing the decimal number 43 with the equivalent binary 00101011, it may be represented by the encoding of 4 (0100b) followed by the encoding of 3 (0011b), obtaining the representation 0100 0011. Note that this is a binary encoding of decimal digits, and it is aptly named Binary-Coded Decimal, or BCD. The following table shows the coding of each decimal digit.


Binary  Digit    Binary  Digit
0000    0        1000    8
0001    1        1001    9
0010    2        1010    *
0011    3        1011    *
0100    4        1100    *
0101    5        1101    *
0110    6        1110    *
0111    7        1111    *

* Invalid bit combination

The reason why computers think in binary rather than decimal notation is that binary notation is more compact. For example, the number 125 can be represented with 8 bits in binary notation (01111101) but requires 12 bits in BCD (0001 0010 0101). However, it is often necessary to use the BCD code so that decimal numbers can be easily manipulated and displayed.
This is why most computers are equipped with an instruction set that can operate on BCD numbers as well as on binary numbers, or at least with instructions that allow a quick conversion from one encoding to the other.
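
For illustration, here is a minimal C sketch (the helper names are mine, not processor instructions) that packs a two-digit decimal number into one BCD byte and unpacks it again with shifts and masks:

    #include <stdio.h>
    #include <stdint.h>

    /* Pack a value 0..99 into one BCD byte:
       tens in the high nibble, units in the low nibble. */
    static uint8_t to_bcd(uint8_t n)
    {
        return (uint8_t)(((n / 10) << 4) | (n % 10));
    }

    static uint8_t from_bcd(uint8_t bcd)
    {
        return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
    }

    int main(void)
    {
        uint8_t bcd = to_bcd(43);
        printf("43 -> %02Xh -> %u\n", bcd, from_bcd(bcd));
        /* prints: 43 -> 43h -> 43 (a BCD byte reads like the decimal number) */
        return 0;
    }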


Encoding information

Each processor uses different internal components and different encodings for the information on which it works, so when multiple devices are connected it is important that the communication between them occurs without ambiguity.

It is necessary to assign to each value that the data can take a proper configuration of bits, in accordance with established rules. These rules are the codes.

In general, it can be said that for the transmission of information to the outside world logical devices may use codes different from those used for processing data internally. There are, in fact, peripherals such as printers that can be connected indifferently to completely different computers.
The encoding of information guarantees that heterogeneous computers can communicate with each other, and this could not happen if they did not all speak the "same language", or better, the same... code.

The criteria with which bits are assigned to generate codes vary: in BCD coding, for example, each of the ten decimal digits corresponds to a combination of 4 bits.

This encoding has the drawback of discarding as many as 6 combinations of 4 bits, because they exceed the number of decimal digits to be represented.

It should be recalled, in this regard, that encoding M symbols requires n bits such that 2^(n-1) < M <= 2^n. So if M = 10 you have 2^3 < 10 <= 2^4, and it takes at least 4 bits to represent the 10 digits; this means that of the 16 (2^4) possible combinations only 10 are used, and the remaining 6 are discarded as invalid words.


ASCII code

For communication between the microprocessor and peripherals such as keyboards, printers, etc., it is necessary to transmit not only decimal digits but also more complex characters, which can be classified as:
• uppercase and lowercase letters;
• decimal digits;
• punctuation characters (; , : . ! ? etc.);
• arithmetic symbols (+ * - / etc.);
• graphic symbols;
• control and special characters.
In all, over a hundred symbols. To represent them in a code, at least 7 bits are required (2^7 = 128).

ASCII (American Standard Code for Information Interchange) is certainly the most widely used such code. In its universally recognized version it is composed of 128 symbols. However, many computer manufacturers have extended it, each in their own way, using 8 bits instead of 7 and thus adding another 128 special characters, while leaving the first 128 characters of the original code unchanged.
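
In C a character is just a small integer, so the ASCII code of a symbol can be printed directly; a tiny illustrative example:

    #include <stdio.h>

    int main(void)
    {
        char c = 'A';
        /* 'A' has ASCII code 65 decimal, 41 hexadecimal */
        printf("'%c' = %d decimal = %Xh\n", c, c, (unsigned)c);
        return 0;
    }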

For a complete description please refer to the following link: http://en.wikipedia.org/wiki/ASCII
