Real Numbers Representation | Computer Architecture

Real Numbers – Real numbers are numbers that can include a fractional part after the decimal point. There are two ways of representing real numbers:

Fixed point representation
Floating point representation

Fixed Point Representation

This representation has a fixed number of bits (or digits) for the integer part and for the fractional part. For example, if the given fixed-point format is IIII.FFFF in decimal, then the smallest non-zero value you can store is 0000.0001 and the largest is 9999.9999. A fixed-point number has three fields: the sign field, the integer field, and the fractional field.

We can represent these numbers using:

Sign-magnitude representation: range from -(2^(k-1)-1) to (2^(k-1)-1), for k bits.
1’s complement representation: range from -(2^(k-1)-1) to (2^(k-1)-1), for k bits.
2’s complement representation: range from -2^(k-1) to (2^(k-1)-1), for k bits.

2’s complement representation is preferred in computer systems because it has a single, unambiguous encoding of zero and because addition and subtraction work the same way regardless of sign.
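As a quick check on these ranges, the sketch below computes the smallest and largest integer each scheme can hold in k bits. The function name `representable_range` and the scheme labels are illustrative and do not come from any library. Two's complement reaches one value further on the negative side, since it has only one encoding of zero.

```python
def representable_range(k: int, scheme: str) -> tuple[int, int]:
    """Return (min, max) integer values of a k-bit word under the given scheme."""
    if k < 2:
        raise ValueError("need at least 2 bits (one sign bit, one magnitude bit)")
    top = 2 ** (k - 1)
    if scheme in ("sign-magnitude", "ones-complement"):
        # Both schemes have +0 and -0, so the range is symmetric.
        return (-(top - 1), top - 1)
    if scheme == "twos-complement":
        # A single zero frees one extra code for the most negative value.
        return (-top, top - 1)
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("sign-magnitude", "ones-complement", "twos-complement"):
    print(scheme, representable_range(8, scheme))
```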

Example: Let’s assume a number is using a 32-bit format, which reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part.

Then, -43.625 is represented as follows:

Sign bit	Integer part	Fractional part
1	000000000101011	1010000000000000

Here 0 in the sign field represents (+) and 1 represents (-). 000000000101011 is the 15-bit binary value for decimal 43, and 1010000000000000 is the 16-bit binary value for the fraction 0.625.
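The field values in this example can be reproduced with a few integer operations. This is a minimal sketch assuming the 1/15/16 sign-magnitude layout described above; `encode_fixed` is a hypothetical helper name, and rounding to the nearest 2^-16 step is an assumption of the sketch.

```python
def encode_fixed(value: float, int_bits: int = 15, frac_bits: int = 16) -> tuple[str, str, str]:
    """Encode value as (sign, integer, fraction) bit strings in sign-magnitude fixed point."""
    sign = "1" if value < 0 else "0"
    # Scale the magnitude so it becomes a whole number of 2^-frac_bits steps.
    scaled = round(abs(value) * (1 << frac_bits))
    if scaled >= 1 << (int_bits + frac_bits):
        raise OverflowError("value does not fit in the format")
    int_part = scaled >> frac_bits                 # bits left of the radix point
    frac_part = scaled & ((1 << frac_bits) - 1)    # bits right of the radix point
    return sign, format(int_part, f"0{int_bits}b"), format(frac_part, f"0{frac_bits}b")


print(encode_fixed(-43.625))
# ('1', '000000000101011', '1010000000000000')
```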

The advantage of using a fixed-point representation is performance, and the disadvantage is the relatively limited range of values it can represent. It is therefore usually inadequate for numerical analysis, which needs a wider range and more precision. A number whose representation exceeds 32 bits has to be stored inexactly.

Smallest

Sign bit	Integer part	Fractional part
0	000000000000000	0000000000000001

Largest

Sign bit	Integer part	Fractional part
0	111111111111111	1111111111111111

These are the smallest positive number and the largest positive number that can be stored in the 32-bit format given above. The smallest positive number is 2^-16 ≈ 0.000015, and the largest positive number is (2¹⁵-1) + (1-2^-16) = 2¹⁵ - 2^-16 ≈ 32768. The gap between adjacent representable numbers is 2^-16. We can move the radix point either left or right with the help of only the integer field, which is 1.
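The limits of this format can be checked with exact rational arithmetic. The sketch below assumes the same 15-bit integer field and 16-bit fractional field; the variable names are illustrative.

```python
from fractions import Fraction

INT_BITS, FRAC_BITS = 15, 16

# One unit in the last fractional bit: the spacing between neighbouring values.
step = Fraction(1, 2 ** FRAC_BITS)

smallest_positive = step                              # integer field 0, fraction 00...01
largest_positive = (2 ** INT_BITS - 1) + (1 - step)   # every magnitude bit set to 1

# The largest value falls exactly one step short of 2^15.
assert largest_positive == 2 ** INT_BITS - step

print(float(smallest_positive))   # roughly 1.5e-05
print(float(largest_positive))    # just under 32768
```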

Floating Point Representation

This representation does not reserve a specific number of bits for the integer part or the fractional part. Instead, it reserves a certain number of bits for the number itself (called the mantissa or significand) and a certain number of bits to say where within that number the radix point sits (called the exponent).

The floating-point representation of a number has two parts: the first part represents a signed fixed-point number called the mantissa. The second part designates the position of the decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. A floating-point number is always interpreted as representing a number of the form m × r^e, where r is the radix (base).

Only the mantissa m and the exponent e are physically represented in the register (including their sign). A floating-point binary number is represented in a similar manner except that it uses base 2 for the exponent. A floating-point number is said to be normalized if the most significant digit of the mantissa is 1.

Sign bit	Exponent	Mantissa

So, the actual number is (-1)^s × (1 + m) × 2^(e - Bias)

Where,
s is the sign bit,
m is the mantissa (the stored fraction bits, read as a value between 0 and 1),
e is the exponent value,
Bias is the bias number.
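To see the formula at work, the sketch below pulls the three fields out of a value stored in IEEE 754 single precision and re-evaluates the formula. The IEEE 754 layout (1 sign bit, 8 exponent bits, 23 fraction bits, Bias = 127) is an assumption here, since the text has not fixed a concrete format at this point; `decode_single` is a hypothetical helper name. Only normalized values are handled.

```python
import struct


def decode_single(x: float) -> tuple[int, int, float, float]:
    """Split x (rounded to 32-bit float) into s, e, m and rebuild its value."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31                      # 1-bit sign field
    e = (bits >> 23) & 0xFF             # 8-bit stored (biased) exponent
    m = (bits & 0x7FFFFF) / 2 ** 23     # 23 fraction bits as a value in [0, 1)
    if e in (0, 255):
        raise ValueError("zero, subnormal, infinity or NaN: formula does not apply")
    value = (-1) ** s * (1 + m) * 2.0 ** (e - 127)
    return s, e, m, value


print(decode_single(-53.5))
# (1, 132, 0.671875, -53.5)
```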

Note that signed integers and exponents can be represented in sign-magnitude, one’s complement, or two’s complement form. The floating-point representation is more flexible than fixed-point: any non-zero number x can be written in the normalized form ±(1.b₁b₂b₃…)₂ × 2ⁿ.

Example: Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for the signed exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1 for a normalized number) and is referred to as a “hidden bit”. Then -53.5 is normalized as -53.5 = (-110101.1)₂ = (-1.101011)₂ × 2⁵, which is represented as follows:

Sign bit	Exponent part	Mantissa part
1	00000101	10101100000000000000000

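Here 00000101 is the 8-bit binary value of the exponent +5, stored directly as a signed integer, and the mantissa field holds the fraction bits 101011 padded with zeros to 23 bits. (A biased format such as IEEE 754 single precision, with Bias = 127, would instead store 5 + 127 = 132 = 10000100 in the exponent field.)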
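The normalization steps in this example can be sketched in code. The sketch follows the example's own layout, with the exponent stored as an 8-bit two's complement integer rather than with a bias; `encode_float` is a hypothetical helper, and rounding the fraction to 23 bits is an assumption of the sketch.

```python
import math


def encode_float(value: float, exp_bits: int = 8, frac_bits: int = 23) -> tuple[str, str, str]:
    """Return (sign, exponent, mantissa) bit strings for a non-zero value."""
    if value == 0:
        raise ValueError("zero has no normalized form")
    sign = "1" if value < 0 else "0"

    # frexp gives abs(value) = f * 2**e with 0.5 <= f < 1;
    # rescale to 1 <= significand < 2 so the leading bit is the hidden 1.
    f, e = math.frexp(abs(value))
    significand, n = f * 2, e - 1

    # Keep only the bits after the hidden 1, rounded to frac_bits.
    stored = round((significand - 1) * (1 << frac_bits))
    if stored == 1 << frac_bits:        # rounding carried into the hidden bit
        stored, n = 0, n + 1

    if not -126 <= n <= 127:            # exponent range used in the text
        raise OverflowError("exponent outside -126..127")

    # Two's complement encoding of the signed exponent.
    exponent = format(n & ((1 << exp_bits) - 1), f"0{exp_bits}b")
    return sign, exponent, format(stored, f"0{frac_bits}b")


print(encode_float(-53.5))
# ('1', '00000101', '10101100000000000000000')
```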

Note that the 8-bit exponent field is used to store integer exponents -126 ≤ n ≤ 127.

The smallest normalized positive number that fits into 32 bits is (1.00000000000000000000000)₂ × 2^-126 = 2^-126 ≈ 1.18 × 10^-38, and the largest normalized positive number that fits into 32 bits is (1.11111111111111111111111)₂ × 2¹²⁷ = (2²⁴-1) × 2¹⁰⁴ ≈ 3.40 × 10³⁸. These numbers are represented as below:

Smallest

Sign bit	Exponent part	Mantissa
0	10000010	00000000000000000000000

Largest

Sign bit	Exponent part	Mantissa
0	01111111	11111111111111111111111
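These two limits can be cross-checked against IEEE 754 single precision, which has the same normalized range. The bit patterns 0x00800000 and 0x7F7FFFFF used below are the IEEE 754 encodings (biased exponent), not the signed-exponent layout drawn in the tables; `from_bits` is a hypothetical helper name.

```python
import struct


def from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest = 2.0 ** -126                   # (1.000...0)_2 x 2^-126
largest = (2 ** 24 - 1) * 2.0 ** 104     # (1.111...1)_2 x 2^127

# Both values match the smallest and largest normalized IEEE 754 singles.
assert from_bits(0x00800000) == smallest
assert from_bits(0x7F7FFFFF) == largest

print(f"{smallest:.3e}")   # 1.175e-38
print(f"{largest:.3e}")    # 3.403e+38
```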
