Computer Science – 13.3 Floating-point numbers, representation and manipulation | e-Consult
The binary representation of a real number is often only an approximation. With n bits, at most 2^n distinct values can be encoded, yet there are infinitely many real numbers in any interval, however small. Real numbers form a continuum, while the values a fixed set of bits can encode are discrete, so most real numbers fall between two representable values and must be rounded to one of them. Essentially, we are trying to fit an infinite range of values into a finite set of values.
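To see that the set of representable values really is finite and discrete, the short Python sketch below looks at the gap between 1.0 and the next larger value a Python float can hold. It assumes Python 3.9 or later (for `math.nextafter` and `math.ulp`) and a platform where `float` is an IEEE 754 64-bit double; that is true of mainstream platforms but is an assumption, not something these notes specify.

```python
import math

# Assumes Python 3.9+ and that float is an IEEE 754 64-bit double.
x = 1.0
next_up = math.nextafter(x, math.inf)  # smallest float greater than 1.0
gap = next_up - x                      # spacing between neighbouring floats near 1.0

print(gap)                   # 2.220446049250313e-16, i.e. 2**-52
print(math.ulp(1.0) == gap)  # True: math.ulp reports the same spacing

# 1 + 2**-53 lies exactly halfway between 1.0 and next_up. It has no
# representation of its own, so the sum is rounded (ties to even) back to 1.0.
halfway = 1.0 + 2.0 ** -53
print(halfway == 1.0)        # True
```

Every real number strictly between 1.0 and `next_up` (infinitely many of them) has to be stored as one of those two neighbours.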
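Consider a simple example: representing the decimal number 0.1 in binary. A fraction has a finite binary expansion only if its denominator, in lowest terms, is a power of two. Since 0.1 = 1/10 has the factor 5 in its denominator, its binary expansion recurs forever (0.0001100110011... in binary). It cannot be represented exactly with a finite number of bits and must be truncated or rounded, which introduces a rounding error.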
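The repeated-doubling method for converting a fraction to binary makes the recurring pattern visible. This is a minimal Python sketch; the function name `binary_fraction_digits` is purely illustrative. It uses the standard `fractions` module so the arithmetic is exact and the digits themselves are not distorted by rounding. The last two lines assume `float` is an IEEE 754 double.

```python
from fractions import Fraction

def binary_fraction_digits(value: Fraction, n_digits: int) -> str:
    """Illustrative helper: first n_digits binary digits of a value in [0, 1).

    Repeated doubling: double the fraction; the integer part (0 or 1) is the
    next bit, and the remaining fractional part is carried forward.
    """
    digits = []
    for _ in range(n_digits):
        value *= 2
        bit = 1 if value >= 1 else 0
        digits.append(str(bit))
        value -= bit  # keep only the fractional part
    return "0." + "".join(digits)

print(binary_fraction_digits(Fraction(1, 10), 20))  # 0.00011001100110011001
print(binary_fraction_digits(Fraction(3, 8), 6))    # 0.011000 (terminates: 8 = 2**3)

# The literal 0.1 is stored as the nearest double, which is not exactly 1/10.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(Fraction(0.1))                     # 3602879701896397/36028797018963968
```

The pattern 0011 repeats indefinitely for 1/10, whereas 3/8 stops after three digits because its denominator is a power of two.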
The error introduced is known as quantization (or rounding) error. Using more bits makes the error smaller, because each additional bit of precision halves the gap between adjacent representable values of a given magnitude, but for a value such as 0.1 the error is never eliminated entirely. In a floating-point representation, the precision depends on the number of bits in the mantissa, while the exponent bits determine the range of values that can be represented.
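For floating-point numbers, the effect of extra mantissa bits can be measured directly. The sketch below compares how closely 0.1 is stored in IEEE 754 single precision (32 bits, 24 significant bits) and double precision (64 bits, 53 significant bits). IEEE 754 is an assumption here: a course may teach its own mantissa/exponent format, but the trend of more mantissa bits giving a smaller error holds there too. It uses Python's standard `struct` module to round a value to 32 bits and `fractions.Fraction` to measure each error exactly; the helper name `as_float32` is hypothetical.

```python
import struct
from fractions import Fraction

def as_float32(x: float) -> float:
    """Hypothetical helper: round x to IEEE 754 single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

true_value = Fraction(1, 10)
stored64 = Fraction(0.1)              # exact value of the double nearest 0.1
stored32 = Fraction(as_float32(0.1))  # exact value of the 32-bit float nearest 0.1

err32 = abs(stored32 - true_value)
err64 = abs(stored64 - true_value)

print(f"32-bit error: {float(err32):.3g}")  # 32-bit error: 1.49e-09
print(f"64-bit error: {float(err64):.3g}")  # 64-bit error: 5.55e-18
print(err64 < err32)                        # True: more mantissa bits, smaller error
```

Neither error is zero: extra mantissa bits shrink the error but cannot remove it for a recurring binary fraction.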
Example with different bit lengths, assuming all the bits are used as fraction bits (so a bit pattern with integer value k represents k/2^n):
- 4 bits: 16 distinct values (codes 0-15, i.e. 0/16 to 15/16). The nearest value to 0.1 is 2/16 = 0.125, an error of 0.025 (25% of the true value).
- 8 bits: 256 distinct values (codes 0-255). The nearest value is 26/256 = 0.1015625, an error of about 0.0016. This is much better than with 4 bits, but still not exact.
- 16 bits: 65,536 distinct values (codes 0-65535). The nearest value is 6554/65536 = 0.100006103515625, an error of about 0.0000061, which is close enough for most practical purposes but still not exact.
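The approximations for 0.1 at each bit length can be computed with a short Python sketch. It assumes that all n bits are fraction bits, so the code k (0 to 2^n - 1) stands for the value k/2^n; the function name `nearest_code` is illustrative. Rounding to the nearest code keeps the error within half a step, 1/2^(n+1).

```python
from fractions import Fraction

def nearest_code(x: Fraction, frac_bits: int) -> int:
    """Illustrative helper: integer code k for which k / 2**frac_bits is closest to x."""
    return round(x * 2 ** frac_bits)  # round() on a Fraction returns an int

target = Fraction(1, 10)
for bits in (4, 8, 16):
    scale = 2 ** bits
    code = nearest_code(target, bits)
    approx = Fraction(code, scale)
    error = abs(approx - target)
    # The error can never exceed half a step, i.e. 1 / 2**(bits + 1).
    assert error <= Fraction(1, 2 * scale)
    print(f"{bits:2} bits: {code}/{scale} = {float(approx)}, error = {float(error):.2g}")
```

Each additional bit halves the worst-case error, but no finite number of bits lands exactly on 0.1.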
These limitations are fundamental to digital computing: any number stored in a fixed number of bits is drawn from a finite set of values, so most real numbers can only ever be approximated.