Computer Science – 1.3 Compression | e-Consult
1.3 Compression (1 questions)
Login to see all questions.
Click on a question to view the answer
Huffman coding is a variable-length prefix coding algorithm used for lossless data compression. It assigns shorter codes to more frequent symbols and longer codes to less frequent symbols. This results in an overall reduction in the average number of bits required to represent the data.
Example:
- Calculate the frequency of each character: A: 0.4, B: 0.2, C: 0.3, D: 0.1
- Create a priority queue (min-heap) with the characters and their frequencies.
- Repeatedly combine the two nodes with the lowest frequencies into a new node with a frequency equal to the sum of their frequencies. This process continues until only one node remains, which represents the root of the Huffman tree.
- Assign codes to each character based on the path taken from the root to the character in the Huffman tree. Left branches are assigned '0' and right branches are assigned '1'.
Huffman Codes:
| Character | Frequency | Huffman Code |
| A | 0.4 | 0 |
| B | 0.2 | 10 |
| C | 0.3 | 11 |
| D | 0.1 | 100 |
Average Code Length: (0.4 * 1) + (0.2 * 2) + (0.3 * 2) + (0.1 * 3) = 0.4 + 0.4 + 0.6 + 0.3 = 1.7 bits per character. The uncompressed data would require 8 bits per character. Therefore, the average compression ratio is approximately 8/1.7 = 4.76:1. This demonstrates how Huffman coding can achieve significant compression, especially when there are significant differences in character frequencies.