Huffman Coding
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, with the lengths determined by the frequency of the corresponding characters. The character with the highest frequency of occurrence is assigned the smallest code, while the character with the lowest frequency of occurrence is assigned the largest code.
Prefix Codes are variable-length codes assigned to input characters, which means that the code assigned to one character is not the prefix of any other character’s code. When decoding the generated bitstream in this manner, Huffman Coding ensures that there is no ambiguity.
There are mainly two parts to Huffman Coding:
1. Use the input characters to build a Huffman tree.
2. The Huffman Tree is to be traversed and characters will be assigned their respective codes.
Steps to build Huffman Tree
The input is an array of unique characters with their frequency of occurrences, and the output is a Huffman Tree.
1. Make a leaf node for each character that is unique and build a min-heap of all the leaf nodes. (The priority queue is min-heap. The frequency Field Values are used to compare two nodes in min-heap. Initially, the character which is least frequent is at the root.)
2. Take the two nodes with the lowest frequency from the min-heap.
3. Now make a new internal node whose frequency is equal to the sum of the frequency of the nodes which were taken in step 2. The first extracted node will be the left child and the other one will be the right child. Finally, add this node to the min-heap.
4. Repeat the 2nd and 3rd steps until only one node is left. The remaining node will be the root node and the tree is complete.
we will now understand it further with the help of an example
1. build a min-heap that has 6 leaf nodes that each represent a root of a tree with a single node.
2. Now we will do steps 2 and 3 where we extract 2 minimum frequency nodes from the min-heap and add an internal node with their sum as the frequency. In this case, it is 9+5=14
Now the min-heap contains 5 nodes, 4 of which are roots with a single element each, and the remaining heap node is a root of the tree that has 3 elements.
3. Once again we will repeat steps 2 and 3. In this case, it is 12+13=25
Now the min-heap contains 4 nodes, 2 of which are roots with a single element each, and the remaining heap nodes are roots of the tree that has more than 1 node.
4. Once again we will repeat steps 2 and 3. In this case, it is 14+16=30
Now the min-heap has 3 nodes.
5. Once again we will repeat steps 2 and 3. In this case, it is 25+30=55
Now the min-heap has only 2 nodes.
6. We will be repeating steps 2 and 3 for the final time as after this only 1 node will remain and the tree will be complete. In this case, it is 45+55=100
Now there is only 1 node in the min-heap.
The algorithm will stop here as there is only one node left.
How to print code from the Huffman tree
Printing code from the Huffman tree is pretty easy. All we have to do is traverse the tree starting from the root. Maintain an auxiliary array and while moving to the left child write 0 on the array and similarly write 1 when moving to the right child. Finally, print the array whenever a leaf node is encountered.