Compression: Huffman Coding (AQA GCSE Computer Science)

Revision Note

Robert Hampton

Expertise

Computer Science Content Creator

Huffman Coding

What is huffman coding?

  • Huffman coding is a method of lossless compression primarily used on text based data (documents)
  • A huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state

Example (step 1)

  • A text file contains the following characters
Text file
I   L O V E   C O M P U T E R S

  • The text file contain 16 characters including spaces
  • A frequency analysis shows there is some repetition in the characters, for example there are 2 x E's
Frequency
I space L O V E C M P U T R S
1 2 1 2 1 2 1 1 1 1 1 1 1

Huffman Trees

What is a huffman tree?

  • A huffman tree, also known a binary tree is used to lossless compress text based data as part of huffman coding
  • A huffman tree consists of nodes which can have either 0, 1 or 2 child nodes
  • Binary trees are covered in much more detail at A-Level 

Example (step 2)

  • The frequency analysis data is ordered from least to most frequent
I L V C M P U T R S space O E
1 1 1 1 1 1 1 1 1 1 2 2 2

  • The 2 least frequent characters (I, L) are joined together into a node as part of a binary tree as below

 huffman-coding-image1

  • Update the frequency data to show the combined characters
V C M P U T R S I L space O E
1 1 1 1 1 1 1 1 2 2 2 2

  • Repeat steps until all characters are combined

huffman-coding-image2

  • Next, follow each branch and place a 0 on left branches and a 1 on right branches, this creates a unique code for each character

huffman-coding-image3

  • This allows unique bit patterns for each character to be created and characters that are more frequent have a smaller bit pattern making it more efficient
Character Bit pattern Times used Total bits
space 000 2 3*2=6
O 001 2 3*2=6
E 010 2 3*2=6
V 1000 1 4*1=4
C 1001 1 4*1=4
M 1010 1 4*1=4
P 1011 1 4*1=4
U 1100 1 4*1=4
T 1101 1 4*1=4
R 1110 1 4*1=4
S 1111 1 4*1=4
    Total number of bits 50

  • In this example, using huffman coding we have compressed the original text files to 50 bits
  • Uncompressed, using ASCII, the text file would be 16 characters x 7 bits per character (16*7=112 bits) which means huffman coding has saved 62 bits of storage (112-50=62)

You've read 0 of your 0 free revision notes

Get unlimited access

to absolutely everything:

  • Downloadable PDFs
  • Unlimited Revision Notes
  • Topic Questions
  • Past Papers
  • Model Answers
  • Videos (Maths and Science)

Join the 100,000+ Students that ❤️ Save My Exams

the (exam) results speak for themselves:

Did this page help you?

Robert Hampton

Author: Robert Hampton

Rob has over 16 years' experience teaching Computer Science and ICT at KS3 & GCSE levels. Rob has demonstrated strong leadership as Head of Department since 2012 and previously supported teacher development as a Specialist Leader of Education, empowering departments to excel in Computer Science. Beyond his tech expertise, Robert embraces the virtual world as an avid gamer, conquering digital battlefields when he's not coding.