Run Length Encoding

One method of compression is known as run-length encoding (RLE). In its strictest form, RLE replaces a run of identical consecutive symbols with a single copy of the symbol and a count. The example here applies a related idea to whole words: it searches the text for repeating patterns and replaces each one with a single digit that represents it. The file becomes smaller because each repeated pattern now takes up one character instead of several.

The quote below is from the series ‘Little Britain’ and was shown to us by our teacher. It is a good example because it contains a lot of repetition. Some words in the quote have been colour-coded to highlight the patterns in the text. In its original state, the quote contains 32 words and 116 characters (including spaces). Having spotted the patterns, we can start to compress it. Let ‘Yeh’ = 1, ‘but’ = 2, ‘no’ = 3, ‘nuthin’ = 4, ‘…’ (ellipsis) = 5 and ‘on’ = 6. Now we can replace the original patterns with the new digits, as you can see below.
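The replacement step can be sketched in a few lines of Python. This is only a rough illustration: the table is the one given above, but the sample sentence is made up for the demonstration and is not the original quote, and the function name `substitute` is our own.

```python
# Replacement table from the text above
CODES = {"Yeh": "1", "but": "2", "no": "3", "nuthin": "4", "…": "5", "on": "6"}

def substitute(text, codes):
    """Swap every whole word that appears in the table for its digit."""
    return " ".join(codes.get(word, word) for word in text.split(" "))

# Hypothetical sample sentence (NOT the original quote)
sample = "Yeh but no but Yeh but no"
encoded = substitute(sample, CODES)

print(encoded)                         # 1 2 3 2 1 2 3
print(len(sample), "->", len(encoded)) # 25 -> 13
```

Words that are not in the table are left exactly as they are, which matches what happens to the non-repeating words in the quote.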

We can now see that the original patterns have become redundant and have each been replaced by the single digit that represents them. Some words remain in their original form because they contained no repeating patterns to replace. The quote has been compressed from 116 characters down to 72, so it has been reduced to 62% of its original size (72 ÷ 116 ≈ 0.62). Computers can use this method to compress pictures: each colour is represented in binary, and that binary data contains repeating patterns. The compression program can then use these patterns to reduce the amount of data needed to represent the colours. The same principle would apply to the notes in a music video.
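For pictures, the idea is easiest to see with runs of identical pixels. The sketch below assumes a made-up row of pixels where W is white and B is black; the function name `rle_encode` is our own, and real image formats store their data differently. The last line simply checks the percentage worked out above.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical neighbouring symbols into (symbol, count)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(pixels)]

# Hypothetical row of 13 pixels: 6 white, 3 black, 4 white
row = "W" * 6 + "B" * 3 + "W" * 4
runs = rle_encode(row)
print(runs)  # [('W', 6), ('B', 3), ('W', 4)]

# Checking the figure from the quote: 72 characters out of 116
print(round(72 / 116 * 100))  # 62
```

Thirteen pixels are described by only three pairs, and the longer the runs of one colour, the bigger the saving.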

When referring to compression, two terms are often used: lossy compression and lossless compression. ‘Lossy’ means that, when the file is reconstructed, some pieces will be missing, which results in poorer quality. ‘Lossless’ means that, when the file is reconstructed, it returns to its exact original state. Run-length encoding is a lossless way of compressing data because the file can be returned to its exact previous state when it is reconstructed.
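The difference can be shown with a small sketch. The first half reverses the word replacement and gets the exact text back, which is what lossless means. The second half rounds some brightness values down, which throws detail away for good. The sample sentence, the brightness values and the function names are all made up for illustration, and the round trip only works if the original text contains none of the replacement digits as words of its own.

```python
# Part of the replacement table, plus its reverse for decoding
CODES = {"Yeh": "1", "but": "2", "no": "3"}
DECODE = {digit: word for word, digit in CODES.items()}

def encode(text):
    return " ".join(CODES.get(word, word) for word in text.split(" "))

def decode(text):
    return " ".join(DECODE.get(word, word) for word in text.split(" "))

# Lossless: decoding gives back exactly what we started with
original = "Yeh but no but Yeh"  # hypothetical sample, not the original quote
restored = decode(encode(original))
print(restored == original)  # True

# Lossy: rounding each value down to a multiple of 8 loses detail
brightness = [200, 201, 203, 198]  # hypothetical pixel values
quantised = [value // 8 * 8 for value in brightness]
print(quantised)  # [200, 200, 200, 192]
```

Once the values have been rounded there is no way to tell that the second pixel was 201 rather than 200, so the original cannot be rebuilt.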

This is an experiment we did in class on zipping files:
File A:
File B:
Another example of compression is when a file is zipped (compressed) using applications such as WinZip or WinRAR. With these methods, the more repeating patterns there are in a file, or the more often a pattern is repeated, the better the compression will be, because more sets of characters can be replaced by short replacement codes. We can see this in the experiment: File A has compressed further than File B because of the constantly repeating pattern ‘Computer Science is great’. File A has one pattern repeated over and over instead of jumbled text like File B, so more of its content can be represented by replacement codes. So when it comes to compression, it is important to know that the more repetition there is, the better.
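A similar experiment can be tried in Python using the standard `zlib` module, which implements DEFLATE, the method commonly used inside ZIP files. This is a stand-in rather than a copy of the class experiment: the contents of the two "files" below are assumptions (we only know File A repeated ‘Computer Science is great’), and WinRAR uses its own method, so the exact sizes will differ.

```python
import random
import string
import zlib

random.seed(0)  # fixed seed so the jumbled text is the same every run

# Stand-in for File A: one phrase repeated over and over
file_a = ("Computer Science is great " * 40).encode("ascii")

# Stand-in for File B: jumbled letters and spaces of the same length
alphabet = string.ascii_letters + " "
file_b = "".join(random.choice(alphabet) for _ in range(len(file_a))).encode("ascii")

size_a = len(zlib.compress(file_a))
size_b = len(zlib.compress(file_b))

print("File A:", len(file_a), "bytes ->", size_a, "bytes")
print("File B:", len(file_b), "bytes ->", size_b, "bytes")
```

Both files start at the same size, but the repeated phrase should shrink to a small fraction of the original while the jumbled text barely shrinks at all.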

Sample Image (labelled for reuse)