Data compression

From Citizendium
Revision as of 15:41, 10 May 2007 by imported>Alexander Wiebel (o --> r for color red)

Data compression is the transformation of (digital) data so that it can be represented by fewer characters than in its original form. In computer science, data compression is mainly used to reduce the memory needed to store a given piece of information. There are two types of compression: lossless and lossy. Data compressed with a lossless technique can be restored completely, whereas some information is irretrievably lost when a lossy technique is used.

Example

Suppose the following simple coding scheme for images:

y = yellow, b = black, r = red

[Image: Compression example.png]

With this coding scheme the above image can be encoded by the following string:

yyyybyyyyyyyybyyyybbbbbbbbbrrrrbrrrrrrrrbrrrr

Each pixel in the image is represented by the character corresponding to the color of the pixel. The pixels are assumed to be listed row by row, from upper left to lower right.

This coding scheme can be modified as follows to compress the data: each character representing a color is mentioned only once per run and is followed by a digit. The digit gives the number of consecutive pixels that have this color. With this new coding scheme the image can be represented by the following string:

y4b1y8b1y4b9r4b1r8b1r4

Using the first scheme, the image is represented by 45 characters. The second scheme uses only 22 characters to encode the same image. The compressed string is thus about 49% of the original length (22/45), a saving of roughly 50%.
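The scheme described above is a form of run-length encoding. The following Python sketch illustrates it; the function names rle_encode and rle_decode are chosen here for illustration and are not part of the original article. Because the scheme uses a single digit per run, the sketch assumes that runs longer than nine pixels are split into several runs, a detail the example image never requires.

```python
from itertools import groupby


def rle_encode(pixels: str) -> str:
    """Encode each run of identical characters as <color><digit>.

    Assumption: a run longer than 9 is split into several runs,
    because the scheme allows only one digit per run.
    """
    out = []
    for color, run in groupby(pixels):
        n = len(list(run))
        while n > 9:
            out.append(f"{color}9")
            n -= 9
        out.append(f"{color}{n}")
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Expand each <color><digit> pair back into a run of pixels."""
    return "".join(
        encoded[i] * int(encoded[i + 1]) for i in range(0, len(encoded), 2)
    )


raw = "yyyybyyyyyyyybyyyybbbbbbbbbrrrrbrrrrrrrrbrrrr"
packed = rle_encode(raw)

print(packed)                 # y4b1y8b1y4b9r4b1r8b1r4
print(len(raw), len(packed))  # 45 22
print(rle_decode(packed) == raw)  # True: the scheme is lossless
```

Decoding restores the original string exactly, so this scheme is an example of lossless compression.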

Obviously, this simple compression scheme is only effective for images that have large connected areas of the same color. If the color changes from pixel to pixel, the scheme does not compress the data but instead doubles the number of characters needed.
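The worst case can be checked with a short Python sketch. The helper rle_encode is a hypothetical name used for illustration, and this compact version assumes that no run is longer than nine pixels.

```python
from itertools import groupby


def rle_encode(pixels: str) -> str:
    # Compact encoder; assumes no run is longer than 9 pixels.
    return "".join(f"{color}{len(list(run))}" for color, run in groupby(pixels))


# Ten pixels whose color changes at every step.
alternating = "yb" * 5
packed = rle_encode(alternating)

print(packed)                         # y1b1y1b1y1b1y1b1y1b1
print(len(alternating), len(packed))  # 10 20
```

Every pixel forms its own run of length 1, so each original character becomes two characters in the encoded string.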