Data Compression

This page documents objects that implement the building blocks of compression algorithms. They can be combined in different ways to construct many different algorithms. Note that the compress_stream object contains complete compression algorithms, so if you just want to compress some data you can use that object and ignore the others.

In the column to the right you can see benchmark data for each of the compress_stream typedefs. Each time shown is the total time taken to compress and then decompress the file. The benchmark was run on a 3.0 GHz Pentium 4. For reference, see the Canterbury corpus web site.


compress_stream



This object is straightforward. It has no state and contains just two functions, compress and decompress, which do what their names imply when applied to iostream objects.

C++ Example Programs: compress_stream_ex.cpp, file_to_code_ex.cpp
#include <dlib/compress_stream.h>


Implementations:
compress_stream_kernel_1:
This implementation is done using the entropy_encoder_model and entropy_decoder_model objects.
kernel_1a
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_1b and entropy_decoder_model_kernel_1b
kernel_1b
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_2b and entropy_decoder_model_kernel_2b
kernel_1c
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_3b and entropy_decoder_model_kernel_3b
kernel_1da
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_4a and entropy_decoder_model_kernel_4a
kernel_1db
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_4b and entropy_decoder_model_kernel_4b
kernel_1ea
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_5a and entropy_decoder_model_kernel_5a
kernel_1eb
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_5b and entropy_decoder_model_kernel_5b
kernel_1ec
is a typedef for compress_stream_kernel_1 which uses entropy_encoder_model_kernel_5c and entropy_decoder_model_kernel_5c
compress_stream_kernel_2:
This implementation is done using the entropy_encoder_model and entropy_decoder_model objects. It also uses the lz77_buffer object. It uses the entropy coder models to encode symbols when there is no match found by the lz77_buffer.
kernel_2a
is a typedef for compress_stream_kernel_2 which uses entropy_encoder_model_kernel_2b, entropy_decoder_model_kernel_2b, and lz77_buffer_kernel_2a.
compress_stream_kernel_3:
This implementation is done using the lzp_buffer object and the crc32 object. It does not use any entropy coding; instead, a byte-aligned output method is used.
kernel_3a
is a typedef for compress_stream_kernel_3 which uses lzp_buffer_kernel_1.
kernel_3b
is a typedef for compress_stream_kernel_3 which uses lzp_buffer_kernel_2.

conditioning_class



This object represents a conditioning class used for arithmetic style compression. It maintains the cumulative counts which are needed by the entropy_encoder and entropy_decoder objects below.
#include <dlib/conditioning_class.h>


Implementations:
conditioning_class_kernel_1:
This implementation stores all the counts in an array and sums them whenever the cumulative counts are requested.
kernel_1a
is a typedef for conditioning_class_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
conditioning_class_kernel_2:
This implementation uses a binary tree in which each node represents one symbol and stores that symbol's count together with the sum of the counts of all nodes to its left. A cumulative count can therefore be computed by visiting about log n nodes, where n is the size of the alphabet.
kernel_2a
is a typedef for conditioning_class_kernel_2
kernel_2a_c
is a typedef for kernel_2a that checks its preconditions.
conditioning_class_kernel_3:
This implementation is done using an array to store all the counts and they are summed whenever the cumulative counts are requested. The counts are also kept in semi-sorted order to speed up the calculation of the cumulative count.
kernel_3a
is a typedef for conditioning_class_kernel_3
kernel_3a_c
is a typedef for kernel_3a that checks its preconditions.
conditioning_class_kernel_4:
This implementation is done using a linked list to store all the counts and they are summed whenever the cumulative counts are requested. The counts are also kept in semi-sorted order to speed up the calculation of the cumulative count. This implementation also uses the memory_manager component to create a memory pool of linked list nodes. This implementation is especially useful for high order contexts and/or very large and sparse alphabets.
kernel_4a
is a typedef for conditioning_class_kernel_4 with a memory pool of 10,000 nodes.
kernel_4a_c
is a typedef for kernel_4a that checks its preconditions.
kernel_4b
is a typedef for conditioning_class_kernel_4 with a memory pool of 100,000 nodes.
kernel_4b_c
is a typedef for kernel_4b that checks its preconditions.
kernel_4c
is a typedef for conditioning_class_kernel_4 with a memory pool of 1,000,000 nodes.
kernel_4c_c
is a typedef for kernel_4c that checks its preconditions.
kernel_4d
is a typedef for conditioning_class_kernel_4 with a memory pool of 10,000,000 nodes.
kernel_4d_c
is a typedef for kernel_4d that checks its preconditions.
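
The difference between summing an array on request (as described for conditioning_class_kernel_1) and keeping left-subtree sums in a tree (as described for conditioning_class_kernel_2) can be sketched in standard C++. This is a minimal illustration only: left_sum_tree, naive_low_count, and their member names are hypothetical and are not dlib's conditioning_class interface, and the tree here is an implicit balanced tree over symbol indices, which is an assumption about layout rather than a description of dlib's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration type; this is not dlib's conditioning_class interface.
class left_sum_tree
{
public:
    explicit left_sum_tree(std::size_t alphabet_size)
        : count_(alphabet_size, 0), left_sum_(alphabet_size, 0) {}

    // Add `amount` to the count of `symbol`, visiting about log n nodes.
    void increment(std::size_t symbol, unsigned long amount = 1)
    {
        assert(symbol < count_.size());               // precondition check
        std::size_t lo = 0, hi = count_.size();
        for (;;)
        {
            const std::size_t mid = lo + (hi - lo) / 2;   // node covering [lo, hi)
            if (symbol < mid)      { left_sum_[mid] += amount; hi = mid; }
            else if (symbol > mid) { lo = mid + 1; }
            else                   { count_[mid] += amount; return; }
        }
    }

    // Sum of the counts of all symbols strictly less than `symbol`.
    unsigned long low_count(std::size_t symbol) const
    {
        assert(symbol < count_.size());               // precondition check
        unsigned long acc = 0;
        std::size_t lo = 0, hi = count_.size();
        for (;;)
        {
            const std::size_t mid = lo + (hi - lo) / 2;
            if (symbol < mid)      { hi = mid; }
            else if (symbol > mid) { acc += left_sum_[mid] + count_[mid]; lo = mid + 1; }
            else                   { return acc + left_sum_[mid]; }
        }
    }

    unsigned long count(std::size_t symbol) const
    {
        assert(symbol < count_.size());
        return count_[symbol];
    }

private:
    std::vector<unsigned long> count_;     // count_[s]: occurrences of symbol s
    std::vector<unsigned long> left_sum_;  // left_sum_[s]: total count in the left subtree of node s
};

// Baseline in the style of an array that is summed on request: O(n) per query.
unsigned long naive_low_count(const std::vector<unsigned long>& counts, std::size_t symbol)
{
    assert(symbol < counts.size());
    unsigned long acc = 0;
    for (std::size_t s = 0; s < symbol; ++s)
        acc += counts[s];
    return acc;
}
```

An entropy coder needs the pair (low_count, low_count + count) and the total for each symbol it codes, so the tree layout pays off when the alphabet is large and queries are frequent, while the flat array is simpler and can be faster for small alphabets.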

entropy_decoder



This object represents an entropy decoder, for example the decoding part of an arithmetic coder.
#include <dlib/entropy_decoder.h>


Implementations:
entropy_decoder_kernel_1:
This object is implemented using arithmetic coding, done in the straightforward way with integers and fixed-precision math.
kernel_1a
is a typedef for entropy_decoder_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
entropy_decoder_kernel_2:
This object is implemented using "range" coding, done in the straightforward way with integers and fixed-precision math.
kernel_2a
is a typedef for entropy_decoder_kernel_2
kernel_2a_c
is a typedef for kernel_2a that checks its preconditions.

entropy_decoder_model



This object represents some kind of statistical model. You can use it to read symbols from an entropy_decoder and it will calculate the cumulative counts/probabilities and manage contexts for you.
#include <dlib/entropy_decoder_model.h>


Implementations:
entropy_decoder_model_kernel_1:
This object is implemented using the conditioning_class component. It implements an order-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_1a
is a typedef for entropy_decoder_model_kernel_1 that uses conditioning_class_kernel_1a
kernel_1b
is a typedef for entropy_decoder_model_kernel_1 that uses conditioning_class_kernel_2a
kernel_1c
is a typedef for entropy_decoder_model_kernel_1 that uses conditioning_class_kernel_3a
entropy_decoder_model_kernel_2:
This object is implemented using the conditioning_class component. It implements an order-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_2a
is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_1a
kernel_2b
is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_2a
kernel_2c
is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_3a
kernel_2d
is a typedef for entropy_decoder_model_kernel_2 that uses conditioning_class_kernel_2a for its order-0 context and conditioning_class_kernel_4b for its order-1 context.
entropy_decoder_model_kernel_3:
This object is implemented using the conditioning_class component. It implements an order-2-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_3a
is a typedef for entropy_decoder_model_kernel_3 that uses conditioning_class_kernel_1a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3b
is a typedef for entropy_decoder_model_kernel_3 that uses conditioning_class_kernel_2a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3c
is a typedef for entropy_decoder_model_kernel_3 that uses conditioning_class_kernel_3a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
entropy_decoder_model_kernel_4:
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. For speed, exclusions are not used. The escape method used is method D.
kernel_4a
is a typedef for entropy_decoder_model_kernel_4 with the max order set to 4 and the max number of nodes set to 200,000
kernel_4b
is a typedef for entropy_decoder_model_kernel_4 with the max order set to 5 and the max number of nodes set to 1,000,000
entropy_decoder_model_kernel_5:
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. Exclusions are used. The escape method used is method D. This implementation is very much like kernel_4 except it is tuned for higher compression rather than speed. This also uses Dmitry Shkarin's Information Inheritance scheme.
kernel_5a
is a typedef for entropy_decoder_model_kernel_5 with the max order set to 4 and the max number of nodes set to 200,000
kernel_5b
is a typedef for entropy_decoder_model_kernel_5 with the max order set to 5 and the max number of nodes set to 1,000,000
kernel_5c
is a typedef for entropy_decoder_model_kernel_5 with the max order set to 7 and the max number of nodes set to 2,500,000
entropy_decoder_model_kernel_6:
This object just assigns every symbol the same probability. That is, it uses an order-(-1) model.
kernel_6a
is a typedef for entropy_decoder_model_kernel_6
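
The kernels above name "method D" as their escape method without spelling it out. As a sketch, the textbook definition of PPM escape method D gives a context with n total occurrences and q distinct symbols an escape probability of q/(2n), and gives a symbol seen c times a probability of (2c-1)/(2n). The helper names and the fraction type below are hypothetical, and whether dlib's kernels compute the estimate in exactly this form is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a probability kept as an exact fraction.
struct fraction
{
    unsigned long num;
    unsigned long den;
};

// Escape probability under textbook method D: q / (2n), where q is the number
// of distinct symbols seen in the context and n is the total number of counts.
fraction escape_probability_d(const std::vector<unsigned long>& counts)
{
    unsigned long n = 0, q = 0;
    for (std::size_t s = 0; s < counts.size(); ++s)
    {
        n += counts[s];
        if (counts[s] > 0) ++q;
    }
    assert(n > 0);                       // the context must have seen something
    const fraction result = {q, 2 * n};
    return result;
}

// Probability of a previously seen symbol under method D: (2c - 1) / (2n).
fraction symbol_probability_d(const std::vector<unsigned long>& counts, std::size_t symbol)
{
    assert(symbol < counts.size() && counts[symbol] > 0);
    unsigned long n = 0;
    for (std::size_t s = 0; s < counts.size(); ++s)
        n += counts[s];
    const fraction result = {2 * counts[symbol] - 1, 2 * n};
    return result;
}
```

The numerators of all seen symbols plus the escape numerator add up to 2n, so the estimates form a proper probability distribution that an entropy coder can consume directly as cumulative counts.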

entropy_encoder



This object represents an entropy encoder, for example the encoding part of an arithmetic coder.
#include <dlib/entropy_encoder.h>


Implementations:
entropy_encoder_kernel_1:
This object is implemented using arithmetic coding, done in the straightforward way with integers and fixed-precision math.
kernel_1a
is a typedef for entropy_encoder_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
entropy_encoder_kernel_2:
This object is implemented using "range" coding, done in the straightforward way with integers and fixed-precision math.
kernel_2a
is a typedef for entropy_encoder_kernel_2
kernel_2a_c
is a typedef for kernel_2a that checks its preconditions.
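
To make "range coding with integers and fixed-precision math" concrete, here is a minimal sketch of a byte-oriented, carryless range coder in the well-known style attributed to Dmitry Subbotin, together with the matching decoder. It is not dlib's implementation, and every name in it (the sketch namespace, range_encoder, range_decoder, encode_symbols, decode_symbols) is hypothetical. The model is a fixed table of cumulative counts supplied by the caller, where cum[s] and cum[s+1] bound symbol s and cum.back() is the total.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration types; these are not dlib's entropy_encoder/entropy_decoder.
namespace sketch
{
    const std::uint32_t TOP = 1u << 24;   // a byte is emitted once the top byte is settled
    const std::uint32_t BOT = 1u << 16;   // minimum working range; totals must stay below this

    // Shared renormalisation test: returns true when a byte must be shifted out/in.
    inline bool needs_shift(std::uint32_t low, std::uint32_t& range)
    {
        if ((low ^ (low + range)) < TOP) return true;               // top byte agreed
        if (range < BOT) { range = (0u - low) & (BOT - 1); return true; }  // underflow: shrink range
        return false;
    }

    class range_encoder
    {
    public:
        // Narrow the interval to the slice [low_count, high_count) out of `total`.
        void encode(std::uint32_t low_count, std::uint32_t high_count, std::uint32_t total)
        {
            assert(low_count < high_count && high_count <= total && total < BOT);
            range_ /= total;
            low_ += low_count * range_;
            range_ *= (high_count - low_count);
            while (needs_shift(low_, range_))
            {
                out_.push_back(static_cast<unsigned char>(low_ >> 24));
                low_ <<= 8;
                range_ <<= 8;
            }
        }

        // Flush the remaining four bytes of state and return the output.
        std::vector<unsigned char> finish()
        {
            for (int i = 0; i < 4; ++i)
            {
                out_.push_back(static_cast<unsigned char>(low_ >> 24));
                low_ <<= 8;
            }
            return out_;
        }

    private:
        std::uint32_t low_ = 0;
        std::uint32_t range_ = 0xFFFFFFFFu;
        std::vector<unsigned char> out_;
    };

    class range_decoder
    {
    public:
        explicit range_decoder(const std::vector<unsigned char>& in) : in_(in)
        {
            for (int i = 0; i < 4; ++i) code_ = (code_ << 8) | next_byte();
        }

        // Value in [0, total) that identifies which symbol's slice was encoded.
        std::uint32_t get_target(std::uint32_t total)
        {
            assert(total > 0 && total < BOT);
            range_ /= total;
            return (code_ - low_) / range_;
        }

        // Must follow get_target; mirrors the encoder's interval update.
        void decode(std::uint32_t low_count, std::uint32_t high_count)
        {
            low_ += low_count * range_;
            range_ *= (high_count - low_count);
            while (needs_shift(low_, range_))
            {
                code_ = (code_ << 8) | next_byte();
                low_ <<= 8;
                range_ <<= 8;
            }
        }

    private:
        std::uint32_t next_byte() { return pos_ < in_.size() ? in_[pos_++] : 0u; }

        std::vector<unsigned char> in_;
        std::size_t pos_ = 0;
        std::uint32_t low_ = 0;
        std::uint32_t range_ = 0xFFFFFFFFu;
        std::uint32_t code_ = 0;
    };

    std::vector<unsigned char> encode_symbols(const std::vector<unsigned>& symbols,
                                              const std::vector<std::uint32_t>& cum)
    {
        range_encoder enc;
        for (std::size_t i = 0; i < symbols.size(); ++i)
            enc.encode(cum[symbols[i]], cum[symbols[i] + 1], cum.back());
        return enc.finish();
    }

    std::vector<unsigned> decode_symbols(const std::vector<unsigned char>& packed,
                                         std::size_t n, const std::vector<std::uint32_t>& cum)
    {
        range_decoder dec(packed);
        std::vector<unsigned> symbols;
        for (std::size_t i = 0; i < n; ++i)
        {
            const std::uint32_t target = dec.get_target(cum.back());
            unsigned s = 0;
            while (s + 2 < cum.size() && cum[s + 1] <= target) ++s;   // linear slice lookup
            dec.decode(cum[s], cum[s + 1]);
            symbols.push_back(s);
        }
        return symbols;
    }
}
```

The decoder repeats the encoder's arithmetic on low and range step for step, which is why both sides stay in sync without any side information beyond the symbol count. In a complete compressor the fixed cum table would be replaced by an adaptive model that updates its counts after every symbol.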

entropy_encoder_model



This object represents some kind of statistical model. You can use it to write symbols to an entropy_encoder and it will calculate the cumulative counts/probabilities and manage contexts for you.
#include <dlib/entropy_encoder_model.h>


Implementations:
entropy_encoder_model_kernel_1:
This object is implemented using the conditioning_class component. It implements an order-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_1a
is a typedef for entropy_encoder_model_kernel_1 that uses conditioning_class_kernel_1a
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
kernel_1b
is a typedef for entropy_encoder_model_kernel_1 that uses conditioning_class_kernel_2a
kernel_1b_c
is a typedef for kernel_1b that checks its preconditions.
kernel_1c
is a typedef for entropy_encoder_model_kernel_1 that uses conditioning_class_kernel_3a
kernel_1c_c
is a typedef for kernel_1c that checks its preconditions.
entropy_encoder_model_kernel_2:
This object is implemented using the conditioning_class component. It implements an order-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_2a
is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_1a
kernel_2a_c
is a typedef for kernel_2a that checks its preconditions.
kernel_2b
is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_2a
kernel_2b_c
is a typedef for kernel_2b that checks its preconditions.
kernel_2c
is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_3a
kernel_2c_c
is a typedef for kernel_2c that checks its preconditions.
kernel_2d
is a typedef for entropy_encoder_model_kernel_2 that uses conditioning_class_kernel_2a for its order-0 context and conditioning_class_kernel_4b for its order-1 context.
kernel_2d_c
is a typedef for kernel_2d that checks its preconditions.
entropy_encoder_model_kernel_3:
This object is implemented using the conditioning_class component. It implements an order-2-1-0 finite context model and uses lazy exclusions and update exclusions. The escape method used is method D.
kernel_3a
is a typedef for entropy_encoder_model_kernel_3 that uses conditioning_class_kernel_1a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3a_c
is a typedef for kernel_3a that checks its preconditions.
kernel_3b
is a typedef for entropy_encoder_model_kernel_3 that uses conditioning_class_kernel_2a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3b_c
is a typedef for kernel_3b that checks its preconditions.
kernel_3c
is a typedef for entropy_encoder_model_kernel_3 that uses conditioning_class_kernel_3a for orders 0 and 1 and conditioning_class_kernel_4b for order-2.
kernel_3c_c
is a typedef for kernel_3c that checks its preconditions.
entropy_encoder_model_kernel_4:
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. For speed, exclusions are not used. The escape method used is method D.
kernel_4a
is a typedef for entropy_encoder_model_kernel_4 with the max order set to 4 and the max number of nodes set to 200,000
kernel_4a_c
is a typedef for kernel_4a that checks its preconditions.
kernel_4b
is a typedef for entropy_encoder_model_kernel_4 with the max order set to 5 and the max number of nodes set to 1,000,000
kernel_4b_c
is a typedef for kernel_4b that checks its preconditions.
entropy_encoder_model_kernel_5:
This object is implemented using a variation of the PPM algorithm described by Alistair Moffat in his paper "Implementing the PPM data compression scheme." It provides template arguments to select the maximum order and maximum memory to use. Exclusions are used. The escape method used is method D. This implementation is very much like kernel_4 except it is tuned for higher compression rather than speed. This also uses Dmitry Shkarin's Information Inheritance scheme.
kernel_5a
is a typedef for entropy_encoder_model_kernel_5 with the max order set to 4 and the max number of nodes set to 200,000
kernel_5a_c
is a typedef for kernel_5a that checks its preconditions.
kernel_5b
is a typedef for entropy_encoder_model_kernel_5 with the max order set to 5 and the max number of nodes set to 1,000,000
kernel_5b_c
is a typedef for kernel_5b that checks its preconditions.
kernel_5c
is a typedef for entropy_encoder_model_kernel_5 with the max order set to 7 and the max number of nodes set to 2,500,000
kernel_5c_c
is a typedef for kernel_5c that checks its preconditions.
entropy_encoder_model_kernel_6:
This object just assigns every symbol the same probability. That is, it uses an order-(-1) model.
kernel_6a
is a typedef for entropy_encoder_model_kernel_6
kernel_6a_c
is a typedef for kernel_6a that checks its preconditions.
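
The order-1-0 fallback and the update exclusions mentioned for the finite context models above can be illustrated with a small standard C++ sketch. It only tracks which order would code each symbol and which counts get updated; it performs no entropy coding and omits lazy exclusions entirely. The class name order_1_0_sketch and its members are hypothetical and do not reflect dlib's entropy_encoder_model interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of an order-1-0 context model with update exclusion.
class order_1_0_sketch
{
public:
    order_1_0_sketch() : order0_(256, 0), order1_(256 * 256, 0), prev_(0) {}

    // Returns the order at which `symbol` would be coded:
    //   1  if it was already seen after the previous symbol,
    //   0  if it was seen before but not in this order-1 context,
    //  -1  if it has never been seen (fall back to equal probabilities).
    int code(unsigned char symbol)
    {
        unsigned long& c1 = order1_[static_cast<std::size_t>(prev_) * 256 + symbol];
        unsigned long& c0 = order0_[symbol];

        int order;
        if (c1 > 0)      order = 1;
        else if (c0 > 0) order = 0;   // escaped from order 1
        else             order = -1;  // escaped from order 1 and order 0

        // Update exclusion: only update the context that coded the symbol and
        // the higher orders, never the lower ones.
        ++c1;
        if (order < 1) ++c0;

        prev_ = symbol;
        return order;
    }

    unsigned long order0_count(unsigned char symbol) const { return order0_[symbol]; }

private:
    std::vector<unsigned long> order0_;  // counts with no context
    std::vector<unsigned long> order1_;  // counts indexed by (previous symbol, symbol)
    unsigned char prev_;                 // order-1 context; starts at 0 by assumption
};
```

Feeding the bytes a, b, a, b yields the orders -1, -1, 0, 1: the last b is coded in the order-1 context "after a", so its order-0 count is left untouched.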

lz77_buffer



This object represents a pair of buffers (history and lookahead buffers) used during lz77 style compression.
#include <dlib/lz77_buffer.h>


Implementations:
lz77_buffer_kernel_1:
This object is implemented using the sliding_buffer and it just does simple linear searches of the history buffer to find matches.
kernel_1a
is a typedef for lz77_buffer_kernel_1 that uses sliding_buffer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
lz77_buffer_kernel_2:
This object is implemented using the sliding_buffer. It finds matches by using a hash table.
kernel_2a
is a typedef for lz77_buffer_kernel_2 that uses sliding_buffer_kernel_1
kernel_2a_c
is a typedef for kernel_2a that checks its preconditions.
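
The "simple linear search of the history buffer" described for lz77_buffer_kernel_1 can be sketched as follows. The function find_longest_match and the lz77_match struct are hypothetical names, the whole input is held in one string rather than in separate history and lookahead buffers, and matches are allowed to run from the history into the lookahead region (as in classic LZ77), which is an assumption rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: how far back the match starts and how long it is.
struct lz77_match
{
    std::size_t distance;  // 0 when no match was found
    std::size_t length;
};

// Linear search of the last `history_size` bytes before `pos` for the longest
// run that matches the bytes starting at `pos`.
lz77_match find_longest_match(const std::string& data, std::size_t pos, std::size_t history_size)
{
    assert(pos <= data.size());   // precondition check
    lz77_match best = {0, 0};
    const std::size_t start = pos > history_size ? pos - history_size : 0;

    for (std::size_t candidate = start; candidate < pos; ++candidate)
    {
        std::size_t len = 0;
        while (pos + len < data.size() && data[candidate + len] == data[pos + len])
            ++len;
        if (len > best.length)
        {
            best.length = len;
            best.distance = pos - candidate;
        }
    }
    return best;
}
```

Each call costs time proportional to the history size times the match length, which is the cost a hash table of recent positions, as described for lz77_buffer_kernel_2, is meant to avoid.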

lzp_buffer



This object represents a variation on the LZP algorithm described by Charles Bloom in his paper "LZP: a new data compression algorithm."
#include <dlib/lzp_buffer.h>


Implementations:
lzp_buffer_kernel_1:
This object is implemented using the sliding_buffer and uses an order-3 model to predict matches.
kernel_1a
is a typedef for lzp_buffer_kernel_1 that uses sliding_buffer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.
lzp_buffer_kernel_2:
This object is implemented using the sliding_buffer and uses an order-5-4-3 model to predict matches.
kernel_2a
is a typedef for lzp_buffer_kernel_2 that uses sliding_buffer_kernel_1
kernel_2a_c
is a typedef for kernel_2a that checks its preconditions.
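
The idea of using an order-3 model to predict matches can be sketched in standard C++: the three preceding bytes select a table entry holding the position that followed the same context last time, and the predicted match is however many bytes agree from there. The function name lzp_predicted_lengths is hypothetical, and the exact 24-bit key in an unordered_map is a simplification chosen for clarity; a fixed-size hashed table is the more usual choice, and nothing here is taken from dlib's lzp_buffer code. The sketch reports a prediction at every position instead of skipping ahead after a match as a real coder would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// For every position, the length of the match predicted by the order-3
// context (0 when the context has not been seen or the prediction fails).
std::vector<std::size_t> lzp_predicted_lengths(const std::string& data)
{
    const std::size_t order = 3;
    std::vector<std::size_t> lengths(data.size(), 0);
    std::unordered_map<std::uint32_t, std::size_t> last_seen;  // context -> position after it

    for (std::size_t pos = order; pos < data.size(); ++pos)
    {
        // Pack the three preceding bytes into an exact 24-bit key.
        std::uint32_t context = 0;
        for (std::size_t k = pos - order; k < pos; ++k)
            context = (context << 8) | static_cast<unsigned char>(data[k]);

        const std::unordered_map<std::uint32_t, std::size_t>::const_iterator it =
            last_seen.find(context);
        if (it != last_seen.end())
        {
            const std::size_t predicted = it->second;
            assert(predicted < pos);   // predictions always point into the past
            std::size_t len = 0;
            while (pos + len < data.size() && data[predicted + len] == data[pos + len])
                ++len;
            lengths[pos] = len;
        }
        last_seen[context] = pos;      // remember where this context led most recently
    }
    return lengths;
}
```

Because the context alone picks the candidate position, the coder only has to transmit a match length and never an offset, which is the property that separates LZP from LZ77-style schemes.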