#63611 - RE: String Compression Algorithm...
Huffman coding may not provide good enough compression for small data sets, because after compression you still need to store the decoding tree. In addition, since Huffman-encoded data sets use variable-length bit codes, Azhar Chohan would also have to devise a bit index table to find the bit offset into the image where each element begins. This index table is what destroys any advantage that Huffman encoding may have achieved.
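To make the overhead concrete, here is roughly what the layout and lookup routine would have to look like. This is only a sketch in plain C; the array names, the idea of storing each string's start as a plain 16-bit bit offset, and the signed-entry tree encoding are my own assumptions for illustration, not anything Azhar has specified.

#include <stdint.h>

/* Compressed bitstream, packed MSB first. */
extern const uint8_t  huff_image[];

/* One two-byte entry per string: the absolute bit offset of its first
   code (byte address * 8 + bit position).  15 bits covers a 3K image. */
extern const uint16_t huff_index[];

/* Decoding tree: huff_tree[n][bit] is either the index of the next node
   (value >= 0) or a leaf holding character -(value) - 1 (value < 0).   */
extern const int16_t  huff_tree[][2];

static uint8_t get_bit(uint16_t bit_pos)
{
    return (uint8_t)((huff_image[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1);
}

/* Decode string number n into out[]; a decoded 0 marks end of string. */
void huff_get_string(uint16_t n, char *out, uint8_t max_len)
{
    uint16_t pos = huff_index[n];   /* starting bit of string n          */
    uint8_t  i   = 0;
    int16_t  node;
    char     c;

    if (max_len == 0)
        return;

    while (i + 1u < max_len) {
        node = 0;                   /* start each symbol at the root     */
        do {                        /* follow bits until a leaf (< 0)    */
            node = huff_tree[node][get_bit(pos++)];
        } while (node >= 0);

        c = (char)(-node - 1);
        if (c == '\0')              /* end-of-string symbol              */
            break;
        out[i++] = c;
    }
    out[i] = '\0';
}

Note that every two-byte entry in huff_index[] is overhead that a plain uncompressed string table would not need, which is where the numbers below come from.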
A small file of ASCII characters (let's say 6K bytes) would typically compress down to about 3K. Add a typical decoding tree of, say, 1K bytes and you are at an image size of about 4K. Since the bit image is 3K bytes, it takes nearly 12 bits to address any byte in the image (we can skip the size of the decoding tree because it is not part of the data we need to index into). It takes an additional 3 bits to offset to the bit where any given string starts. So for all intents and purposes each entry in the index table needs a two-byte offset pointer. If there are 1K strings in the data set, the indexing table is 2K bytes.

Adding that all up: 3K compressed data + 1K decoding tree + 2K index table gives a total image size of 6K bytes. Surprisingly, this is the same size as the original data. The only way this Huffman coding scheme could ever be close to useful is with a larger original data set containing longer but fewer strings. Almost any compressed image that requires random access to its elements is going to suffer greatly unless you have enough RAM to decode the data into.

Michael Karas



