Posted by Oli Filth on 09/15/05 20:04
Ron said the following on 15/09/2005 17:15:
> I have different words that consist of the characters 0-9 and A-Z. I would
> like to compress the string by encoding it using the base: 0-9, A-Z and a-z.
> That should reduce a word by about 40%.
>
Nope, not 40%.
With direct symbol-sequence to symbol-sequence translation (i.e. no
statistical compression occurring), the most you could theoretically get is
1 - log(26+10)/log(26+26+10) = 0.13
i.e. 13%, and this assumes you can find a suitable mapping.
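
For illustration, here is a rough Python sketch of one such mapping: treat the word as a number written in base 36 and rewrite it in base 62. The names `max_saving`, `encode` and `decode` are made up for this example, and the leading sentinel digit is my own assumption, added so that words starting with "0" survive the round trip.

```python
import math

BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE62 = BASE36 + "abcdefghijklmnopqrstuvwxyz"


def max_saving(src_size, dst_size):
    """Theoretical best-case length reduction for a pure re-encoding."""
    return 1 - math.log(src_size) / math.log(dst_size)


def encode(word, src=BASE36, dst=BASE62):
    """Re-express a word over `src` as a (usually shorter) word over `dst`."""
    n = 1  # sentinel digit so leading "0" characters are not lost
    for ch in word:
        n = n * len(src) + src.index(ch)
    out = []
    while n:
        n, r = divmod(n, len(dst))
        out.append(dst[r])
    return "".join(reversed(out))


def decode(text, src=BASE36, dst=BASE62):
    """Inverse of encode(): recover the original word over `src`."""
    n = 0
    for ch in text:
        n = n * len(dst) + dst.index(ch)
    out = []
    while n > 1:  # stop when only the sentinel digit is left
        n, r = divmod(n, len(src))
        out.append(src[r])
    return "".join(reversed(out))


if __name__ == "__main__":
    word = "HELLO2005"
    packed = encode(word)
    print(round(max_saving(36, 62), 2))
    print(word, len(word), "->", packed, len(packed))
    print(decode(packed) == word)
```

With this scheme a 9-character word such as "HELLO2005" comes out as 8 characters, i.e. a saving of about 11%, which is in line with the 13% ceiling and nowhere near 40%.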
Anything beyond that (i.e. actual compression) is entirely dependent on
the statistics of the source string. That's what "dictionary
compression" is all about.
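
To see how much the source statistics matter, here is a small Python sketch. Note that it uses zlib (DEFLATE) purely as a stand-in, not LZF, and the two test strings and the helper name `compressed_size` are invented for the example.

```python
import random
import zlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def compressed_size(text):
    """Size in bytes of the zlib-compressed ASCII text (level 9)."""
    return len(zlib.compress(text.encode("ascii"), 9))


random.seed(0)  # fixed seed so repeated runs use the same random string

# Highly repetitive input: lots of structure for the compressor to exploit.
repetitive = "ABC123" * 100

# Uniformly random input over the same 36 symbols: almost no structure.
noisy = "".join(random.choice(ALPHABET) for _ in range(600))

if __name__ == "__main__":
    print(len(repetitive), compressed_size(repetitive))
    print(len(noisy), compressed_size(noisy))
```

Both strings are 600 characters long, but the repetitive one shrinks to a tiny fraction of its size while the random one gets much less benefit. Also keep in mind that the output is binary, so it would still need to be re-encoded if the result has to stay within 0-9, A-Z and a-z.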
LZF *might* do what you want, but I've never used it and it has next
to no documentation...
--
Oli