What is the theory of useful information? According to algorithmic information theory, the less compressible something is, the more information it has. But the sort of information I'm interested in seems to carry many bits while still compressing well: this text is much more compressible than a random string of letters, yet conveys much more information to me than a random string does. Is there a precise characterization of this sort of information? It isn't simply the compression ratio, since AAAAAAAAAAAAAAAAAAAAAAAAAAA is very compressible but conveys very little information.
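To make the tension concrete, here is a small Python sketch using zlib (any general-purpose compressor would behave similarly). It compares three equal-length inputs: a trivially repetitive string, genuine English prose, and random bytes. The naive "less compressible = more information" rule ranks them in exactly the opposite order of how informative they feel to a reader.

```python
import os
import zlib

# English prose of moderate length (paraphrasing the question itself).
english = (
    b"What is the theory of useful information? The less compressible "
    b"something is, the more information it carries, according to "
    b"algorithmic information theory. Yet ordinary prose, which people "
    b"find meaningful, compresses quite well, while a stream of random "
    b"bytes, which conveys nothing to a reader, resists compression."
)
n = len(english)
repeated = b"A" * n          # trivially predictable
random_bytes = os.urandom(n)  # effectively incompressible

sizes = {name: len(zlib.compress(data, 9))
         for name, data in [("repeated", repeated),
                            ("english", english),
                            ("random", random_bytes)]}
print(sizes)  # repeated is tiny, english shrinks, random does not
```

The repeated string compresses almost completely, English compresses partially, and the random bytes come out slightly larger than they went in (compressor overhead), even though only the English input is meaningful.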
Go the other way. Assume two strings of length k: string a composed of English words, and string b generated by reading /dev/random. String a will be compressible: length(compress(a)) < k. You can now add MORE "information" to a and compress it until length(compress(a)) = k. Since you can't compress b any further, it already contains the maximum amount of information.
I put "information" in quotes above because the actual information per bit of English is pretty low, and that's where the ability to both compress it and comprehend it, not as single bits but as groups of bits, comes from. Compressed data remains comprehensible once you uncompress it, so compressibility is only a measure of surface comprehensibility. It's really a measure of information density: compressed data has a high information-to-space ratio, whereas random, incompressible data has a low one (arguably approaching zero).
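One way to see the "compress until length(compress(a)) = k" endpoint is that a compressor's output is itself nearly incompressible: running the compressor over its own output gains nothing, because the first pass already squeezed out the redundancy. A quick sketch (zlib chosen for convenience; bzip2 behaves the same way):

```python
import zlib

text = (
    b"Assume two strings of length k: one composed of English words and "
    b"one generated by reading random bytes. The English string will be "
    b"compressible, and you can keep packing more text into it until the "
    b"compressed form stops shrinking. At that point the output looks "
    b"just like random data to the compressor."
)

once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)

print(len(text), len(once), len(twice))
# The second pass does not shrink the data further; it only adds
# container overhead, since the first pass's output has no remaining
# redundancy for the compressor to exploit.
```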
You can experiment with this yourself:
cat > /tmp/string.a
(paste in some English text cut and pasted from a web page)
dd if=/dev/urandom of=/tmp/string.b bs=1 count=$(wc -c < /tmp/string.a)
bzip2 /tmp/string.*
ls -l /tmp/string.*
Experiment with the above for corpora of different lengths. You'll see that a longer English text, which is information rich, compresses to a better ratio than a shorter English text, and that English of any length compresses far better than random data of the same length (which, in the everyday sense of the word, conveys little information).
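If you'd rather not paste text by hand, the experiment can be scripted. The sketch below fakes "English" by drawing letters from a skewed distribution; the weights are rough, made-up guesses, not real English letter frequencies, but the skew is enough to reproduce both effects: the longer skewed corpus compresses to a better ratio than the short one, and both compress far better than random bytes.

```python
import os
import random
import zlib

random.seed(0)  # deterministic, so the experiment is repeatable

LETTERS = "abcdefghijklmnopqrstuvwxyz "
# Crude illustrative weights: 'e', 't', and space common, 'q', 'z' rare.
WEIGHTS = [8, 2, 3, 4, 13, 2, 2, 6, 7, 1, 1, 4, 2, 7,
           8, 2, 1, 6, 6, 9, 3, 1, 2, 1, 2, 1, 18]

def english_ish(n):
    """n letters drawn independently from the skewed distribution."""
    return "".join(random.choices(LETTERS, weights=WEIGHTS, k=n)).encode()

def ratio(data):
    """Compressed size as a fraction of original size."""
    return len(zlib.compress(data, 9)) / len(data)

short_ratio = ratio(english_ish(500))
long_ratio = ratio(english_ish(50_000))
random_ratio = ratio(os.urandom(50_000))

print(short_ratio, long_ratio, random_ratio)
```

The longer corpus wins partly because the compressor's fixed overhead is amortized and partly because more repeated substrings accumulate for the dictionary coder to exploit.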
A string composed solely of 27 As would compress down to perhaps 2 bytes or less (not including the size of the decompressor). You are right: there is not much information in it. Less than 16 bits of information in 27 As.
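For what it's worth, you can watch this with zlib directly (its few bytes of header and checksum overhead included): the compressed size of an all-A string is tiny, and it grows only slightly even when the run gets vastly longer, since in effect only the run length needs describing.

```python
import zlib

short_run = b"A" * 27
long_run = b"A" * 1000

c_short = len(zlib.compress(short_run, 9))
c_long = len(zlib.compress(long_run, 9))

print(c_short, c_long)
# Most of each result is zlib's fixed header and checksum. Going from
# 27 As to 1,000 As adds only a few bytes, because the compressed
# representation is essentially "A, repeated" plus a length.
```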
I'm looking for a definition of information that matches up with how we use the word in normal parlance. This definition has to strike some kind of medium between incompressible and completely compressible. I could make something up, but I was wondering what the official version is.
I'm not sure I follow. The definition of "information" I use every day matches the "official" version -- I'm not sure how it could be different. What is "completely compressible"? Something cannot be compressed beyond the shortest string that can represent it without losing information content.
Shannon's information is a measure of the size of the set that an element is selected from, is that right? So a letter chosen uniformly at random conveys log2(26) ≈ 4.7 bits of information. That alone doesn't allow me to discriminate between the information content of a random string and an English sentence, since both supposedly contain the same amount of information by this bare metric.
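That bare metric, and the gap between it and actual English, can be computed directly. A small sketch, with the caveat that the order-0 entropy below looks only at single-letter frequencies and ignores all longer-range structure:

```python
import math
from collections import Counter

def entropy_bits_per_symbol(s):
    """Order-0 Shannon entropy: -sum(p * log2(p)) over symbol frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A letter chosen uniformly from 26 alternatives:
uniform_letter = math.log2(26)  # ~4.70 bits

sentence = "the cat sat on the mat and the dog lay by the fire"
h = entropy_bits_per_symbol(sentence.replace(" ", ""))
print(uniform_letter, h)
# The sentence's skewed letter frequencies give fewer bits per letter
# than the uniform log2(26) bound.
```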
If I look at the frequencies of substrings of the string, then that would be a better discriminator: the random string's substring frequencies should be roughly uniform, while the English string's will be highly skewed.
However, that doesn't work when I try to discriminate between an English string and a string generated by a simple algorithm, since the latter's substring distribution will also be highly skewed. What kind of metric distinguishes the English sentence from either case?
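Here is that failure made concrete. A simple frequency statistic (fraction of distinct bigrams, in this sketch) cleanly separates both English and a trivially generated string from random bytes, but it puts English and the algorithmic string on the same side of the line, so it cannot tell them apart in the way you want.

```python
import os

def distinct_bigram_fraction(data):
    """Fraction of length-2 substrings that are distinct: near 1 for
    random bytes, low for anything with repetitive structure."""
    grams = [data[i:i + 2] for i in range(len(data) - 1)]
    return len(set(grams)) / len(grams)

english = (
    b"However, that doesn't work when I try to discriminate between an "
    b"English string and a string generated by a simple algorithm, since "
    b"the latter's distribution will also be highly skewed."
)
algorithmic = b"ab" * len(english)  # output of a trivial program
random_bytes = os.urandom(len(english))

f_english = distinct_bigram_fraction(english)
f_algo = distinct_bigram_fraction(algorithmic)
f_random = distinct_bigram_fraction(random_bytes)

print(f_algo, f_english, f_random)
# English and the algorithmic string both look "skewed" (well below 1),
# while random bytes sit near 1, so this statistic flags both of the
# former as non-random without separating them from each other.
```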
Am I making sense here? I haven't had any formal training in information theory, and my brain is kind of fried right now.