??? 12/19/07 18:19
#148521 - Quantifying randomness
Responding to: ???'s previous message
Several years ago I developed a mathematical definition that quantifies randomness using fractal dimension theory. Calculate the information dimension of a finite set of data and the capacity dimension of the space the set occupies. The ratio of those two dimensions, always less than or equal to unity (unity denoting pure or perfect randomness), can be defined as the mathematical randomness of the set.
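As a rough illustration of the ratio Joe describes, here is a single-scale box-counting estimate in Python. This is my own sketch, not Joe's actual method: the helper name dim_ratio, the single box size eps, and restricting the data to 1-D values in [0, 1) are all my assumptions.

```python
import math
from collections import Counter

def dim_ratio(data, eps):
    """Estimate the information/capacity dimension ratio of 1-D data
    in [0, 1) at a single box size eps (hypothetical helper, not the
    author's code). Needs at least two occupied boxes."""
    n = len(data)
    boxes = Counter(int(x / eps) for x in data)   # occupied boxes -> counts
    # capacity dimension estimate: log(occupied boxes) / log(1/eps)
    d0 = math.log(len(boxes)) / math.log(1 / eps)
    # information dimension estimate: Shannon entropy of box occupancies
    h = -sum((c / n) * math.log(c / n) for c in boxes.values())
    d1 = h / math.log(1 / eps)
    return d1 / d0                                # always <= 1

# perfectly uniform data fills every box equally: ratio is 1
uniform = [(i + 0.5) / 1000 for i in range(1000)]
# half the points piled into a single box: ratio drops well below 1
clustered = [(i + 0.5) / 1000 for i in range(500)] + [0.255] * 500
```

With eps = 0.01 the uniform set scores 1.0 and the clustered set scores noticeably lower, matching the intuition that the hundred-heads-up coins should not count as random.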
It's important to distinguish between the randomness of a set of data and the randomness of the process that generated it. A purely random process, like tossing a hundred coins into the air, could conceivably generate a non-random result like one hundred heads-up coins. But if you walk into a room and find one hundred coins all lying heads up, you'll never consider that data set random, regardless of how it was produced. Similarly, picking lotto balls from the lotto machine could produce the sequence 1, 2, 3, 4, 5, 6. But if you walked into the room and found the balls 1 through 6 laid out in a row, you'd never consider that set random, again regardless of the process that produced it.

Conversely, there is absolutely nothing random about an algorithm. It is wholly deterministic: there is nothing random whatsoever about which number it will produce next. Yet the set it produces can be quite random. The problem is that most people still do not divorce the result from the process. Consequently, people are forced to adopt hand-waving "definitions" such as pseudorandomness, which are not well defined at all. My definition, and the subsequent method of evaluation, divorces the result set from the process that produced it, allowing the randomness of a finite set to be evaluated and quantified.

To evaluate the randomness of a set generated by a given algorithm, simply iterate the calculation for the generated set, adding the next number generated each time, and plot the ratio on a graph. The slope of the line is the randomness of the data set, and as long as the graph remains linear, the algorithm is producing numbers of that randomness. This allows you to test a given algorithm not only for how random the numbers it generates are, but for how long it will continue to produce them as well. Some generators will produce random numbers for a while, but then become much less random, or non-random, after a certain point.
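The iterate-and-plot procedure might be sketched like so. I'm using Python's random module as the generator under test and a single-scale box-count entropy ratio as a stand-in for the full dimension calculation; the function name, box size, and sampling interval are my choices, not the post's:

```python
import math
import random
from collections import Counter

def ratio(data, eps):
    # information/capacity dimension ratio at box size eps; the
    # log(1/eps) factors cancel, leaving entropy / log(occupied boxes)
    n = len(data)
    boxes = Counter(int(x / eps) for x in data)
    if len(boxes) < 2:
        return 0.0          # degenerate: everything in one box
    h = -sum((c / n) * math.log(c / n) for c in boxes.values())
    return h / math.log(len(boxes))

# grow the set one number at a time, recording the ratio as it settles
rng = random.Random(1)      # generator under test (stand-in for a real PRNG)
data, trace = [], []
for i in range(1, 2001):
    data.append(rng.random())
    if i % 100 == 0:
        trace.append(ratio(data, 0.01))
# a good generator holds the trace flat and near 1; a drop-off marks the
# point where the algorithm stops producing numbers of that randomness
```

Plotting trace against set size is the graph described above: a sustained linear run means the generator is still delivering numbers of that randomness.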
One famous example is RANDU, of IBM fame; this test reveals the point at which that happens. Since it is derived from fractal dimension theory, my definition is a general case that applies to data sets in any real-dimensional space, not just integer-dimensional space. Interestingly, though, in an appropriately restricted case it reduces to the chi-squared calculation mentioned previously. Someday I really need to write this up and get it published.

Joe
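For readers who haven't met it, RANDU's breakdown has a well-known closed form: its multiplier is 2^16 + 3, so every triple of consecutive outputs obeys a fixed linear recurrence, which confines the points to just 15 planes in the unit cube. A short Python sketch (mine, not part of Joe's test) demonstrating the degenerate relation:

```python
def randu(seed, n):
    """IBM's RANDU: x_{k+1} = 65539 * x_k mod 2^31 (seed should be odd)."""
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out

xs = randu(1, 1000)
# since 65539 = 2^16 + 3, we have (2^16 + 3)^2 mod 2^31 = 6*65539 - 9,
# so every triple satisfies x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31) --
# the linear dependence that flattens RANDU's triples onto 15 planes
```

Any dimension-based test run on consecutive triples of RANDU output will see a set of dimension well below 3, which is exactly the kind of drop-off the randomness trace is meant to expose.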



