The Challenge of Randomness (II)
Discovering the "good" randomness
Hello, Hypers!
This is the second part of a (probably) mini-series on how to generate randomness using computers. In the previous issue, we discussed the challenge of implementing probabilistic algorithms in classical computers and the most important of these algorithms: generating a uniform random number.
We finished the article with a Python implementation of a uniform random number generator. Here it is:
At this point you may have some questions:
Is this the best way to generate a uniform random number?
How to compare two random number generators?
How to measure how good a random number generator is?
In this article, we’ll try to answer these questions without talking about advanced statistics. So this is a good time to learn a couple of things if you are new to Probability and Statistics.
Let’s get started!
Evaluating a Random Generator
We humans are great at recognizing patterns. Evolution equipped us with this skill. Actually, we appreciate and consider beautiful things that contain patterns. For example, right at the heart of music, we have the concepts of rhythm and tempo. These concepts describe how the sounds form patterns that repeat at regular time intervals. These patterns are perceived by our ears and make us dance and enjoy the music.
Another example is science. What are natural sciences but the seek for patterns in our Universe? And social sciences look for patterns in human interactions. Natural and social laws are nothing but patterns that the human being has unveiled.
But when we talk about randomness we talk about the opposite of a pattern. Randomness is not just a challenge for classical computers but also for us. How to measure something that should be unpredictable and unordered? How to measure chaos? Maybe you have heard about entropy or statistical tests. But let’s try something way more intuitive and easy. We are going to see the chaos.
In the previous post, we said that a uniform random generator should generate numbers between 0 and 1 in a way that all of the numbers in that interval have the same probability of being generated. This is not the best definition but it will work for now. If you think about it a little bit, a true random uniform generator should produce the most unordered sequence of numbers between 0 and 1.
In this post, we’ll try to evaluate the quality of a random generator by visually detecting the best chaos. But, as we are good at detecting patterns instead of detecting chaos, we’ll use the absence of patterns as an indicator of how good the chaos is. Does this sound crazy enough to you? Let’s do it!
A Pattern in The House
Our experiment will be easy. We are going to generate 100 numbers with different uniform random generators and plot the results in a 2-D graph. If the generator is good enough, we shouldn’t recognize any pattern in the graph. Easy, right?
Yes, but let’s make sure we have a clear idea of what a pattern could look like in this context. Let’s draw a little bit:
In the previous image, we have three examples of patterns. At the top-left corner, we have the most ordered result where each trial produces almost the same number. As you can imagine, this is the worst possible random generator.
Then, at the top-right corner, we have a slightly better generator, but still a deficient one. All the results are condensed between 0 and 0.5. The generator doesn’t generate numbers that are near 1.
The last plot shows a generator that seems to produce all the possible numbers between 0 and 1. But we still notice a pattern. Approximately the same numbers are generated every certain number of trials. That’s why we see this periodic behavior in the plot. We don’t want that either, the generated number should be independent of which trial are we doing.
Now we have an idea of how to identify patterns when evaluating random generators. Let’s see how good our custom generator is!
Evaluating and Improving Our Custom Generator
We are going to use matplotlib to create our plots:
As you see, in the previous code we modified the value of the max_value
(from 2^32 to 100) and the multiplier
(from 1103515245 to 10 ). The goal here is to learn how these parameters affect the quality of the generator.
The previous code produces the following plot:
Almost all the values are zero. If you follow the generating process by hand, you’ll realize that after the first generation, all the remaining values will always be zero. It seems that the combination of this max value and multiplier is not good. Let’s try with another multiplier.
If we try with multiplier = 1103515245
we get the following results:
Very similar to the previous one! In this case, it is more difficult to follow the generation process by hand, but with the help of a calculator or a simple program, we realize that the generator gets stuck again. Maybe the problem is with max_value
, maybe 100 is not a good choice. Let’s try with a power of two. This is what happens when we use 512 as the max value:
This is better! We won’t get into why this choice is better, but you can bet that it has a mathematical explanation! If you want to know it let me know in the comments. (Remember that paid subs can directly pick the topic of future posts!)
But if you look carefully, you’ll notice a pattern. You can try rotating the plot (or your head) 45 degrees. There are bands of numbers that repeat at periodic intervals. Diagonal patterns if you wish:
So this is not good enough. The problem is that the sequence of numbers ends up repeating itself at a certain number of trials. The bad news is that we can’t choose parameters that avoid this repetition. For example, we can pick a higher power of 2 for the max value (like 2^32), but the pattern will appear if we do a sufficiently large number of trials.
But there is still hope. We can tweak our algorithm a little bit to avoid this repetition or at least require an immense number of trials to repeat the same sequence.
We added a new parameter. An increment to the product. With this simple trick, we get this new result:
The diagonal pattern disappeared! Intuitively, we can think of this increment as a noise we add to the generation process. After all, we want a noisy result, we want chaos. This algorithm is the winner (although you’d want to use a large max value like 2^32).
Conclusions
In these two first chapters of a (potential) series about probabilistic algorithms, we have learned a few important lessons:
It is challenging to generate randomness with a classical computer because they are deterministic systems
We can use some tricks to create pseudo-random numbers
By generating uniform random numbers we can generate any other random process
Measuring the quality of a random generator is hard because we are programmed to detect patterns, not chaos
But we can use the absence of patterns as an indicator of a good random generator
As I see it, the series can take two possible paths:
Theoretical Path: A mathematical explanation of how we should choose the parameter values and how to rigorously test the quality of a generator.
Simulation Path: After knowing how to generate a uniform random number, we can learn how to generate any other random numbers and implement probabilistic algorithms.
I think both paths are exciting. So please, let me know what is your favorite! I hope you have enjoyed these two first chapters. This is just the beginning! Thanks for reading one more week.
See you next Tuesday!
This was great! I love the series.