A good old AI
Back in 2020, I wrote an article for FreeCodeCamp. I think it is a good idea to rework it and give it to you in the middle of the AI revolution. Let's learn the fundamentals!
Hello, hypers!
When we interact with ChatGPT, it feels like we're living in a sci-fi movie. What if I told you that we can do Artificial Intelligence and Machine Learning with just a basic intuition of what probabilities are?
A lot of the math in Machine Learning is fairly simple. There is another part of Machine Learning that looks more like an art than a science (we’ll talk about it in another post). In this post, we are going to talk about a simple concept that is behind an entire field of Statistical Analysis and some Machine Learning algorithms: conditional probabilities and Bayes’s Theorem.
As I mentioned, all this is based on a blog post I wrote back in 2020 for FreeCodeCamp. You can find the original article here.
So, let’s get started!
Conditional probability
Probability Theory is the field of math we use when we work with uncertain events: things that can happen randomly. It helps us put order where there seems to be only chaos.
Think about a fair die with six sides. What's the probability of getting a six when rolling it? That's easy: it's 1/6. We have six possible and equally likely outcomes, but we are interested in just one of them. So, 1/6 it is.
But what happens if I tell you that I have already rolled the die and got an even number? What's the probability that I got a six now?
This time, there are only three possible outcomes, because there are only three even numbers on the die. We are still interested in just one of those outcomes, so now the probability is greater: 1/3. What's the difference between the two cases?
In the first case, we had no prior information about the outcome, so we needed to consider every single possible result. In the second case, we were told that the outcome was an even number, so we could reduce the space of possible outcomes to just the three even numbers on a regular six-sided die.
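If counting outcomes in your head feels slippery, here is a minimal Python sketch of that same reasoning. It just enumerates the sample space, nothing fancy:

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # every equally likely result of a fair die

# No prior information: one favorable outcome (the six) out of all six.
p_six = Fraction(1, len(outcomes))

# Knowing the roll was even truncates the sample space to {2, 4, 6}.
evens = [o for o in outcomes if o % 2 == 0]
p_six_given_even = Fraction(1, len(evens))

print(p_six)             # 1/6
print(p_six_given_even)  # 1/3
```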
In general, when calculating the probability of an event A, given the occurrence of another event B, we say we are calculating the conditional probability of A given B, or just the probability of A given B.
Event B is our evidence: knowing that it occurred truncates the possibilities for event A. (In Bayesian terms, the probability of A before seeing the evidence is called the prior, and P(A | B) is called the posterior.)
How to calculate conditional probabilities and Bayes's Theorem
The previous example was great to get started, but it doesn't say much about how to calculate conditional probabilities. There is a simple formula that tells us how to do it:

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

Here, P(A | B) is the conditional probability of A given B, P(AB) is the probability of A and B occurring at the same time, and P(B) is the probability of B.
If A is getting a six and B is getting an even number, we can apply the formula to our previous example. Note that P(AB) = 1/6, because rolling a six and rolling an even number happen together exactly when we roll a six:

$$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$$
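A quick sanity check, if you like code: simulate a lot of rolls and estimate the conditional probability empirically. A minimal sketch, assuming nothing beyond Python's standard library:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
rolls = 1_000_000

even_rolls = 0
sixes_among_evens = 0
for _ in range(rolls):
    roll = random.randint(1, 6)
    if roll % 2 == 0:
        even_rolls += 1
        if roll == 6:
            sixes_among_evens += 1

# Empirical P(six | even): fraction of even rolls that were a six.
print(sixes_among_evens / even_rolls)  # ~0.333, close to 1/3
```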
If we swap the two events, we get the same kind of formula for the inverted conditional probability: P(B | A) = P(AB) / P(A), which means P(AB) = P(B | A) P(A). Substituting that product into our first formula lets us calculate the original conditional probability from the inverted one. And this is Bayes's Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Now we have a way to calculate conditional probabilities by swapping the events! This idea is simple, but it has an incredible impact on Machine Learning and Statistics.
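To make this concrete, here is the dice example again, this time computed through Bayes's Theorem. A small sketch; the numbers come straight from the example above:

```python
from fractions import Fraction

# A = "rolled a six", B = "rolled an even number"
p_a = Fraction(1, 6)        # P(A): one outcome out of six
p_b = Fraction(1, 2)        # P(B): three even outcomes out of six
p_b_given_a = Fraction(1)   # P(B | A): a six is always even

# Bayes's Theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)          # 1/3, the same answer as before
```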
What is so special about Bayes’s Theorem?
We started by rolling a die and ended up talking about Bayes's Theorem. The ideas we discussed didn't seem too sophisticated. So, what's the big deal?
Many things in our daily lives are non-deterministic for us. We cannot be 100% sure whether it is going to rain tomorrow, or whether the price of a stock will go up. We make our best guesses given the evidence we have.
Our probabilities are almost always calculated given a piece of evidence. Conditional probabilities are the real deal in practice. Bayes’s Theorem gives us a simpler way to calculate those probabilities.
This theorem tells us that we can do something like reverse engineering with the events: by treating the outcome as the evidence and vice versa, we can calculate the probability we actually care about. It is a beautiful idea, inverting the timeline to get answers, and the most beautiful part about it is its simplicity.
As I said, Bayes's Theorem is the foundation of a large field, commonly called Bayesian Analysis. The Bayesian adjective shows up a lot in Probability and Statistics. It is also a great way to approach Machine Learning and understand its fundamentals. Many Machine Learning algorithms are based (directly or indirectly) on Bayes's Theorem.
Maybe the most famous one is the Naive Bayes Classifier. In my original article, I explain this algorithm step by step and show how to implement it in Python. The best part is that we end up using our algorithm to predict whether a passenger on the Titanic survived the disaster. If you want to give it a try, you can do it here.
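The full walkthrough lives in the article, but to give you the flavor, here is a toy Naive Bayes classifier in Python. To be clear: this is an illustrative sketch with made-up data, not the article's actual code or the real Titanic dataset. It makes the "naive" assumption that features are independent given the label, which is what lets us multiply the per-feature conditional probabilities:

```python
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (features_dict, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    priors = {c: n / len(samples) for c, n in label_counts.items()}
    # counts[label][feature][value] -> how often that value appears for that label
    counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in samples:
        for f, v in features.items():
            counts[label][f][v] += 1
    return priors, counts, label_counts

def predict(features, priors, counts, label_counts):
    """Pick the label maximizing P(label) * product of P(feature value | label)."""
    best_label, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for f, v in features.items():
            # Add-one smoothing (each feature here takes two values, hence +2)
            # so an unseen value does not zero out the whole product.
            score *= (counts[c][f][v] + 1) / (label_counts[c] + 2)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Made-up toy data: (features, survived?) -- purely illustrative.
data = [
    ({"sex": "female", "class": "first"}, 1),
    ({"sex": "female", "class": "third"}, 1),
    ({"sex": "female", "class": "first"}, 1),
    ({"sex": "male",   "class": "first"}, 1),
    ({"sex": "male",   "class": "third"}, 0),
    ({"sex": "male",   "class": "third"}, 0),
]

priors, counts, label_counts = train(data)
print(predict({"sex": "male", "class": "third"}, priors, counts, label_counts))  # 0
```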
And that's it! I hope you enjoyed it and learned something. Remember, you will get something like this in your inbox every Tuesday. If you know someone who could be interested in content like this, please consider sharing it; it will help spread the word.