When Thomas Bayes’ friend Richard Price was sorting through Bayes’ papers after his death, he stumbled upon an essay with an intriguing title: “An Essay towards solving a problem in the Doctrine of Chances.” As Price read through the old document, a formula caught his eye.

Price was so captivated by the ideas in the essay that he dedicated the next two years of his life to exploring and developing them further. Finally, he sent the polished work to the Royal Society of London, where it was published in 1763. The essay drew little attention at first, but it eventually etched Bayes’ name into the annals of history.

At the heart of the essay was Bayes’ theorem, a mathematical formula for calculating conditional probabilities. The theorem is based on a simple yet powerful idea: when we update our initial belief with new, objective information, we arrive at a new and improved belief. This updated belief then serves as the foundation for the next iteration of belief refinement.

The story of Bayes’ theorem and its impact is the subject of Sharon Bertsch McGrayne’s book, The Theory That Would Not Die. The book explores the controversial history and diverse applications of Bayesian statistics, showing how a simple mathematical formula can have far-reaching consequences. Who knew a stats equation could be so contentious?

What did I get out of it?

“The Theory That Would Not Die,” a book about Bayes’ Rule, is one of the most interesting nontechnical books about technology that I have ever read.

The book is about one of the most used scientific tools of our time. If you have used ChatGPT, you have seen a relative of it at work: large language models generate text by estimating conditional probabilities, the quantity at the center of Bayes’ theorem.

More than the technicalities of Bayesian statistics, the book explores human psychology. It is about the hubris, arrogance and human frailties of the mathematicians and scientists upon whom we rely for a better future. The book is like a History Channel documentary of how mathematicians tried to destroy those with whom they disagreed, of childish behavior, of lost opportunities for humanity and of how, in the end, humanity gained knowledge in many fields.

Some of the things I took away from the book are:

Updating our beliefs

The book drives home the importance of continuously updating our beliefs in light of new evidence. It’s a reminder that our understanding of the world is never static, but rather an ongoing process of learning and growth. By embracing this mindset, we open ourselves to new possibilities and a deeper understanding of the world around us.

Be on the lookout for new data

Conceptually, Bayes’ system was simple. We modify our opinions with objective information: Initial Beliefs (our guess where the cue ball landed) + Recent Objective Data (whether the most recent ball landed to the left or right of our original guess) = A New and Improved Belief.

Whether it’s managing a project, making investment decisions, or planning something like a family trip, staying open to new information is key. It helps avoid sticking too rigidly to an initial guess and makes decision-making smarter with every new piece of information. Just adapt and refine plans as new details emerge—that’s the smart move.

It’s an iterative process

Prior for the probability of the initial belief; Likelihood for the probability of other hypotheses with objective new data; and Posterior for the probability of the newly revised belief. Each time the system is recalculated, the posterior becomes the prior of the new iteration. It was an evolving system, which each new bit of information pushed closer and closer to certitude.

This ongoing process is super useful when dealing with situations where information continuously evolves. Whether you are assessing risk or making predictions about investment positions, this method ensures that decisions get sharper with every new piece of information you add.
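To make the loop concrete, here is a minimal Python sketch of the cue-ball example, under my own assumptions rather than anything spelled out in the book: the table is a line from 0 to 1, the starting belief is flat, and each new ball lands to the left of the cue ball with probability equal to the cue ball’s position. The function names and the grid size are made up for illustration.

```python
import random

def update(prior, grid, landed_left):
    """One Bayesian update: posterior is proportional to prior times likelihood."""
    # If the cue ball sits at x (0 = left edge, 1 = right edge), a randomly
    # rolled ball lands to its left with probability x.
    likelihood = [x if landed_left else 1.0 - x for x in grid]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def estimate_cue_ball(true_position, n_rolls, n_grid=101, seed=0):
    """Roll n_rolls balls and return the mean of the final belief, plus the belief."""
    rng = random.Random(seed)
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    belief = [1.0 / n_grid] * n_grid  # flat prior: no idea where the ball is
    for _ in range(n_rolls):
        landed_left = rng.random() < true_position
        # Today's posterior becomes tomorrow's prior.
        belief = update(belief, grid, landed_left)
    mean = sum(x * p for x, p in zip(grid, belief))
    return mean, belief

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        mean, _ = estimate_cue_ball(true_position=0.3, n_rolls=n)
        print(f"after {n:4d} rolls: estimated position = {mean:.3f}")
```

Running it with more rolls should pull the estimate toward the true position, which is the "closer and closer to certitude" behavior the quote describes.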

The heuristic

knowledge is indeed highly subjective, but we can quantify it with a bet. The amount we wager shows how much we believe in something.

We don’t have to get technical about applying the Bayes formula. Quantifying our updated beliefs is all about putting a number on how sure we are about what we know. Saying you believe something is one thing, but betting money on it shows you really mean it. The more you’re willing to bet, the more confident you are.

This concept is like a real-life application of Bayes’ Theorem: your “bet” represents your current belief, and the size of the bet changes as new information comes in. In light of this new information, are you willing to bet more or less?
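For anyone who wants to put numbers on the bet, here is a small Python sketch. It assumes a simple two-outcome wager (you risk a stake to win a payout) and treats the bet as "fair" when its expected value is zero. The function names and the example numbers are mine, not the book’s.

```python
def implied_probability(stake, payout):
    """Belief at which risking `stake` to win `payout` is a fair (zero expected value) bet."""
    return stake / (stake + payout)

def fair_stake(probability, payout):
    """Largest stake worth risking to win `payout` at a given level of belief."""
    return payout * probability / (1.0 - probability)

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Revised belief after seeing evidence, for a simple true/false claim."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_false)

if __name__ == "__main__":
    belief = implied_probability(stake=30, payout=10)  # 0.75
    # Hypothetical evidence that is twice as likely if the claim is true.
    revised = bayes_update(belief, p_evidence_if_true=0.8, p_evidence_if_false=0.4)
    print(f"belief before: {belief:.3f}, after: {revised:.3f}")
    print(f"stake worth risking now: {fair_stake(revised, payout=10):.2f}")
```

In this made-up case, evidence that is twice as likely under the claim doubles the stake you should be willing to risk, from 30 to 60.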

For instance, when you’re thinking about making a new purchase, consider your decision to buy as placing a bet. The amount you’re willing to spend reflects your confidence in being satisfied with the purchase. This perspective not only makes you think more critically about how much you truly value the item but also prompts you to gather as much information as possible before committing your money. It’s about not just going by impulses but evaluating your choices based on the evidence (like reviews, product comparisons, or personal needs assessment). This method underlines the importance of continuously assessing your decisions, especially as new insights or information become available—like finding a better deal or a newer model just before you buy.

Understanding Bayesian thinking

The heuristic above gets you a long way, but there is also some mathematics involved in applying Bayesian thinking. It sounds complex and counterintuitive, but it’s really about improving our guesses.

The mathematics

Prior times likelihood is proportional to the posterior.
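Written out in symbols, the statement looks like this. The letters are my own choice, not the book’s: H stands for a hypothesis, D for the new data, and the H_i are the competing hypotheses.

```latex
P(H \mid D) \;=\; \frac{P(D \mid H)\, P(H)}{P(D)} \;\propto\; P(D \mid H)\, P(H),
\qquad
P(D) \;=\; \sum_i P(D \mid H_i)\, P(H_i)
```

Here P(H) is the prior, P(D | H) is the likelihood and P(H | D) is the posterior. Dividing by P(D) only rescales the answer so that the posterior probabilities add up to one, which is why the short version says "proportional to" rather than "equals".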

It’s like mixing ingredients based on a recipe. The ‘prior’ is like your base ingredient, say, flour in a cake: it’s what you start with. The ‘likelihood’ is like adding eggs based on how rich you want the cake to be: the new data that can change things up. When you mix these together, you don’t just get flour or eggs; you get something new that’s a combination of both, which in Bayesian terms is called the ‘posterior’. It’s the updated belief, and it’s based on mixing what you knew before (the prior) with what you just learned (the likelihood).

This formula is useful for making decisions when we’re not starting from scratch but have some initial idea or data. For instance, if we’re analyzing financial markets or making forecasts, we often have an initial understanding or model. As new data comes in, we adjust this model. The principle that prior times likelihood is proportional to the posterior ensures that we’re systematically refining our predictions or beliefs based on both what we previously thought and what new evidence or data shows.

Because humans can never know everything with certainty, probability is the mathematical expression of our ignorance: “We owe to the frailty of the human mind one of the most delicate and ingenious of mathematical theories, namely the science of chance or probabilities.”

Understanding this concept is crucial, especially when making decisions in uncertain situations—like investing, planning projects, or even daily life choices. It teaches that being uncertain isn’t a problem; it’s a normal part of human knowledge. Probability, especially through tools like Bayes’ Theorem, gives a structured way to improve our decisions as we gather more information. This way, we’re not just guessing blindly; we’re making informed decisions based on the best available evidence.

It’s not counterintuitive but natural

In our struggle to survive in an uncertain and changing world, our sensory and motor systems often produce signals that are incomplete, ambiguous, variable, or corrupted by random fluctuations. If we put one hand under a table and estimate its location, we can be off by up to 10 centimeters. Every time the brain generates a command for action, we produce a slightly different movement. In this confusing world, Bayes has emerged as a useful theoretical framework. It helps explain how the brain may learn. And it demonstrates mathematically how we combine two kinds of information: our prior beliefs about the world with the error-fraught evidence from our senses.

Bayes’ Theorem isn’t just a dry, academic concept; it’s actually similar to what our brains do naturally every day.

Our brains continuously try to make sense of things. It’s like trying to figure out where your hand is without looking—your brain guesses based on what it already knows and what it feels, even though that information isn’t always perfect.

Bayes’ Theorem comes into play as a mathematical tool that helps explain how our brains merge these two types of information: what we already believe (our priors) and what new, often imperfect, data we get from our senses.

if we are certain about the evidence relayed by our senses, we rely on them. But when faced with unreliable sensory data, we fall back on our prior accumulation of beliefs about the world.

When the information we gather through our senses (like seeing, hearing, or feeling) is clear and reliable, we tend to trust it to make decisions. However, when this sensory data is uncertain or confusing, we lean on our previously accumulated knowledge or beliefs about the world to fill in the gaps.
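One common way to model this trade-off is to treat both the prior belief and the sensory reading as bell curves and weight each by how precise it is. The Python sketch below makes that assumption (it is a textbook simplification, not something the book spells out), and the hand-position numbers are invented for illustration.

```python
def combine(prior_mean, prior_sd, sense_mean, sense_sd):
    """Posterior for a Gaussian prior combined with a Gaussian sensory reading."""
    prior_precision = 1.0 / prior_sd ** 2
    sense_precision = 1.0 / sense_sd ** 2
    # The sharper the senses are relative to the prior, the more weight they get.
    weight_on_senses = sense_precision / (prior_precision + sense_precision)
    mean = prior_mean + weight_on_senses * (sense_mean - prior_mean)
    sd = (1.0 / (prior_precision + sense_precision)) ** 0.5
    return mean, sd, weight_on_senses

if __name__ == "__main__":
    # Prior: hand is around 0 cm, give or take 5 cm. The senses report 10 cm.
    for label, sense_sd in (("reliable senses", 1.0), ("unreliable senses", 10.0)):
        mean, sd, weight = combine(0.0, 5.0, 10.0, sense_sd)
        print(f"{label}: estimate {mean:.2f} cm (sd {sd:.2f}), weight on senses {weight:.2f}")
```

With a sharp reading the estimate lands close to what the senses report; with a noisy one it stays close to the prior, just as the passage describes.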

If we hear a rumor that’s hard to verify, we might judge its truth based on what we already know about the people involved.

Bayesian thinking is basic to everything a human does, from speaking to acting. The biological brain has evolved to minimize the world’s uncertainties by thinking in a Bayesian way. In short, growing evidence suggests that we have Bayesian brains.

Our brains have evolved to handle the world’s uncertainties by processing information in a Bayesian manner. It means that combining past experiences (prior knowledge) with new information (evidence) isn’t just a good strategy—it’s how our brains are wired to work.

Learning and Adaptation: When learning new skills or adapting to new environments, it’s important to use past experiences to guide new actions. For instance, a student learning a new subject can build on prior knowledge while integrating new concepts.
Problem-Solving: In both personal and professional life, effective problem-solving involves using what is already known and updating this with new information as it becomes available. For example, in project management, past project outcomes can inform decisions on current projects.
Communication: Understanding that people naturally think in Bayesian terms can improve how we communicate. Presenting new information in a way that connects to what others already know can make the communication more effective and persuasive.

The history lesson and modern-day application

Today, probability, the mathematics of uncertainty, would be the obvious tool, but during the early 1700s probability barely existed. Its only extensive application was to gambling, where it dealt with such basic issues as the odds of getting four aces in one poker hand.

Despite its humble beginnings, Bayes’ theorem has many important applications. Today it sits at the heart of fields ranging from machine learning, artificial intelligence and robotics to medical diagnostics and environmental sciences.

Winning the world war

“the battle of Bayes has raged for more than two centuries, sometimes violently, sometimes almost placidly, . . . a combination of doubt and vigor.” Thomas Bayes had turned his back on his own creation; a quarter century later, Laplace glorified it. During the 1800s it was both employed and undermined. Derided during the early 1900s, it was used in desperate secrecy during the Second World War and afterward employed with both astonishing vigor and condescension.

In 1774, the brilliant French mathematician Pierre-Simon Laplace expanded upon Bayes’ theorem. The theorem then all but disappeared from sight until the 20th century, when the British codebreaker Alan Turing used it during the Second World War to help crack the ‘unbreakable’ Enigma code, a development that helped the Allies win the war.

Turing used Bayes’ ideas to create a system that let him guess a bunch of letters in an Enigma message. He would figure out the chances of each guess being right and then add more clues as he got them. This way, he could narrow down the number of wheel settings he needed to try. And that’s how he finally cracked the code.
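The "add more clues as he got them" step can be sketched by scoring evidence on a log scale, so that each clue simply adds to a running total. The Python below is only an illustration of that idea, not a reconstruction of Turing’s actual procedure. The scores are in decibans (ten times the base-10 logarithm of a likelihood ratio), and the prior odds and clue strengths are invented.

```python
import math

def decibans(likelihood_ratio):
    """Weight of evidence for a ratio, in decibans (10 * log10 of the ratio)."""
    return 10.0 * math.log10(likelihood_ratio)

def score_guess(prior_odds, likelihood_ratios):
    """Add up the evidence for a guess and convert the total back to a probability."""
    total = decibans(prior_odds) + sum(decibans(r) for r in likelihood_ratios)
    odds = 10.0 ** (total / 10.0)
    return total, odds / (1.0 + odds)

if __name__ == "__main__":
    # Hypothetical guess that starts at 100-to-1 against, followed by three
    # clues that are each ten times more likely if the guess is right.
    total, probability = score_guess(prior_odds=1 / 100, likelihood_ratios=[10, 10, 10])
    print(f"score: {total:.1f} decibans, probability: {probability:.3f}")
```

Guesses whose running score stays low can be dropped early, which is the narrowing-down effect described above.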

The insurance industry

Bailey was horrified to see “hard-shelled underwriters” using the semi-empirical, “sledge hammer” Bayesian techniques developed in 1918 for workers’ compensation insurance. University statisticians had long since virtually outlawed those methods, but as practical business people, actuaries refused to discard their prior knowledge and continued to modify their old data with new. Thus they based next year’s premiums on this year’s rates as refined and modified with new claims information. They did not ask what the new rates should be. Instead, they asked, “How much should the present rates be changed?”

Insurance underwriters were intuitively using Bayesian thinking to set insurance rates. Although academic statisticians saw these methods as outdated or too simplistic, actuaries in the field kept using them because they valued the practicality of incorporating prior knowledge into new decisions. Instead of starting from scratch each year, they adjusted existing rates based on new claims data, asking how much they should change the rates rather than what the new rates should be.
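The "how much should the present rates be changed?" question can be sketched as a weighted nudge from the current rate toward this year’s experience. The Python below is a rough illustration under my own assumptions: the weight formula n / (n + k) and the constant k are stand-ins chosen for the example, not figures from the book or from any real rating plan.

```python
def updated_rate(current_rate, observed_rate, n_claims, k=500.0):
    """Move the current rate toward this year's experience by a credibility weight.

    k is a hypothetical constant: the number of claims at which new data and
    the existing rate are trusted equally.
    """
    credibility = n_claims / (n_claims + k)
    change = credibility * (observed_rate - current_rate)
    return current_rate + change, change, credibility

if __name__ == "__main__":
    for n in (50, 500, 5000):
        new_rate, change, z = updated_rate(current_rate=100.0, observed_rate=130.0, n_claims=n)
        print(f"{n:5d} claims: credibility {z:.2f}, change {change:+.2f}, new rate {new_rate:.2f}")
```

With few claims the rate barely moves; with many, it moves most of the way to what the new data suggests. The old rate plays the role of the prior.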

Computer age

In 1996 Bill Gates, cofounder of Microsoft, made Bayes headline news by announcing that Microsoft’s competitive advantage lay in its expertise in Bayesian networks.

The nice thing with Monte Carlo is that you play a game of let’s pretend, like this: first of all there are ten scenarios with different probabilities, so let’s first pick a probability. The dice in this case is a random number generator in the computer. You roll the dice and pick a scenario to work with. Then you roll the dice for a certain speed, and you roll the dice again to see what direction it took.
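The game of "let’s pretend" in that quote translates almost directly into code. The Python sketch below follows the three rolls of the dice (scenario, speed, direction) for an object drifting from a starting point. Everything specific in it is hypothetical: I use three scenarios instead of ten, and the starting points, probabilities, speed range and time span are made up.

```python
import math
import random

# Hypothetical scenarios: (name, probability, starting point in nautical miles).
SCENARIOS = [
    ("A", 0.5, (0.0, 0.0)),
    ("B", 0.3, (50.0, 0.0)),
    ("C", 0.2, (0.0, 50.0)),
]

def simulate_once(rng, hours=1.0):
    """Play one round of pretend and return where the object ends up."""
    # Roll 1: pick a scenario according to its probability.
    _, _, (x0, y0) = rng.choices(SCENARIOS, weights=[s[1] for s in SCENARIOS])[0]
    # Roll 2: pick a speed (assumed uniform between 0 and 10 knots).
    speed = rng.uniform(0.0, 10.0)
    # Roll 3: pick a direction (assumed uniform over the compass).
    heading = rng.uniform(0.0, 2.0 * math.pi)
    distance = speed * hours
    return x0 + distance * math.cos(heading), y0 + distance * math.sin(heading)

def fraction_within(center, radius, n_trials=20_000, seed=0):
    """Estimate the probability that the object is within `radius` of `center`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x, y = simulate_once(rng)
        if math.hypot(x - center[0], y - center[1]) <= radius:
            hits += 1
    return hits / n_trials

if __name__ == "__main__":
    print(f"chance of being within 5 miles of the origin: {fraction_within((0.0, 0.0), 5.0):.3f}")
```

Repeating the game thousands of times turns a pile of guesses into a rough probability map of where to look first.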

With the advent of the computer age, the use of Bayesian theory has exploded into areas such as artificial intelligence, robotics, law, imaging technologies and medical diagnostics. Today Bayesian techniques are used in spam filters, voice recognition systems, recommendation systems and Google search.

Who should read it

This book is a story of how a way of thinking changed over time. It’s full of lessons about how we think and about people in general. So if you like history, this book gives you a new way of looking at the same stuff you’ve been learning about since you were a kid.

The book doesn’t have a lot of instructions, but it uses history to show us how important it is to think critically. Critical thinking matters because we usually stick to what we believe. Robert Cialdini talks about this in his book, Influence. He says that when we commit to something, we naturally want to look consistent, especially in front of others.

Most of the time, being consistent is a good thing and helps us. But we get so used to being consistent that we do it even when it’s better to change our minds.

Being able to change direction when we get new, clear information is a useful skill to have. It can help us fix mistakes or choose a better path that we didn’t see before. The key is to have the right facts and to think about them in the right way.

If you want to use Bayes’ Rule you should read the book to learn how others have used Bayes, but you won’t learn the details. By the end, you’ll know where to look for more information and where not to look.

The book shows us that you don’t have to be a math whiz to use Bayes’ ideas in your everyday life. It’s really about being open to new information and using it to make better choices. And that’s something we can all do, no matter what we’re doing or where we are in life.

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy