The secret to guessing more accurately with math


What’s in the box?

Professor25/Getty Images

Imagine I showed you a box and asked you to guess what’s inside, without giving any more details. You may think this is completely impossible, but the nature of the container gives some information – for example, the contents must be smaller than the box, while a solid metal can can hold liquids and withstand temperatures that a cardboard box would struggle with.

Is there a way to describe this process of guessing with limited information in a mathematically sensible way? Obviously, there are some things that cannot be reliably guessed—the flip of a coin, the roll of a die—and we call these random. But for everything else, a few useful tools can make you much better at narrowing down your guesswork, rather than picking an answer out of the ether.

A bounded guess is essentially an estimate, and these have a long history. Perhaps the most impressive early example is that of the ancient Greek philosopher Eratosthenes, who lived in Alexandria, Egypt, in the 3rd century BC. With a few simple ideas, he was able to estimate the circumference of the Earth with surprising accuracy. His exact method is lost, but we can reconstruct it thanks to texts written after his work.

Essentially, Eratosthenes knew that at midday on the summer solstice, the sun appeared to be directly overhead in the ancient city of Syene, shining straight down a well without casting a shadow. Meanwhile, on the same day and time in Alexandria, a vertical pole cast a shadow at an angle of about 7 degrees, or about 1/50 of a circle. He knew that the distance between the two cities was 5,000 stadia, a unit of length, so he estimated that the full circumference of the Earth must be 50 times this, or 250,000 stadia.

Eratosthenes made a few approximations about the geometry here, but we can ignore that. What is a little more difficult is that we do not know the true length of a stadion, the singular of stadia. It is believed that Eratosthenes used something roughly equivalent to 160 meters. That gives us a circumference of 160 meters × 250,000 = 40 million meters, or 40,000 kilometers, remarkably close to the modern measure of 40,075 kilometers. Of course, different values for a stadion (they range from 150 to 210 meters) give you a different answer and a different degree of accuracy, depending on how generous we want to be to Eratosthenes.
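To see how much the answer depends on the length of the unit, here is a minimal Python sketch of the arithmetic above. The function and variable names are my own, and it simply scales the 5,000-stadia distance by the 50 parts of a circle, ignoring the geometric approximations mentioned earlier.

```python
MODERN_KM = 40_075  # modern circumference quoted in the text


def circumference_km(circle_parts, distance_stadia, stadion_m):
    """Scale the Syene-Alexandria distance up to a full circle, in km.

    circle_parts: how many times the shadow angle fits into a full circle
    distance_stadia: distance between the two cities, in stadia
    stadion_m: assumed length of one stadion, in meters
    """
    circumference_stadia = distance_stadia * circle_parts
    return circumference_stadia * stadion_m / 1000


# Try the shortest, the commonly assumed, and the longest unit length.
for stadion_m in (150, 160, 210):
    estimate = circumference_km(50, 5_000, stadion_m)
    error_pct = 100 * (estimate - MODERN_KM) / MODERN_KM
    print(f"{stadion_m} m per unit: {estimate:,.0f} km ({error_pct:+.1f}%)")
```

The 160-meter value lands within a fraction of a percent of the modern figure, while the extremes of the range miss by a much wider margin in either direction.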

The world according to Eratosthenes, who was nonetheless able to estimate the circumference of the Earth quite accurately

Chronicle/Alamy

The point here is that a few simple but reasonable calculations can give you a pretty powerful guess – measuring a planet without having to go around it. The master of this in the 20th century was the physicist Enrico Fermi, who built the first ever nuclear reactor and played a key role in the American Manhattan Project to develop an atomic bomb. He was present at the first detonation of such a weapon, the Trinity test, and attempted to estimate the force of the explosion – no one was quite sure what it would be – by dropping small pieces of paper and seeing how far the blast moved them. As with Eratosthenes, his exact technique was never recorded, but his estimate of 10 kilotons is about half the 21 kilotons accepted for the Trinity yield today. It’s not perfect, but at least it’s on the right track.

In fact, landing in the right ballpark was kind of Fermi’s schtick. He loved these kinds of back-of-the-envelope estimates, so much so that they’re now known as Fermi problems. The classic example is a challenge he would set his students: estimate how many piano tuners there are in the city of Chicago. Starting with the population of Chicago (around 3 million), we can assume that the average household has four people, so that’s 750,000 households. If one household in five owns a piano, there are 150,000 pianos in Chicago. If we assume that a piano tuner can work on four pianos per weekday, each tuner can get through around 1,000 a year. So if the 150,000 pianos are each serviced once a year, there must be about 150 piano tuners in Chicago.
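The chain of assumptions is easy to lose track of in prose, so here it is as a short Python sketch. The parameter names are mine, and the figure of 250 working days (five weekdays over roughly 50 weeks) is my own assumption, chosen to match the "around 1,000 a year" in the text.

```python
def piano_tuners(population, people_per_household, households_per_piano,
                 tunings_per_day, working_days_per_year):
    """Fermi estimate: each step turns one rough assumption into a number."""
    households = population / people_per_household
    pianos = households / households_per_piano
    # Assume every piano is tuned once a year.
    tunings_needed = pianos
    tunings_per_tuner = tunings_per_day * working_days_per_year
    return tunings_needed / tunings_per_tuner


estimate = piano_tuners(
    population=3_000_000,
    people_per_household=4,
    households_per_piano=5,
    tunings_per_day=4,
    working_days_per_year=250,  # assumption: 5 weekdays x 50 weeks
)
print(f"Estimated piano tuners in Chicago: {estimate:.0f}")
```

Writing each assumption as a named argument makes it obvious which numbers you would want to revisit if better information came along.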

The point of this estimate is not that it is correct, but that its error is bounded. We’ve made a number of assumptions along the way – but given that some will be overestimates while others will be underestimates, and assuming we aren’t biased in one direction, the errors tend to partly cancel rather than pile up. If our calculations had indicated that there were a million piano tuners in Chicago, for example, we could be pretty sure something had gone wrong.
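One way to get a feel for this is to suppose, purely for illustration, that each of three assumptions in the piano tuner estimate is wrong by a factor of two, either too high or too low, and to try every combination. This is my own toy setup rather than anything Fermi prescribed, and the names below are hypothetical.

```python
from itertools import product


def piano_tuners(population, people_per_household, households_per_piano,
                 tunings_per_tuner_per_year):
    pianos = population / people_per_household / households_per_piano
    return pianos / tunings_per_tuner_per_year


baseline = piano_tuners(3_000_000, 4, 5, 1_000)

# Each assumption is either half or double its baseline value: 2**3 = 8 scenarios.
ratios = []
for a, b, c in product((0.5, 2), repeat=3):
    estimate = piano_tuners(3_000_000, 4 * a, 5 * b, 1_000 * c)
    ratios.append(estimate / baseline)

within_factor_2 = sum(0.5 <= r <= 2 for r in ratios)
print(f"{within_factor_2} of {len(ratios)} scenarios stay within a factor of 2")
print(f"worst cases: {min(ratios):g}x and {max(ratios):g}x the baseline")
```

Only the two scenarios in which all three errors point the same way drift as far as a factor of eight. In the other six, at least one error offsets another and the answer stays within a factor of two of 150.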

While Fermi estimation is a powerful technique for initial guesses, we sometimes gather new information that can help us refine our initial answer. Let’s go back to the box example I started with. If I pulled a blue ball with the number 32 on it out of the box, would that change your guess about its contents? You can assume that there are other balls inside the box, that some of them are blue, and that others have numbers – but is there a way to quantify this? Yes, thanks to Thomas Bayes, an 18th century statistician and church minister.

A portrait believed to be of Thomas Bayes

Public domain

Bayes’ amazing insight was to turn probability on its head, transforming it from a tool for understanding randomness – like the outcome of a coin flip – into a framework for measuring and updating uncertainty. He laid out an equation, Bayes’ theorem, to turn observations into evidence. It consists of four parts: prior, evidence, likelihood and posterior. Let me explain each separately.

The prior is our starting assumption. Let’s imagine I’m serving three flavors of ice cream at a party (chocolate, strawberry, and vanilla) and I want to know which one is going to be the most popular so I can be sure to stock up. A reasonable starting assumption is that flavor preferences are evenly distributed among people, with one-third of the population liking each flavor. But then the party starts and I start to get nervous. The first 10 people have all gone for chocolate – that’s my evidence.

Here it gets a little complicated. To work out the likelihood, I need to look at my original assumption. If flavor preferences really were evenly split, what are the chances of seeing 10 chocolates in a row? The answer is (1/3)^10, or about 1 in 60,000. That’s pretty unlikely, which suggests that my original assumption is probably wrong, and I need to update it to assume a much higher preference for chocolate, which in turn would give a higher likelihood of seeing the observed evidence. That updated belief is the posterior.
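To make the update concrete, here is a small Python sketch. It assumes, purely for illustration, that the share of guests who prefer chocolate is one of three candidate values, and that I start out 80 per cent sure of the even split. Those candidates and weights are invented for the example and are not part of the story above.

```python
from fractions import Fraction

# Hypothetical candidates for the chocolate share, each with a prior weight.
prior = {
    Fraction(1, 3): Fraction(8, 10),  # even split: the starting assumption
    Fraction(1, 2): Fraction(1, 10),
    Fraction(4, 5): Fraction(1, 10),
}

chocolate_picks = 10  # the evidence: ten guests in a row chose chocolate

# Likelihood: chance of that evidence if a given share were true.
likelihood = {share: share ** chocolate_picks for share in prior}

# Bayes' theorem: posterior = prior x likelihood, rescaled to sum to 1.
total = sum(prior[s] * likelihood[s] for s in prior)
posterior = {s: prior[s] * likelihood[s] / total for s in prior}

even = Fraction(1, 3)
print(f"Even split: 10 chocolates in a row is a 1 in {1 / likelihood[even]} event")
for share in prior:
    print(f"share {share}: prior {float(prior[share]):.2f} "
          f"-> posterior {float(posterior[share]):.4f}")
```

Even though the even split began with most of the weight, ten chocolates in a row shift almost all of it to the chocolate-heavy candidate.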

This theorem turns out to be extraordinarily powerful. Back to my box example: the first ball I’ve pulled out massively limits the possibilities of what’s inside. If I pull out another ball, this one red and labeled “50,” that further narrows the possibilities – you now know that there are at least two colors of balls, and assuming they’re numbered in sequence, the total number is more likely to be small (under 100) than large (more than a million). Each ball I pull out gives you even more evidence, which you can use to update your prior each time.
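The claim about the total can be checked with a rough Python sketch, under assumptions that are mine rather than part of the original puzzle: the balls are numbered 1 to N with no gaps, they are drawn at random without being put back, and before drawing anything every total up to a million is considered equally plausible.

```python
import math


def posterior_over_total(observed, max_total):
    """Posterior probability of each possible total N (the list index).

    With a flat prior, the posterior is proportional to the chance of drawing
    this exact sequence of distinct balls from N, which is 1 / (N * (N-1) * ...).
    Totals smaller than the largest number seen are impossible.
    """
    k, largest = len(observed), max(observed)
    weights = [0.0] * (max_total + 1)
    for n in range(largest, max_total + 1):
        weights[n] = 1 / math.perm(n, k)
    norm = math.fsum(weights)
    return [w / norm for w in weights]


posterior = posterior_over_total([32, 50], max_total=1_000_000)
p_under_100 = math.fsum(posterior[:100])
p_over_10_000 = math.fsum(posterior[10_001:])
print(f"P(fewer than 100 balls)   = {p_under_100:.3f}")
print(f"P(more than 10,000 balls) = {p_over_10_000:.4f}")
```

Under these assumptions, two draws are enough to put roughly half of the probability on totals below 100, while totals in the tens of thousands or more are left with well under 1 per cent. A different prior would change the exact numbers, but not the overall direction.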

One place you may have encountered Bayes’ Theorem without knowing it is your email inbox. The earliest spam filters used Bayesian reasoning, assuming a certain percentage of emails are spam (the prior), and then using emails you and your service provider label as spam (the evidence) in combination with the chance of certain words and phrases appearing in spam emails (the likelihood) to determine which emails are actually spam (the posterior).
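A heavily simplified Python sketch of the idea looks like this. It treats each word as independent evidence (the "naive" Bayes approach), and the word frequencies and the 40 per cent prior are hypothetical numbers made up for the example, not figures from any real filter.

```python
def spam_posterior(words, prior_spam, p_word_given_spam, p_word_given_ham):
    """Probability that an email is spam, given the known words it contains."""
    spam_score = prior_spam        # prior: share of all email that is spam
    ham_score = 1 - prior_spam     # "ham" is the usual nickname for non-spam
    for word in words:
        # Skip words we have no statistics for.
        if word in p_word_given_spam and word in p_word_given_ham:
            spam_score *= p_word_given_spam[word]  # likelihood under spam
            ham_score *= p_word_given_ham[word]    # likelihood under ham
    return spam_score / (spam_score + ham_score)   # posterior


# Hypothetical word frequencies, as if learned from previously labeled emails.
P_WORD_SPAM = {"prize": 0.20, "meeting": 0.02}
P_WORD_HAM = {"prize": 0.01, "meeting": 0.15}

for email in (["prize"], ["meeting"], ["prize", "meeting"]):
    p = spam_posterior(email, 0.4, P_WORD_SPAM, P_WORD_HAM)
    print(f"{' '.join(email):15s} -> P(spam) = {p:.2f}")
```

A word that is common in spam pushes the posterior up, a word that is common in ordinary mail pulls it down, and an email containing both ends up somewhere in between.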

Spam filtering illustrates why guessing is not just a mathematical game with boxes, but is relevant to the real world. And using these techniques – Fermi estimation and Bayesian reasoning – is more important than ever in the world of pattern-matching AIs like ChatGPT. As I’ve written recently, the way modern AIs are built means they often seek to confirm rather than update or challenge your assumptions, matching existing patterns without fully considering new evidence that doesn’t fit. Don’t let an AI guess wrong for you – learn to do it right yourself.
