Why everything becomes a bell curve
The bell curve turns up in heights, measurement errors, exam marks and sensor noise, and that is not a coincidence; it is a theorem. We work through the central limit theorem with a Galton board and a random walk: why adding up enough independent things, almost whatever they are, lands on the same curve, and when it doesn't.
Here's something a bit odd once you notice it. The same curve keeps showing up in completely unrelated places: the heights of a thousand people, the errors in a batch of measurements, the marks on an exam, the noise on a sensor line, the spread of a dart-throw. Different causes, different units, and yet the histogram is always that same humped, symmetric bell. That can't be a coincidence, and it isn't. It's about the closest thing statistics has to a law of nature, and I think it's genuinely beautiful once you see why it has to be true.
The short version: when you add up a lot of independent random things, the sum stops caring what the individual things looked like and settles onto one fixed shape. That's the central limit theorem (CLT), and the rest of this post is us working out why the bell, and not some other curve, is the one everything falls into.
Watch it happen
Let's start by building one. A Galton board is a triangle of pegs; you drop a ball in the top and at every peg it bounces left or right, roughly fifty-fifty. By the time it reaches the bottom it's made a string of independent little decisions, and where it lands records how many went right. Drop one ball and it's anyone's guess. Drop a few thousand and a shape appears.
Each ball's final bin is a sum of independent steps, so the bins follow the binomial distribution: with $n$ rows, bin $k$ collects a fraction $\binom{n}{k}/2^{n}$ of the balls. Crank the rows up and the binomial gets smoother and smoother until it's indistinguishable from the bell curve drawn over the top. We didn't put the bell there. It emerged from nothing more than "add up a lot of coin flips".
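If you'd like to build the board yourself, here's a minimal Python sketch using only the standard library. The row count, ball count and seed are arbitrary choices for illustration, not taken from any real apparatus. Each ball makes `rows` fair left/right choices, and its bin is the number of rights. The sketch then compares the observed share of balls per bin with the exact binomial fraction and with a bell curve of the same mean ($n/2$) and variance ($n/4$).

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def drop_balls(rows: int, balls: int, seed: int = 0) -> Counter:
    """Simulate a Galton board: each ball takes `rows` fair left/right bounces.

    A ball's bin is the number of times it bounced right (0..rows).
    """
    rng = random.Random(seed)
    return Counter(
        sum(rng.random() < 0.5 for _ in range(rows)) for _ in range(balls)
    )


def binomial_fraction(rows: int, k: int) -> float:
    """Exact share of balls expected in bin k: C(rows, k) / 2**rows."""
    return math.comb(rows, k) / 2**rows


def bell_fraction(rows: int, k: int) -> float:
    """Bell-curve approximation to bin k's share.

    Uses a normal density with the binomial's mean rows/2 and standard deviation
    sqrt(rows)/2, evaluated at k (bins are one unit wide).
    """
    bell = NormalDist(mu=rows / 2, sigma=math.sqrt(rows) / 2)
    return bell.pdf(k)


if __name__ == "__main__":
    rows, balls = 12, 20_000
    counts = drop_balls(rows, balls)
    print(" bin  observed  binomial    bell")
    for k in range(rows + 1):
        print(f"{k:4d}  {counts[k] / balls:8.4f}  {binomial_fraction(rows, k):8.4f}  "
              f"{bell_fraction(rows, k):6.4f}")
```

Rerun with more rows and the gap between the binomial and bell columns keeps shrinking, which is the "smoother and smoother" in numbers.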
The bell is an attractor, not a coincidence
This is the part worth sitting with. The Gaussian isn't winning a popularity contest among distributions; it's a fixed point. If you take two independent Gaussians and add them, you get another Gaussian, just shifted and wider. No other shape with finite spread survives being added to copies of itself like that, up to shifting and rescaling. So when you sum many independent things, the sum is pulled toward the one shape that's stable under summing, and it stays there. The individual things can be coin flips, uniform spinners, dice, or lopsided yes/no events. The lumps and asymmetries average out, and what's left is the attractor.
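To watch the asymmetries averaging out without any randomness, we can add copies of a lumpy distribution exactly, by convolving its probability table with itself. Along the way we track two shape numbers: skewness (lopsidedness) and excess kurtosis (tail weight relative to the bell). Both are exactly 0 for a Gaussian. The three-valued "spinner" below is a made-up example; any finite-variance table would do.

```python
import math


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Distribution of X + Y for independent X, Y with pmfs a, b on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def shape(pmf: list[float]) -> tuple[float, float]:
    """Return (skewness, excess kurtosis) of a pmf on the integers 0..len-1."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    m3 = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    m4 = sum((k - mean) ** 4 * p for k, p in enumerate(pmf))
    return m3 / var**1.5, m4 / var**2 - 3.0


def shape_of_sum(piece: list[float], n: int) -> tuple[float, float]:
    """Shape statistics of the sum of n independent copies of `piece`."""
    total = piece
    for _ in range(n - 1):
        total = convolve(total, piece)
    return shape(total)


# Hypothetical lopsided spinner: lands on 0, 1 or 9 with probabilities 0.6, 0.3, 0.1.
SPINNER = [0.6, 0.3] + [0.0] * 7 + [0.1]

if __name__ == "__main__":
    skew1, kurt1 = shape_of_sum(SPINNER, 1)
    for n in (1, 4, 16, 64):
        skew, exkurt = shape_of_sum(SPINNER, n)
        print(f"n={n:3d}  skewness={skew:+.4f} (1/sqrt(n) rule: {skew1 / math.sqrt(n):+.4f})  "
              f"excess kurtosis={exkurt:+.4f} (1/n rule: {kurt1 / n:+.4f})")
```

For sums of independent copies, skewness falls off exactly as $1/\sqrt{n}$ and excess kurtosis as $1/n$. That steady decay is the pull toward the one shape, with both numbers at zero, that summing leaves unchanged.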
A sum is a walk
There's a second way to picture the same fact, and it connects to a thing I find I reach for constantly: a sum of random steps is just a random walk. Each step nudges you left or right; the position after $n$ steps is the running total. Watch a few walks and you'll see them fan out from the origin, and the cloud of where-they-end-up is (of course) a bell.
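Here's a quick way to check the fanning-out claim: simulate a batch of ±1 walks and look at where they end. This is a minimal sketch using the standard library, with an arbitrary number of steps, number of walks and seed. Each step has variance 1, so if variances add, the spread of endpoints after $n$ steps should come out close to $\sqrt{n}$.

```python
import math
import random
import statistics


def walk_endpoints(n_steps: int, n_walks: int, seed: int = 0) -> list[int]:
    """Final position of each of `n_walks` simple random walks of `n_steps` ±1 steps."""
    rng = random.Random(seed)
    return [
        sum(1 if rng.random() < 0.5 else -1 for _ in range(n_steps))
        for _ in range(n_walks)
    ]


if __name__ == "__main__":
    for n in (25, 100, 400):
        ends = walk_endpoints(n, 2_000)
        print(f"n={n:3d}  mean end={statistics.fmean(ends):+6.2f}  "
              f"spread={statistics.pstdev(ends):5.2f}  sqrt(n)={math.sqrt(n):5.2f}")
```

Quadrupling the number of steps only doubles the spread. That square-root growth is why the sum has to be rescaled by $\sqrt{n}$ before a fixed shape can appear.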
The theorem, stated plainly
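Take $n$ independent, identically distributed random variables $X_1, X_2, \dots, X_n$, each with mean $\mu$ and a finite variance $\sigma^2$. Their sum $S_n = X_1 + \dots + X_n$ has mean $n\mu$, and because the pieces are independent their variances add:
$$\operatorname{Var}(S_n) = \operatorname{Var}(X_1) + \dots + \operatorname{Var}(X_n) = n\sigma^2,$$
so the typical width of the sum is $\sigma\sqrt{n}$.
Now centre the sum (subtract its mean) and rescale it by that width. The claim of the CLT is that this standardised sum stops depending on the details of $X_i$ at all and converges to a single distribution:
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \;\longrightarrow\; \mathcal{N}(0, 1) \quad \text{in distribution as } n \to \infty,$$
and that limit is the bell curve itself, whose density is
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$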
What I love here is how little the theorem asks for. It doesn't care whether $X_i$ is a coin flip or a die or a lopsided spinner. It only needs the pieces to be roughly independent and to have a finite variance. Hand it that, and out comes the same curve every single time.
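Here's a sketch that tests the statement directly. It takes sums of $n$ uniform "spinner" values on $[0, 1)$, each with mean $\mu = 1/2$ and variance $\sigma^2 = 1/12$, and standardises them by subtracting $n\mu$ and dividing by $\sigma\sqrt{n}$. It then compares the fraction of standardised sums below a few cut-offs with the standard normal CDF from Python's `statistics.NormalDist`. The uniform spinner, $n = 30$ and the sample size are my own assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def standardised_sums(n: int, samples: int, seed: int = 0) -> list[float]:
    """Draw `samples` sums of n uniform [0, 1) values, centred and scaled.

    Each uniform has mean 1/2 and variance 1/12, so the sum has mean n/2 and
    standard deviation sqrt(n/12).
    """
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)
    scale = sigma * math.sqrt(n)
    return [
        (sum(rng.random() for _ in range(n)) - n * mu) / scale
        for _ in range(samples)
    ]


def empirical_cdf(values: list[float], z: float) -> float:
    """Fraction of values at or below z."""
    return sum(v <= z for v in values) / len(values)


if __name__ == "__main__":
    z_values = standardised_sums(n=30, samples=20_000)
    bell = NormalDist()  # standard normal: mean 0, standard deviation 1
    for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"P(Z <= {z:+.1f}):  simulated {empirical_cdf(z_values, z):.4f}  "
              f"bell {bell.cdf(z):.4f}")
```

Swapping the uniform for any other finite-variance piece only means changing `mu`, `sigma` and the draw. The comparison column stays the same standard normal.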
Why the curve, in one line of hand-waving
If you want the intuition rather than the proof: adding independent things multiplies their characteristic functions (their Fourier transforms), and the Fourier transform of a Gaussian is again a Gaussian, the same shape. Repeated multiplying-and-rescaling washes out every feature except that self-similar one. So the bell is, loosely, the Fourier-stable fixed point of summing. The rigorous version is exactly this argument done carefully, and it's one of the tidier proofs in mathematics.
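The hand-waving can be checked numerically without any sampling. This is a sketch, not the rigorous proof. It assumes a uniform piece on $[-\sqrt{3}, \sqrt{3}]$, chosen so it has mean 0 and variance 1, whose characteristic function is $\psi(t) = \sin(\sqrt{3}\,t)/(\sqrt{3}\,t)$. Adding $n$ independent copies multiplies characteristic functions, and dividing the sum by $\sqrt{n}$ rescales $t$. The standardised sum therefore has characteristic function $\psi(t/\sqrt{n})^n$, which should approach $e^{-t^2/2}$, the characteristic function of the standard bell curve.

```python
import math


def uniform_char_fn(t: float) -> float:
    """Characteristic function of the uniform distribution on [-sqrt(3), sqrt(3)].

    That uniform has mean 0 and variance 1; its characteristic function is real
    because the distribution is symmetric about 0.
    """
    x = math.sqrt(3.0) * t
    return 1.0 if x == 0 else math.sin(x) / x


def standardised_sum_char_fn(t: float, n: int) -> float:
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n) for i.i.d. uniform X_i.

    Summing independent pieces multiplies their characteristic functions, and
    dividing the sum by sqrt(n) evaluates each one at t / sqrt(n).
    """
    return uniform_char_fn(t / math.sqrt(n)) ** n


def gaussian_char_fn(t: float) -> float:
    """Characteristic function of the standard normal: exp(-t^2 / 2)."""
    return math.exp(-t * t / 2)


if __name__ == "__main__":
    t = 1.5
    target = gaussian_char_fn(t)
    for n in (1, 10, 100, 1000):
        approx = standardised_sum_char_fn(t, n)
        print(f"n={n:4d}  sum: {approx:.6f}  bell: {target:.6f}  "
              f"gap: {abs(approx - target):.2e}")
```

For this symmetric piece the gap shrinks roughly tenfold for each tenfold increase in $n$: the non-Gaussian features of $\psi$ are being washed out by the repeated multiply-and-rescale.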
When it doesn't work
A theorem is only as trustworthy as its assumptions, and the CLT's assumptions are quietly load-bearing. Push on them and the bell goes away.
The big one is finite variance. The proof leans on $\sigma$ being a real, finite number to rescale by. Some distributions don't have that. The classic troublemaker is the Cauchy: average a thousand Cauchy samples and you get... another Cauchy, exactly as wild as a single sample. No sharpening, no bell, no convergence. Its tails are too heavy; rare enormous values dominate the sum no matter how many you take.
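You can watch the Cauchy refuse to settle. This sketch uses the standard library and draws standard Cauchy samples by the inverse-CDF trick $\tan(\pi(U - 1/2))$ with $U$ uniform. It averages 400 of them at a time, then compares the interquartile range (IQR) of those averages with the IQR of single draws, which is 2 for the standard Cauchy. As a control, the same averaging applied to standard normal draws should shrink the IQR by about $\sqrt{400} = 20$. Sample sizes and seeds are arbitrary.

```python
import math
import random
import statistics
from collections.abc import Callable


def iqr(values: list[float]) -> float:
    """Interquartile range: distance between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def cauchy(rng: random.Random) -> float:
    """One standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def averages(draw: Callable[[random.Random], float], n: int, trials: int,
             seed: int = 0) -> list[float]:
    """`trials` sample means, each of n independent draws from `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


if __name__ == "__main__":
    n, trials = 400, 1_000
    single = averages(cauchy, 1, trials)
    cauchy_means = averages(cauchy, n, trials, seed=1)
    normal_means = averages(lambda rng: rng.gauss(0.0, 1.0), n, trials, seed=2)
    print(f"IQR of single Cauchy draws:   {iqr(single):.3f}")
    print(f"IQR of means of {n} Cauchy:  {iqr(cauchy_means):.3f}")
    print(f"IQR of means of {n} normals: {iqr(normal_means):.3f}")
```

The IQR is used instead of the standard deviation on purpose: the Cauchy has no finite variance, so a sample standard deviation would just chase whichever enormous value happened to turn up.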
Fat tails are where people get hurt
This isn't a maths-class curiosity. Plenty of real quantities are fat-tailed: incomes, city sizes, file sizes, insurance losses, the daily moves of a market. They look bell-ish in the calm middle, so it's tempting to model them as normal, and then a "twelve-sigma" event that the bell says shouldn't happen even once in the age of the universe shows up twice a decade. The CLT is a promise about thin-tailed, finite-variance, independent things. Borrow it for anything else and the maths will be quietly, expensively wrong. Independence matters too: correlate the pieces and the variances no longer simply add, and the convergence slows or stalls.
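To put numbers on the "twelve-sigma" point, here's a sketch comparing how likely a move beyond $k$ standard deviations is under the bell curve and under a Student-t distribution with 3 degrees of freedom. The t is my stand-in for a fat-tailed quantity, not a model of any particular market. It is bell-shaped in the middle and has finite variance (3), but its tails fall off like a power law. Both tail formulas are standard closed forms. The normal tail uses `math.erfc` because computing $1 - \Phi(k)$ directly loses everything to rounding this far out.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, via erfc to avoid 1 - (almost 1)."""
    return math.erfc(k / math.sqrt(2.0))


def t3_two_sided_tail(k: float) -> float:
    """P(|T| > k * sd) for Student-t with 3 degrees of freedom (sd = sqrt(3)), k > 0.

    Uses the closed-form t_3 CDF: with x = t / sqrt(3) the one-sided tail is
    (arctan(1/x) - x / (1 + x^2)) / pi, and a k-sd move corresponds to x = k.
    """
    x = float(k)
    return 2.0 * (math.atan(1.0 / x) - x / (1.0 + x * x)) / math.pi


if __name__ == "__main__":
    for k in (2, 4, 6, 12):
        p_bell, p_fat = normal_two_sided_tail(k), t3_two_sided_tail(k)
        print(f"{k:2d} sd:  bell {p_bell:.2e}   t(3) {p_fat:.2e}   "
              f"ratio {p_fat / p_bell:.1e}")
```

At two standard deviations the fat-tailed curve is actually slightly *less* likely to produce a move than the bell, which is exactly why it looks reassuringly normal in the middle. Out at twelve, the bell underestimates the chance by more than twenty orders of magnitude.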
Some food for thought, and I don't have a clean answer: the bell curve is so good at describing the well-behaved middle of things that it trains our intuition to expect the middle everywhere. A lot of expensive surprises (financial blow-ups, "unprecedented" failures) are really just someone's bell-curve intuition meeting a fat-tailed world. Worth keeping a little suspicion handy whenever a histogram looks reassuringly normal.
So why is everything a bell curve?
Because so much of what we measure is secretly a sum. A person's height is a sum of many small genetic and environmental nudges. A measurement error is a sum of many tiny independent errors in the apparatus. Sensor noise is a sum of countless little physical disturbances. Each of those is a Galton board you can't see, dropping its ball through a thousand invisible pegs, and the CLT guarantees where the pile ends up. The bell isn't a fact about heights or errors or noise. It's a fact about adding, and adding is everywhere.
I want to come back to the Gaussian from the other direction in a later post, because it's also the maximum-entropy distribution for a given variance, which is a completely different reason for it to be everywhere and yet lands on the same curve. Same shape, two unrelated stories. That kind of coincidence-that-isn't is my favourite thing in maths. Watch this space.
Reading further
- de Moivre (1733) and Laplace: the bell curve was first found exactly here, as the limit of the binomial; in effect, the Galton board about a century and a half early. A nice account is in any history-of-statistics text; Stigler's The History of Statistics is the standard one.
- Galton, Natural Inheritance (1889): where the quincunx (the board above) comes from, and a lovely period read on the idea of regression to the mean. galton.org
- Strogatz, The Joy of x: the gentlest correct explanation of why the bell is special, if you want intuition over proof.
- Taleb, The Black Swan: the case against borrowing the bell for fat-tailed things, argued at length and with feeling. Read it as the warning callout above, expanded to a book.