# The Kelly Criterion

*Information Theory Reimagined*

Two mathematicians from MIT team up with a New Jersey mobster to take down a casino.

## Shannon and Kelly at Bell Labs

Claude Shannon is known as “the father of information theory” for a paper he published in 1948, “A Mathematical Theory of Communication.” For his master’s thesis at MIT, he showed that logical operations could be implemented in electrical switching circuits using Boolean algebra. Without these two ideas, the digital information age could not exist.

John Kelly, Jr. was a mathematician from Texas who met Shannon when they both worked at AT&T’s Bell Labs in Murray Hill, NJ during the 1950s. He read Shannon’s paper on information theory and realized it could be applied to gambling. He published his idea in the Bell System Technical Journal in 1956, calling it “A New Interpretation of Information Rate.” Using Shannon’s theory, Kelly saw that the size of your wager should depend on the probability of winning the bet.

## Information Theory

Shannon wanted to know how much information could be transmitted through a channel, where a channel is any form of communication media such as a telephone wire, a transmission from a satellite, or a wink from one spy to another at the ambassador’s fancy cocktail party. Shannon considered it from the recipient’s point of view, and wondered how much information could be obtained from a message sent through a noisy channel.

He considered it much like a game of Twenty Questions, where one person chooses an object and answers yes or no questions from the other players until one of them can name the object. Shannon wanted to know how many questions on average you might need to ask before you’d know the answer. Let’s consider a simplified version where the item is a letter from the set $\{A,B,C,D\}$. The best strategy is to first ask, “Is it in the set $\{A,B\}$?” If the answer is yes, next ask if the letter is $A$. If it’s not in the set, ask if it’s $C$. You’ll always get the right answer in two questions if the other player chooses randomly.
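As a concrete aside (this sketch is my own, not Shannon’s), the halving strategy is just binary search over the candidate set. The snippet below counts the yes/no questions it takes and confirms that four equally likely letters always need exactly two:

```python
def questions_needed(items, target):
    """Count yes/no questions under the halving (binary search) strategy."""
    count = 0
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        count += 1  # "Is it in this half of the remaining candidates?"
        candidates = half if target in half else candidates[len(candidates) // 2:]
    return count

letters = ["A", "B", "C", "D"]
print([questions_needed(letters, t) for t in letters])  # [2, 2, 2, 2]
```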

Now, let’s say you’re playing against your young nephew, Abner, who is just learning his alphabet and always carries his stuffed bear toy with him. Little Abner chooses $A$ (for Abner) 50% of the time, $B$ (for Bear) 25%, and splits the other choices $C$ and $D$ evenly at 12.5%. Once you figure this out, your best strategy is to ask if it’s $A$ first, then $B$, then $C$. In half the games, you’ll be right on the first guess, and in another 25% you’ll have $B$ in two guesses. After that, a third question (“Is it $C$?”) settles whether it’s $C$ or $D$. The average number of questions is

$Q = 1 \times 0.5 + 2 \times 0.25 + 3 \times (0.125 + 0.125) = 1.75$

Shannon derived a formula for this, calling it “information entropy”,

$H(X) = -\sum_{i=1}^n P(x_i) \log_2 P(x_i),$

where $P(x_i)$ is the probability of receiving message $x_i$, and $X$ is the vector of messages, $X = [x_1,x_2, \ldots, x_n]$. You can think of $x_i$ as the letter Abner chooses, $P(x_i)$ is the probability he selects the $i^{th}$ letter, and $H(X)$ is the average number of questions you need to ask him to get the right answer.

Suppose there are $N$ items to choose from, and let’s say $N$ is some integer power of $2$. That is,

$N = 2^Q.$

Because we’re playing a yes/no question game, if the item is selected randomly from the set of $N$, then we can split the set in half on the first question, narrow down to a quarter on the second, and so on until we reach the answer. Taking base 2 logs of both sides,

$\log_2(N) = \log_2(2^Q) = Q \log_2(2) = Q.$

For a uniformly random choice, each item has probability $P(x_i) = 1/N$, so $Q = \log_2 N = \log_2 \left( \frac{1}{P(x_i)} \right)$. Weighting each outcome by its probability, and using $\log \left(\frac{1}{x} \right) = -\log(x)$, gives Shannon’s entropy equation. In Abner’s example,

$$
\begin{aligned}
H(\{A,B,C,D\}) &= -\left[ 0.5 \times \log_2(0.5) + 0.25 \times \log_2(0.25) + 2 \times 0.125 \times \log_2(0.125) \right] \\
&= -\left[ (0.5 \times -1) + (0.25 \times -2) + 2 \times (0.125 \times -3) \right] \\
&= 1.75
\end{aligned}
$$
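To check the arithmetic, here is a small Python sketch (my own illustration, using only the standard library) that computes $H(X)$ for Abner’s distribution and for a uniformly random choice:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum of p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Abner's letter choices: A 50%, B 25%, C and D 12.5% each
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 -- the average question count

# A uniformly random choice among four letters needs log2(4) = 2 questions
print(entropy([0.25] * 4))                 # 2.0
```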

If a gambler has received some inside information, he has an edge over the other gamblers. Suppose Aunt Mildred decides to get in on the game with Abner but doesn’t realize he favors $A$ and $B$. You somehow convince Aunt Millie to bet on which one of you can get to the right answer first. On average you’ll get it in 1.75 questions while she’ll take 2, so you’ve got an edge. Kelly used Shannon’s entropy equation to calculate the optimal amount to bet when you know how much of an edge you have.

## The Kelly Criterion

There are two extremes to this concept. Most people don’t gamble at all and fall into the “nothing ventured, nothing gained” camp. At the other extreme, there are stories of people betting everything they own on red at the roulette wheel in Las Vegas. If you are going to gamble, though, there is an optimal point in between that maximizes the long-run growth of your bankroll.

In the early 1980s, before I’d heard of Shannon and Kelly, I was thinking about this problem. Using some calculus, I came up with the same solution Kelly did. Shannon and Kelly used the mathematics of expectations, which provides a rigorous proof, but I like mine because it seems a little more intuitive.

Suppose you have a betting opportunity consisting of a series of identical bets: each bet costs the same amount, and the probability of winning is the same each time. For example, you might bet on a coin flip. It costs a penny to play; if the coin lands heads you get two pennies back (your stake plus a penny of winnings), and if it lands tails you lose your penny. With a fair coin your expectation is zero: you win a penny half the time and lose a penny half the time, so you only break even. But maybe you’ve found an edge and can predict the outcome with better-than-even odds. How much should you bet each time?
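The standard Kelly answer for this even-money setup, worked here as a short sketch of my own (the 55% win probability below is made up for illustration): bet the fraction $f$ of your bankroll that maximizes the expected log growth per bet, $g(f) = p \log(1+f) + (1-p)\log(1-f)$. Setting $g'(f) = \frac{p}{1+f} - \frac{1-p}{1-f} = 0$ gives $f^* = 2p - 1$, i.e., bet your edge.

```python
import math

def kelly_fraction(p):
    """Kelly bet for an even-money wager won with probability p: your edge."""
    return 2 * p - 1

def log_growth(f, p):
    """Expected log growth per bet when wagering a fraction f of the bankroll."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

p = 0.55                    # made-up edge: you call the coin 55% of the time
f_star = kelly_fraction(p)  # 0.10
for f in (0.02, 0.05, f_star, 0.20, 0.40):
    print(f"f = {f:.2f}  growth/bet = {log_growth(f, p):+.5f}")
# The Kelly fraction f = 0.10 maximizes growth; f = 0.20 (twice Kelly) is
# already negative -- overbetting an edge loses money in the long run.
```

Running this shows growth rising up to $f^* = 0.10$ and then falling, which captures the “optimal point in between” the two extremes described above.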

## Breaking the Bank

Claude Shannon left Bell Labs for a professorship in the Mathematics Department at MIT. One day a junior faculty member, Ed Thorp, asked Shannon for help getting a paper published. Roger Baldwin, a mathematician working at the Army’s Aberdeen Proving Ground in Aberdeen, MD, along with three associates, had used an Army computer to calculate the odds when playing blackjack. The computer was supposed to calculate the ballistic trajectories of gunnery shells, but they took advantage of downtime at night and discovered that by playing an optimal game the house edge could be brought down to just 0.62%.

Thorp read their paper and realized they had assumed the dealer shuffled after every play. Casinos don’t want to slow the game down, so multiple hands are dealt before reshuffling. Thorp recalculated the optimal strategy for when the dealer doesn’t reshuffle and showed that a player who counts cards gains the edge.

As a mathematical paper, Thorp’s didn’t make a big impression. But word got out, news reporters began to call, and gamblers all over the country asked for copies of the paper. Thorp became an instant celebrity. Thorp and Shannon wanted to apply the card-counting blackjack scheme, along with Kelly’s method, in a Nevada casino, but they needed front money.

Manny Kimmel, a mobster from Newark, NJ, ran a numbers racket. He also held a grudge against the owners of some Reno casinos. Watching Thorp play mock blackjack games, Kimmel became convinced that Thorp could beat the casinos. Kimmel wanted to front Thorp and Shannon $100,000, but Thorp talked him down to $10,000. They flew into Reno one weekend, and after about 30 hours of play they were up to about $21,000 and could have been over $30,000 except that Kimmel was betting on the side and losing.

Shannon and Thorp later built the first wearable computer, to beat roulette. John Kelly developed a speech synthesizer while working at Bell Labs and used it to make a computer sing “Daisy Bell,” which Arthur C. Clarke included in the movie “2001: A Space Odyssey.” Kelly died of a stroke at age 41 in 1965 and never used his method to make money. Thorp is now president of Edward O. Thorp & Associates, where his investments have yielded an average annual growth of 20% for almost 30 years.