Kelly criterion and statistical weirdness

Fri 04 November 2016

I was prompted to write this after happening across this fascinating article in the Financial Times. Unhappily, it's behind a paywall, so I'll recount the gist of it here.

Imagine you had a coin with a 60% (0.6) chance of coming up heads and a 40% (0.4) chance of tails. Now imagine you have $25 to gamble, and before each toss you can stake all or some of your current pot. If you stake $20 and lose, you're down $20; if you win, you're up $20. You then get to do it again with the $5 or $45 you'll have for the next toss, and so on.
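If you'd rather tinker offline, the game is easy to simulate. Here's a minimal Python sketch (the function name and defaults are my own invention):

```python
import random

def play(pot=25.0, p_heads=0.6, fraction=0.2, tosses=20, seed=None):
    """Bet a fixed fraction of the current pot on heads, `tosses` times."""
    rng = random.Random(seed)
    for _ in range(tosses):
        stake = fraction * pot
        if rng.random() < p_heads:
            pot += stake  # heads: win an amount equal to the stake
        else:
            pot -= stake  # tails: lose the stake
    return pot

# one possible run; the outcome varies from seed to seed
print(round(play(seed=1), 2))
```

Try a few different values of `fraction` and `seed` to get a feel for how swingy the game is.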

The question is: what's the best strategy for maximising your winnings after a given number of tosses? Rather than idly muse on this, have a go at it on this website's simulator. Alas, there are no longer actual money winnings on offer, but there is still an intellectual reward.

I recommend you do it now before reading further.

There now follows a brief intermission in this blog post.

How did you do? I did badly when I first tried it, but my heart and, more importantly, my mind were in the right place. I only bet on heads and concentrated my thoughts on what fraction of my current pot I should stake on the next toss. I didn't last the full half hour because a) I dropped down to a few dollars, and b) I was mentally salivating at the prospect of resolving my confusion with mathematical statistics.

Yes, I know. A bit sad.

Sans maths

I'll start with a non-mathematical account.

Firstly, here's the answer to the question above: the best proportion of your pot to bet is 20%, or 0.2. This comes from the fact that the probability of getting heads is 0.6. You double 0.6 to get 1.2, then subtract 1, which gives 0.2.

What's the significance of the double and subtract one thing?

Well, that's a little hard to explain. To get an idea, imagine you have a $1 stake. There's a 0.6 chance of being up $1 and a 0.4 chance of being down $1 after the toss. So you might expect, in some kind of average sense (averaging over many parallel universes in which this toss took place!), to be up $0.20. And the same applies for any stake: bet X dollars and you should expect, on average, to be 0.2 times X dollars up as a result.

You can also sense check this result for a fair coin — one with a 0.5 probability of a heads (or tails). Two times 0.5 minus 1 is zero, so you would expect, on average, to be up (or down) by zero dollars on each toss.
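You can check this average-over-many-universes claim by brute force: simulate a large number of $1 bets and look at the mean outcome. A quick sketch (function name mine):

```python
import random

def average_gain(p_heads=0.6, trials=200_000, seed=0):
    """Average outcome of a $1 bet on heads over many simulated tosses."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += 1 if rng.random() < p_heads else -1
    return total / trials

print(average_gain(0.6))  # close to 2*0.6 - 1 = 0.2
print(average_gain(0.5))  # close to 2*0.5 - 1 = 0.0
```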

However, this only explains why the number 0.2 is significant. It doesn't explain why that should be the fraction of your pot to stake in order to maximise your winnings.

Alas, I cannot think of an intuitive explanation for that. Time for maths I'm afraid.

Go maths!

First, let the probability of getting heads be \(p\) and, as we're considering an unfair coin, let it be greater than 0.5. Also, let's suppose we bet on heads on every toss (because we're not stupid).

Let the starting amount in the pot be \(z_0\) and suppose we bet a fraction \(a\) of whatever is in the pot for every toss. Then, using caps to denote a random variable, the amount in the pot after the first toss is:

$$ Z_1 = z_0 + a z_0 T_1 $$

where \(T_1\) is the random variable representing the first toss. Being a random variable, we cannot say which value it will have until the result of the toss is known. Beforehand, all we can say is that \(T_1\) takes the value +1 with probability \(p\) if heads comes up (a win!) and -1 with probability \(1-p\) if it's tails.

We can re-write the above equation as

$$ Z_1 = z_0 (1 + a T_1) $$

and for \(Z_2\) we will have

$$ Z_2 = Z_1 (1 + a T_2) $$

where \(T_2\) represents the second toss. It is defined exactly like \(T_1\) but it is statistically independent of \(T_1\) and for that matter any other \(T_n\). In other words, the outcome of one toss cannot affect any other.

At this point note that we've multiplied two random variables together. We'll return to the significance of this later, but for now I'll let Robby the Robot pass comment.

Combining the last two equations gives:

$$ Z_2 = z_0 (1 + a T_1)(1 + a T_2) $$

and we can extend this to any \(n\) by doing this over and over again to obtain:

\begin{equation} Z_n = z_0 (1 + a T_1)(1 + a T_2) \cdots (1 + a T_{n-1}) (1 + a T_n) \label{Zn} \end{equation}
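Equation (\ref{Zn}) translates almost line for line into code. A hedged sketch of one realisation (names mine):

```python
import random

def pot_after(z0=25.0, a=0.2, p=0.6, n=20, seed=None):
    """One realisation of Z_n = z0 * (1 + a*T_1) * ... * (1 + a*T_n)."""
    rng = random.Random(seed)
    z = z0
    for _ in range(n):
        t = 1 if rng.random() < p else -1  # the toss T_k is +1 or -1
        z *= 1 + a * t
    return z
```

Each call gives one draw from the distribution of \(Z_n\); different seeds give different parallel universes.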

Now this is all fine and dandy, but random variables cannot be directly compared with real-world results. To do that we need the expectation operator, which extracts the mean (more colloquially known as the average) from a random variable.

Let's start with \(T_n\). To find its expectation we multiply each outcome by its probability and add all of these up, thusly:

$$ E[T_n] = p\cdot 1 + (1-p)\cdot (-1) = 2p-1 $$

and voila, it's the first appearance of the mystical quantity \(2p-1\). This number is so important in these workings that we'll call it \(q\) to make things a bit neater, i.e.

$$ q=2p-1 $$

We'll need the expectation of \(1+aT_n\) which is simple to work out because neither 1 nor \(a\) is a random variable, so:

\begin{equation} E[1+aT_n] = 1 + a E[T_n] = 1 + aq \label{expect} \end{equation}

Next we make use of the fact that each \(T_n\) is statistically independent from the rest. If we have two independent random variables \(U\) and \(V\) then the expectation of the product is the product of the expectations: \(E[UV]=E[U]\cdot E[V]\).

Each term in brackets in our expression for \(Z_n\) in (\ref{Zn}) is an independent random variable so

$$ E[Z_n] = z_0 E[1 + a T_1]\cdot E[1 + a T_2] \cdots E[1 + a T_{n-1}]\cdot E[1 + a T_n] $$

and because (\ref{expect}) tells us all the expectations are \(1+aq\), we have our result

$$ E[Z_n] = z_0(1 + aq)^n $$
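This formula is easy to check numerically by averaging many simulated runs. A sketch (names mine; the agreement is only up to Monte Carlo noise):

```python
import random

def mean_pot(z0=25.0, a=0.2, p=0.6, n=10, runs=100_000, seed=0):
    """Monte Carlo estimate of E[Z_n] for comparison with z0*(1+a*q)**n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        z = z0
        for _ in range(n):
            z *= (1 + a) if rng.random() < p else (1 - a)
        total += z
    return total / runs

q = 2 * 0.6 - 1
theory = 25.0 * (1 + 0.2 * q) ** 10
print(mean_pot(), theory)  # the two figures should agree to within about 1%
```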

To state this in English: after making \(n\) bets on heads, with fraction \(a\) of the pot staked each time, we can expect to have multiplied our starting pot by a factor of \(1+aq\) raised to the power \(n\).

This can be very large. Clearly it's maximised by setting \(a=1\), and for 20 tosses of a coin with a 0.6 probability of heads we can expect to multiply our starting pot by 1.2 raised to the power of 20, which is about 38. That's a pretty amazing return.

But there's a problem. Setting \(a=1\) means you're betting your entire pot on heads on each coin toss. If there's just one tails at any point, your pot is emptied and it's game over. How likely is it that you'll get through 20 coin tosses without a single tails? Not very.
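"Not very" is quantifiable: surviving 20 tosses with \(a=1\) means throwing 20 heads in a row, and almost the entire expectation rides on that vanishing sliver of luck:

```python
z0, p, n = 25.0, 0.6, 20
p_survive = p ** n           # need heads on every single toss
jackpot = z0 * 2 ** n        # the pot if you do survive with a = 1
print(p_survive)             # about 3.7e-05, i.e. roughly 1 in 27,000
print(p_survive * jackpot)   # about 958, which equals 25 * 1.2**20
```

So the expectation of roughly $958 is made up of a 1-in-27,000 shot at about $26 million, and near-certain ruin otherwise.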

Have we made a mistake in the maths? Nope. Is one of the statistical steps wrong? Nope. The problem is that \(Z_n\) is a random variable with unintuitive properties. In other words, our expectation is correct but useless for our stated aim: finding the best value of \(a\) for maximising our winnings. We should've heeded Robby the Robot's warning about multiplying random variables.

The Central Limit Theorem to the rescue

So our random variable \(Z_n\) is weird. Specifically, its distribution is weird. In fact, \(T_n\) is also a bit weird if you think about it. It can only ever take on two values, -1 or +1, and, for \(p=0.6\), the expectation operator tells us to expect a value of 0.2 — a value that \(T_n\) can never have.

The working above is all ruthlessly logical and correct, but we got into trouble because we didn't just start with a weird distribution: we compounded it \(n\) times to produce an even weirder distribution for \(Z_n\). It's still possible to work in this intuition-free zone and compute the probability of winning over a certain amount, but to do so would be to climb the north face of Mount Statistics.

There's a beautiful thing in mathematical statistics called the Central Limit Theorem. It says that if you add together many random variables that have the same distribution then the resulting random variable will end up approximating a normal distribution. The term "normal" is somewhat unfortunate as it is the name of the distribution, not a description of it. The normal distribution is also known as a Gaussian, or more colloquially as the bell curve because it looks like a bell when graphed. The Central Limit Theorem applies even if the individual distributions are not bell curves themselves and, with some conditions, even if the individual distributions are different.

In our case the distributions of the random variables \(T_n\) are all identical, but the problem is that we multiply them together to form \(Z_n\). We can enlist the help of the trusty logarithm function to turn a product of random variables into a sum because \(\log UV = \log U + \log V\). Doing this gives

$$ \log Z_n = \log z_0 +\log (1 + a T_1)+ \log (1 + a T_2) + \cdots + \log (1 + a T_n) $$

Next we need to compute the expectation of each term in the sum:

$$ E[\log (1+ a T_n)] = p \log (1+a) + (1-p) \log (1-a) $$

As before, there are two outcomes with \(T_n\) being +1 with probability \(p\) and -1 with probability \(1-p\). As this expectation is the same for any \(n\), we find:

$$ E[\log Z_n] = \log z_0 + n E[\log (1+ a T_n)] $$

which leads us to this

$$ E[\log Z_n] = \log z_0 + n \left( p \log (1+a) + (1-p) \log (1-a) \right) $$
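A quick way to see where the maximum sits is to evaluate the per-toss growth rate \(p \log (1+a) + (1-p) \log (1-a)\) on a grid of \(a\) values. A sketch (the function name is mine):

```python
import math

def growth(a, p=0.6):
    """Per-toss expected log-growth: p*log(1+a) + (1-p)*log(1-a)."""
    return p * math.log(1 + a) + (1 - p) * math.log(1 - a)

# scan a = 0.000, 0.001, ..., 0.999 and pick the best
best = max((k / 1000 for k in range(1000)), key=growth)
print(best)  # 0.2, which is exactly 2p - 1
```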

For \(a=0\) (never bet anything) this is just \(\log z_0\). As \(a\) approaches 1, the expectation decreases without limit towards \(-\infty\). This means the typical value of \(Z_n\) approaches zero, which accords with our observation above that betting the entire pot will be game over when the first tails is thrown.

So somewhere in between these two useless extremes is our desired best value of \(a\). To find it we need to take the derivative with respect to \(a\) and find the value of \(a\) which makes it zero. So,

$$ \frac{d}{da} E[\log Z_n] = n \left( p \frac{1}{1+a} - (1-p) \frac{1}{1-a} \right) $$

and with a little bit of algebra that I leave as an exercise to you, dear reader, this is zero when

$$ a=2p-1=q $$

QED.

I'm not going to do it here, but it's not too difficult to show that as \(n\) increases, the spread of values about the expected value becomes proportionately smaller.
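You can see this shrinking spread empirically by sampling the realised per-toss growth rate \(\log(Z_n/z_0)/n\) and watching its standard deviation fall like \(1/\sqrt{n}\). A sketch, with names of my own:

```python
import math
import random

def growth_rates(a=0.2, p=0.6, n=100, runs=2000, seed=0):
    """Sample of log(Z_n / z0) / n: the realised per-toss growth rate."""
    rng = random.Random(seed)
    up, down = math.log(1 + a), math.log(1 - a)
    rates = []
    for _ in range(runs):
        s = sum(up if rng.random() < p else down for _ in range(n))
        rates.append(s / n)
    return rates

def std(xs):
    """Population standard deviation of a list of numbers."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

print(std(growth_rates(n=100)), std(growth_rates(n=400)))
# quadrupling n roughly halves the spread
```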

In other words, the laws of probability and large-number statistics say that not only will your winnings increase with more and more tosses, but you can be relatively more certain of how much you will win. How nice.