by H.G.Muller » 30 Jul 2007, 11:52
Let me first reiterate the earlier procedure (and its justification), which consists of supplementing the set of games with some virtual draws.
In a coin-flip experiment, where the probability for heads is p, and for tails thus q = 1 - p, the probability for having K times heads in N flips is proportional to
Q = p^K * q^(N-K)
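As a small illustration (my own sketch in Python, not part of the original argument), this unnormalized likelihood is trivial to evaluate:

Code:

def likelihood(p, K, N):
    # Unnormalized likelihood Q = p^K * (1-p)^(N-K) of K heads in N flips.
    return p**K * (1.0 - p)**(N - K)

# Example: 7 heads in 10 flips; Q peaks near p = 0.7.
for p in (0.5, 0.6, 0.7, 0.8):
    print(p, likelihood(p, K=7, N=10))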
In the Bayesian treatment, this probability is 'inverted', and interpreted as the likelihood that the probability for heads in a single event is p, given that K times heads was observed. (Originally it was the probability for K times heads, given that the probability for heads in a single event is p.)
Now we can calculate the value of p for which Q is maximal (the maximum-likelihood estimate of p). This is most easily done by maximizing Z = log Q:
Z = K*log(p) + (N-K)*log(1-p)
taking the derivative
Z' = K/p - (N-K)/(1-p) = 0
K*(1-p) = (N-K)*p
K = N*p
p = K/N
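Just to make this concrete, a quick numerical check (again my own sketch, using the same likelihood formula as above) confirms the maximum indeed sits at K/N:

Code:

K, N = 7, 10
grid = [i / 1000.0 for i in range(1, 1000)]              # p in (0, 1)
p_ml = max(grid, key=lambda p: p**K * (1 - p)**(N - K))  # maximize Q
print(p_ml)                                              # ~0.7 = K/N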
This is what you would normally (= in a non-Bayesian approach) guess: the observed fraction of heads events is the estimate of the probability.
Now it turns out that, although with this estimate you have the maximum likelihood of your p being exactly right, on average your value for p will be wrong! K/N systematically overestimates the difference between p and 1/2. This is clearest for 100% (or 0%) scores: with K=N you get p=1. Indeed p=1 makes a 100% score maximally likely (namely a certainty). But there are many p slightly below 1 for which this score is also quite likely, while by definition p can never be larger than 1. So on average (testing all kinds of coins, and passing only those that scored 100% heads) p will be below 1.
The Bayesian formula for Q(p) allows you to calculate this average exactly, as a normal expectation value:
E(p) = INTEGRAL[0,1] p*Q(p) dp / INTEGRAL[0,1] Q(p) dp
Both integrals are standard Beta integrals, INTEGRAL[0,1] p^a * (1-p)^b dp = a!*b!/(a+b+1)!, which you can derive by repeated integration by parts (sorry about that...). Taking the ratio then gives
E(p) = (K+1)/(N+2)
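If you don't trust the algebra, the integrals are easy to check numerically; here is a little sketch (mine, not part of the original derivation) for the extreme case of a 100% score:

Code:

K, N = 10, 10                  # a 100% score: 10 heads in 10 flips
steps = 100000
dp = 1.0 / steps
num = den = 0.0
for i in range(steps):
    p = (i + 0.5) * dp         # midpoint rule on [0, 1]
    Q = p**K * (1.0 - p)**(N - K)
    num += p * Q * dp
    den += Q * dp
print(num / den)               # ~0.9167
print((K + 1.0) / (N + 2.0))   # 11/12 = 0.91666...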
This is the same as what you get with the normal estimate for p when you supplement your sample of coin flips with two 'virtual' flips, one heads and one tails, before calculating the heads fraction. (In chess, where there is also the possibility of a draw, you would add two virtual draws to the match result.)
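For the chess case that could look like this (a minimal sketch, with variable names of my own choosing; each virtual draw contributes half a point over one extra game):

Code:

def expected_score(wins, draws, losses):
    # Naive score fraction after adding two virtual draws to the result.
    games = wins + draws + losses + 2        # two virtual games
    points = wins + 0.5 * draws + 1.0        # two virtual half-points
    return points / games

print(expected_score(10, 0, 0))   # a 10-0 sweep gives 11/12, not 1.0
print(expected_score(5, 2, 3))    # 6/10 raw score -> 7/12 ~ 0.583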
The Bayesian expected p (which is the basis for the awarded rating) can thus be obtained through the 'trick' of adding virtual draws before doing the naive (non-Bayesian) calculation. This trick gives you the exact result for any value of K and N, and is thus very useful. It is not completely obvious, though, whether it can be generalized to more than one match, between more than two players, while retaining these properties of simplicity and exactness.