Winboard Forum

by **Rémi Coulom** » 29 Sep 2005, 07:40

Kirill Kryukov wrote:R?mi, I compared the ELOstat ratings produced by Bayeselo and real ELOstat ratings. You can see my table on this page. Two things to notice - the difference is often more than 1-2 points, it goes up to 10 points sometimes. And second, the uncertainty estimation is quite different, especially at the both ends of the table. Do you have guess why? ;)

[update]
I updated CEGT statistics page to include ELOstat ratings by both Bayeselo and ELOstat. They look very similar except some anomalies like -5037 for "Shredder 9 UCI" and "Fritz 8".

Thanks for pointing out this problem. It arises in case of 50% score. I remember such a problem was mentioned in the history of ELOstat. I will fix it in the next version.

The elostat command of bayeselo does not compute confidence intervals. I do not understand how ELOstat confidence intervals work. I will try to find how they work.

R?mi

by **Rémi Coulom** » 29 Sep 2005, 20:14

R?mi Coulom wrote:
Kirill Kryukov wrote:R?mi, I compared the ELOstat ratings produced by Bayeselo and real ELOstat ratings. You can see my table on this page. Two things to notice - the difference is often more than 1-2 points, it goes up to 10 points sometimes. And second, the uncertainty estimation is quite different, especially at the both ends of the table. Do you have guess why? ;)

[update]
I updated CEGT statistics page to include ELOstat ratings by both Bayeselo and ELOstat. They look very similar except some anomalies like -5037 for "Shredder 9 UCI" and "Fritz 8".

Thanks for pointing out this problem. It arises in case of 50% score. I remember such a problem was mentioned in the history of ELOstat. I will fix it in the next version.

In fact, I will not fix it. This happens because Shredder 9 UCI and Fritz 8 are not connected to others. Fixing the problem would require separating the players into connected groups before calculating Elos. I could do it, but I do not consider it worth the effort. If you wish to make sure players are connected, you can use "connect [player number]" in bayeselo, right after reading the PGN.

The elostat command of bayeselo does not compute confidence intervals. I do not understand how ELOstat confidence intervals work. I will try to find how they work.
R?mi

I have implemented an algorithm for confidence intervals that behaves similarly to that of ELOstat. The "elostat" command now also computes confidence intervals at the same time it computes ratings. So, no need to run "exactdist" after "elostat".

It is very difficult to make a completely faithful clone of ELOstat. I expect that many of the differences between ELOstat and my implementation of the ELOstat algorithm come from differences in numerical accuracy. I tried to make my implementation more accurate, but it is still different.

The new version is there:
http://remi.coulom.free.fr/Bayesian-Elo/

R?mi

by **Kirill Kryukov** » 30 Sep 2005, 05:33

Thanks for new version with intervals! I'll update my lists soon.

R?mi Coulom wrote:In fact, I will not fix it. This happens because Shredder 9 UCI and Fritz 8 are not connected to others. Fixing the problem would require separating the players into connected groups before calculating Elos. I could do it, but I do not consider it worth the effort. If you wish to make sure players are connected, you can use "connect [player number]" in bayeselo, right after reading the PGN.

Oh! I did not notice that. Yeah, no need to fix it, the proper game set must be connected.

R?mi Coulom wrote:It is very difficult to make a completely faithful clone of ELOstat. I expect that many of the differences between ELOstat and my implementation of the ELOstat algorithm come from differences in numerical accuracy. I tried to make my implementation more accurate, but it is still different.

Who knows, may be you have to make it less accurate for that.

by **Jeff Lischer** » 30 Oct 2005, 06:36

In a previous post, R?mi used the following match to show how bayeselo was better than ELOstat:

A beats B +1 =0 -0
B ties C +0 =1 -0
C beats D +1 =0 -0.

You'd expect A to be the best, B and C to be about equal, and D to be the worst. In R?mi's example, the results from bayeselo looked pretty reasonable, but the ones from ELOstat didn't make much sense. I've seen ELOstat have problems like this whenever there are players who either win or lose all their games. Based on the standard Elo equation, you'd need an infinite difference in the player's Elo ratings to explain a 100% or 0% result. Therefore, ELOstat can't solve for the Elo difference in the A vs B and the C vs D matches and it arbitrarily sets the difference to exactly 600.

Here's an example similar to R?mi's, but where nobody wins or loses all their games:

A beats B +2 =0 -1
B ties C +1 =1 -1
C beats D +2 =0 -1

ELOstat: A=+122, B=+2, C=-2, C=-122
bayeselo: A=+74, B=+11, C=-11, C=-74

Since A scored 66.7% against B, you'd expect his rating to be 120 points higher based on the standard Elo equation. Therefore, it looks like ELOstat does better than bayeselo in this new example.

For many databases, I've seen ELOstat and bayeselo give pretty similar results and it's hard to know for sure which one is better. The WBEC database seems to be a case where bayeselo is better. Are there any players in that database who scored 100% or 0%? That might explain why ELOstat has problems with it.

by **Rémi Coulom** » 30 Oct 2005, 08:45

Jeff Lischer wrote:Here's an example similar to R?mi's, but where nobody wins or loses all their games:

A beats B +2 =0 -1
B ties C +1 =1 -1
C beats D +2 =0 -1

ELOstat: A=+122, B=+2, C=-2, C=-122
bayeselo: A=+74, B=+11, C=-11, C=-74

Since A scored 66.7% against B, you'd expect his rating to be 120 points higher based on the standard Elo equation. Therefore, it looks like ELOstat does better than bayeselo in this new example.

This is because of the prior. You can tell bayeselo to behave like this if you wish by setting the prior to zero:

With the default (prior 2), I get:

Code: Select all: Rank Name Elo + - games score draws 1 A 70 0 0 3 67% 0% 2 C 11 0 0 4 63% 25% 3 B -11 0 0 4 38% 25% 4 D -70 0 0 3 33% 0%

Without prior (prior 0):

Code: Select all: Rank Name Elo + - games score draws 1 A 164 0 0 3 67% 0% 2 C 16 0 0 4 63% 25% 3 B -16 0 0 4 38% 25% 4 D -164 0 0 3 33% 0%

Note: I have not managed to replicate your figures. That may be because I have not chosen the same player colors as you have (the winner was always white in my games). But that may also be because you are using an old version of bayeselo.

Note also that the rating difference is now more than 120 points. This is in part related to the fact that bayeselo takes the player to move into consideration. Playing as white is considered a 32 point advantage (as can be seen in the difference between B and C). This is also because the maximum-likelihood method computes rating differences in a different way (much better, in my opinion) than reversing the Elo formula.

Using a prior is good. The most noticeable difference is when a player scores 100%. By reversing the Elo formula, this would mean an infinite difference in elo ratings. To avoid this problem, EloStat arbitrarily sets the rating difference to 600 points in this case. With a prior, the rating difference grows with the number of games, which sounds intuitively good (to me, at least).

Here is the rating table obtained with bayeselo (EloStat would say +600 all the time):

Code: Select all: 1-0 : +89.2563 2-0 : +150.156 3-0 : +195.835 4-0 : +232.215 5-0 : +262.379

If this does not look intuitively good to you, more serious statistical evidence indicates that ratings computed with a prior provide better predictions. This is explained in my first post about priors (Wed Sep 21, 2005 8:29 pm), where I present cross-validation data.

R?mi

by **Rémi Coulom** » 30 Oct 2005, 08:52

Jeff Lischer wrote:In a previous post, R?mi used the following match to show how bayeselo was better than ELOstat:

A beats B +1 =0 -0
B ties C +0 =1 -0
C beats D +1 =0 -0.

You'd expect A to be the best, B and C to be about equal, and D to be the worst. In R?mi's example, the results from bayeselo looked pretty reasonable, but the ones from ELOstat didn't make much sense. I've seen ELOstat have problems like this whenever there are players who either win or lose all their games. Based on the standard Elo equation, you'd need an infinite difference in the player's Elo ratings to explain a 100% or 0% result. Therefore, ELOstat can't solve for the Elo difference in the A vs B and the C vs D matches and it arbitrarily sets the difference to exactly 600.

Note that the infinite-difference problem is not the reason why EloStat has difficulties with this case. The reason for the problem is that EloStat does as if a player had played against one single opponent, whose rating is the average of ratings of his opponents. This is completely wrong, especially when the ratings of opponents are spread widely. The same phenomenon arises in the WBEC case discussed with Kirill previously in this thread, where there is no 100% score.

R?mi

by **Jeff Lischer** » 30 Oct 2005, 23:45

R?mi Coulom wrote:Note that the infinite-difference problem is not the reason why EloStat has difficulties with this case. The reason for the problem is that EloStat does as if a player had played against one single opponent, whose rating is the average of ratings of his opponents. This is completely wrong, especially when the ratings of opponents are spread widely.

Oh, now I see that you meant ELOstat wasn't predicting the B and C ratings well. My message was explaining why A's rating was so much higher than B's.

You're correct that ELOstat's method of using an average opponent rating doesn't work well in some situations. Particularly in a case like your original example, where B's and C's opponents are spread very far apart. In those cases, the average opponent approach really breaks down.

In my example, I brought the Elo difference down to only a 100 points or so and the problem with the B and C ratings being different nearly went away. That shows why ELOstat works reasonably well in most cases. If a players opponents are within 100 to 200 points, the average opponent approach doesn't introduce too much of an error.

Incidentally, here's a fairly simple method I use to eliminate the average opponent problem in ELOstat. To start off, I guess a set of Elo ratings for all players like in ELOstat. Then I sum over all the games each player played and find his total score. Next I use root solving to find the player's Elo rating that would have scored the same against that set of opponents by summing up all the expected scores using the standard Elo equation. I update that players rating and then move on to the next player. I iterate like this until a stable set of ratings is found.

There's more going on in each iteration loop, but that's compensated by the fact that fewer iteration loops are required to stabilize the solution. Since this method sums over all games played by each player, the average opponent assumption is not required. It still uses the Elo equation, however, and not the maximum-likelihood approach.

by **Rémi Coulom** » 18 Dec 2005, 21:47

I have just released a new version of bayeselo:
http://remi.coulom.free.fr/Bayesian-Elo/

Novelties are

Maximum-likelihood ratings are slightly scaled down (multiplied by a constant < 1) to match the expected-outcome curve of elostat better.
A prediction tool has been added. This tools simulates a large number of possible continuations for a running tournament, and computes expected rank, expected score, and rank distribution for every participant.

I have not checked it thoroughly, so I hope there is no big problem. Some usage documentation is available on the web page.

R?mi

by **Greg Simpson** » 19 Dec 2005, 11:21

The los function wasn't working for me last version, and I see it still doesn't work. I'm not certain when it last worked for me, but I know I used to be able to use it in some version..

Has los been disabled, or am I not understanding something? I tried using a publicly available pgn file, the WBEC version 11 premier division file. My procedure was:

Download WBEC11_Prem_Tot.zip

Extract the pgn file and rename it to t.pgn

>bayeselo
...
ResultSet>readpgn t.pgn
1104 games(s)...
ResultSet>elo
ResultSet-EloRating>mm
00:00...
ResultSet-EloRating>exactdist
00:00...
ResultSet-EloRating>ratings
<ratings>
ResultSet-EloRating>los
<blank line>
ResultSet-EloRating>los 0 10 70
<blank line>

by **Rémi Coulom** » 19 Dec 2005, 11:33

Greg Simpson wrote:The los function wasn't working for me last version, and I see it still doesn't work. I'm not certain when it last worked for me, but I know I used to be able to use it in some version..

You have to run the "covariance" command before running "los", as indicated on the web page. The reason is that computing the covariance matrix may be very slow on big databases, so you now only need to call "covariance" once, and you can then run many "los" commands very rapidly. In a future version, I will make this cache mecanism automatic so that it is not necessary to do it manually.

R?mi

by **H.G.Muller** » 19 Dec 2005, 15:26

In my experience with drawing statistical conclusions fom noisy data maximum-likelihood methods are far superior to using the data as if it were exact. Long ago, before computers were around for the layman I used a poor-man's solution to the problem of calculating ratings from a small data set without any prior knowledge of the player's strengths.

I simply replaced a result (s points from n encounters) by (s+1)/(n+2). This is the Bayesian expectation for the probability of a coin-flip, if the a-priori assumption was that any probability was equally likely. Even though this does not solve the average-opponent problem, it improved the ease with which a selfconsitent set of ratings could be determined enormously. Taking the results at face value made the rating diverge because of players that won or lost nearly all their games.

by **Rémi Coulom** » 19 Dec 2005, 18:13

H.G.Muller wrote:In my experience with drawing statistical conclusions fom noisy data maximum-likelihood methods are far superior to using the data as if it were exact. Long ago, before computers were around for the layman I used a poor-man's solution to the problem of calculating ratings from a small data set without any prior knowledge of the player's strengths.

I simply replaced a result (s points from n encounters) by (s+1)/(n+2). This is the Bayesian expectation for the probability of a coin-flip, if the a-priori assumption was that any probability was equally likely. Even though this does not solve the average-opponent problem, it improved the ease with which a selfconsitent set of ratings could be determined enormously. Taking the results at face value made the rating diverge because of players that won or lost nearly all their games.

What you describe is precisely what bayeselo does. Maybe you had understood already, but I cannot tell from your post. This is equivalent to adding two virtual draws between a pair of players. The "prior" command in bayeselo lets the user control the number of virtual draws. In case a player has had many opponents, these virtual draws are split between opponents, that is to say 2 virtual games are not added for every pair of players, but fractional virtual games when there are many opponents. When the number of games is small, this technique does indeed improve the quality of result predictions significantly.

R?mi

by **H.G.Muller** » 19 Dec 2005, 19:01

OK, I was not sure you did it exactly like that, and from the discussion I undestood that you do more than that, not treating the total of all the opponents not just as a single (normally distributed) opponent with a larger standard deviation, but really calculating the number of expected (fractional) points against each opponent individually.

At the time I was in doubt, since there was a certain arbitrariness to the choice of 2 draws. For a coin flip, which has only two outcomes, it can be easily shown that this is the proper procedure. I guess one would need some model of the game as to which fraction of the games will be a draw for a certain ELO difference, rather than just for the total matchpoints. For example, if for equal players the wins:draws:losses would in the long range be distributed as 1:2:1, a single game of chess would behave exactly as two coin flips, and one should add only a single draw. Is there a generally accepted value for this?

It never occurred to me to assign a different strength for playing white or black, though. In principle you could make this difference a player-dependent parameter, i.e. calculate a separate rating for each payer with black and white.

by **Rémi Coulom** » 19 Dec 2005, 19:46

H.G.Muller wrote:OK, I was not sure you did it exactly like that, and from the discussion I undestood that you do more than that, not treating the total of all the opponents not just as a single (normally distributed) opponent with a larger standard deviation, but really calculating the number of expected (fractional) points against each opponent individually.

I also do something like that. But adding virtual draws also helps in this case.

At the time I was in doubt, since there was a certain arbitrariness to the choice of 2 draws. For a coin flip, which has only two outcomes, it can be easily shown that this is the proper procedure. I guess one would need some model of the game as to which fraction of the games will be a draw for a certain ELO difference, rather than just for the total matchpoints. For example, if for equal players the wins:draws:losses would in the long range be distributed as 1:2:1, a single game of chess would behave exactly as two coin flips, and one should add only a single draw. Is there a generally accepted value for this?

A good method to determine a good value for the prior is this: from a varied set of game results, select a fraction at random, evaluate ratings from this fraction only, and measure the quality of outcome prediction on the other fraction. Choose the parameter that produces the best prediction.

2 is a small value for the prior. I found game databases where 3 or 4 is better, according to the previous method.

There is no absolute best way to generate ratings. Statistical inference is necessarily based on arbitrary choices. The choice of uniform distribution in the coin-flipping case is one example: one might consider that it is more likely that the probability is close to 0.5 than to 0 or 1. So a bump-like prior would be better with this assumption. Using more than 2 for the denominator of your formula would have this effect.

It never occurred to me to assign a different strength for playing white or black, though. In principle you could make this difference a player-dependent parameter, i.e. calculate a separate rating for each payer with black and white.

I do not expect the difference would be big accross players.

by **H.G.Muller** » 19 Dec 2005, 21:30

Interesting that you seem to need a larger prior. I had expected exactly the opposite (because of the possibility to draw). Like you say, this is equivalent to the a-priori assumption that the player are about equally strong (i.e. the coin is fair). So it might depend on the case at hand if this asumption is justified (e.g. because there was a pre-selection electing the particular set of participants for the tournament.)

by **Josh Menke** » 02 May 2006, 19:38

Hi, I'm new here, but not new to Bayesian ranking systems. I'm working on one for objective-based team first-person shooters. Maybe not super-popular among chess-players, but I borrowed a lot of ideas from ELO's system.

For a little background on myself, I'm finishing a PhD in Computer Science with an emphasis in Machine Learning, much like Remi. A large part of my dissertation is heading into the stats domain, however.

You can see an example of some of my results here:
http://two.bpark3.com/themes/bismarck/xpsave_table.php

The problem is a bit more complicated than chess because I am trying to extract individual player ratings from group competitions. Huang et al. did this with a generalized version of the MM algorithm in Generalized Bradley-Terry Models and Multi-Class Probability Estimates, (http://www.jmlr.org/papers/volume7/huan ... ang06a.pdf), but I need an online updating version for my situation.

I haven't yet added a formal measure of uncertainty either to my production rating system, still working on that.

I had some questions for Remi mostly.

Have you tried using, say, a Gaussian prior for ratings instead of a uniform one? I believe that's what Graves, Reese, and Fitzgerald used in their hierarchical bayesian NASCAR paper (modeling permutations). It's what I've been using. Their model is also a Bradley-Terry one, so you'll probably find their recent JASA paper helpful:

http://madison.byu.edu/papers/racing4.ps

I'm working a little with Dr. Shane Reese (http://madison.byu.edu) on my current model. They used MCMC, being more academically oriented. MCMC would take too long for your and my application.

I see you used the MM algorithm, meaning you don't have a set of updating equations a la Elo's original formulae.

Have you considered trying a form of stochastic gradient descent instead? Similar to Elos original formulae, but derived from your Bayesian model and including draws and color. I am working on a form of this based on Spall's Adaptive Stochastic Approximation method. I'm using the one that approximates the first derivative and the Hessian after each instance (match). This way you can include all of the same information, but have an updating approach instead of fitting globally at each round. This also allows you dynamically estimate the parameters.

Here is a link to Spall's paper (2000): http://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_TAC00.pdf

There's also Expectation Propagation, which Microsoft Research has been using in their "TrueSkill" ranking system. They basically assume normal and the EP algorithm combines adaptive density filtering and something akin to an extended Kahlman filter to minimize the KL-divergence between their normal and the true distribution. Pretty fancy stuff, mostly designed for very VERY large data sets, like the Xbox Live audience, etc.

http://research.microsoft.com/mlp/trues ... tails.aspx

Just thought I'd pipe in a little since I'm working on something related and found this nice thread and your MM code.

Thanks.

by **Rémi Coulom** » 02 May 2006, 21:50

josh wrote:The problem is a bit more complicated than chess because I am trying to extract individual player ratings from group competitions. Huang et al. did this with a generalized version of the MM algorithm in Generalized Bradley-Terry Models and Multi-Class Probability Estimates, (http://www.jmlr.org/papers/volume7/huan ... ang06a.pdf), but I need an online updating version for my situation.

Hi Josh

If I understand Huang et al's paper correctly, when a team is made of players 1, 2, 3, and the gamma's of these players (in the sense of gammas in Hunter's paper) are gamma_1, gamma_2, gamma3, then the strength of the team is assumed to be gamma_1 + gamma_2 + gamma_3. This does no seem good. gamma_1 * gamma_2 * gamma_3 should be the right solution (multiplying gammas means adding Elo points). Using the product of gammas is not a problem, since the model can still be minorized-maximized.

I am not sure what you mean by "online". Minorization-Maximization is so efficient, that it could update ratings extremely rapidly after an additional match result is incorporated into the data.

Have you tried using, say, a Gaussian prior for ratings instead of a uniform one? I believe that's what Graves, Reese, and Fitzgerald used in their hierarchical bayesian NASCAR paper (modeling permutations). It's what I've been using. Their model is also a Bradley-Terry one, so you'll probably find their recent JASA paper helpful:

http://madison.byu.edu/papers/racing4.ps

I do not use an uniform prior in Bayeselo. I don't use a Gaussian either. A Gaussian prior is not very convenient, because it makes it more complicated to find the maximum likelihood. Minorization-maximization does not work with a Gaussian prior. So I use prior draws instead. They fit perfectly into the model. The prior they provide looks like a bump, but it is not a Gaussian (it has longer tails). It is very important to use a prior.

I'm working a little with Dr. Shane Reese (http://madison.byu.edu) on my current model. They used MCMC, being more academically oriented. MCMC would take too long for your and my application.

"Academically-oriented" people do not necessary prefer inefficient methods all the time. Isn't Minorization-Maximization academic ? Minorization-Maximization is great, really.

I see you used the MM algorithm, meaning you don't have a set of updating equations a la Elo's original formulae.

Have you considered trying a form of stochastic gradient descent instead? Similar to Elos original formulae, but derived from your Bayesian model and including draws and color. I am working on a form of this based on Spall's Adaptive Stochastic Approximation method. I'm using the one that approximates the first derivative and the Hessian after each instance (match). This way you can include all of the same information, but have an updating approach instead of fitting globally at each round. This also allows you dynamically estimate the parameters.

I don't understand why you seem to consider MM so different from stochastic gradient descent. MM is a form of gradient descent. It is an updating approach. The only difference I see between MM and gradient descent is that MM is tremendously more efficient than traditional generic forms of gradient descent, as Hunter shows in his paper.

Maybe what you mean is that bayeselo does not deal with players of time-varying strength. Bayeselo was indeed designed for the evaluation of computer-programs, so time-varying strength was not an issue that I considered.

But the question of MM vs stochastic gradient descent has nothing to do with that. It is possible to design statistical models that deal with players of time-varying strength, and that can be solved with the MM algorithm. In fact, I am working on one right now.

There's also Expectation Propagation, which Microsoft Research has been using in their "TrueSkill" ranking system. They basically assume normal and the EP algorithm combines adaptive density filtering and something akin to an extended Kahlman filter to minimize the KL-divergence between their normal and the true distribution. Pretty fancy stuff, mostly designed for very VERY large data sets, like the Xbox Live audience, etc.

http://research.microsoft.com/mlp/trues ... tails.aspx

I am sure a similar approach can be based on Minorization-Maximization, with the Luce model to deal with multiplayer games. As I said previously, I am currently working on something along these lines. I'll give more information as soon as I have results to show.

Just thought I'd pipe in a little since I'm working on something related and found this nice thread and your MM code.

Thanks.

Thanks for the interesting discussion.

R?mi

by **Josh Menke** » 02 May 2006, 22:48

R?mi Coulom wrote:If I understand Huang et al's paper correctly, when a team is made of players 1, 2, 3, and the gamma's of these players (in the sense of gammas in Hunter's paper) are gamma_1, gamma_2, gamma3, then the strength of the team is assumed to be gamma_1 + gamma_2 + gamma_3. This does no[t] seem good. gamma_1 * gamma_2 * gamma_3 should be the right solution (multiplying gammas means adding Elo points). Using the product of gammas is not a problem, since the model can still be minorized-maximized.

Well, this is a model detail and is different for them than you. ELO scores are on the exponential scale, whereas Huang et al.'s are not. They don't exponentiate the difference between scores. It's simply \frac{\lambda_i}{\
lambda_i + \lambda_j} where the \lambdas are the Bradley-Terry strengths of individuals (or groups) i and j. Since they don't exponentiate their scores, they add instead of multiply. Multiplying exponentiated scores is the same as adding the exponents, etc.

R?mi Coulom wrote:I am not sure what you mean by "online".

I mean it in, at least for me, the neural network training sense where you have online vs. batch training. In online training, you update the parameters after every instance (match in the game sense) according to the gradient of the error (or likelihood) with respect to to the parameters. In batch training, the gradient is instead the sum of all the instantaneous gradients taken for each pattern.

Online training is what allows you to approximate time-varying player strengths. In neural network training, it allows you to more efficiently calculate the gradient leading to faster training (see Wilson, D. Randall and Martinez, Tony R. The general inefficiency of batch training for gradient descent learning. Neural Networks, volume 16 (10), pages 1429--1451, Elsevier Science Ltd. Oxford, UK, UK, 2003. available here: http://axon.cs.byu.edu/detailed_publica ... nn03.batch).

Minorization-Maximization is so efficient, that it could update ratings extremely rapidly after an additional match result is incorporated into the data.

Yes, I can see it is a very efficient algorithm. Kind of a poor-man's Gibb's sampler that gives you just the posterior mode. So what you're saying is you would add the most recent match to the current data and then re-run the entire MM algorithm again. There are 2 problems I have here. 1) As discussed by both of us below, it doesn't take into account time-varying player strengths, and 2) it requires storing information about past matches, something I can't do in my production environment. I can do it for research, but can't use it in the real life situtation I have. At least, I don't think I can. You're welcome to prove me wrong.

I'm not sure what exactly I would need to store for an MM approach in my situation. I have two teams competing and they are almost NEVER composed of the same players. The players mix between the teams, etc. I need to construct the team strengths from the player strengths, and then in turn infer the player strengths from team play. The "number of times Team A beat Team B" is always 0 or 1, never > 1. Hmm. I'll have to think about this. If it's obvious to you, I'm listening.

I do not use an uniform prior in Bayeselo. I don't use a Gaussian either. A Gaussian prior is not very convenient, because it makes it more complicated to find the maximum likelihood. Minorization-maximization does not work with a Gaussian prior. So I use prior draws instead. They fit perfectly into the model. The prior they provide looks like a bump, but it is not a Gaussian (it has longer tails). It is very important to use a prior.

I don't see how using a Gaussian prior will hurt MM, at least not from Hunter's paper. Maybe I missed it. Seems like it just slightly changes the log-likelihood function. At least, that's what it did for my current approach.

It seems to me you're essentially using a beta distribution here.

"Academically-oriented" people do not necessary prefer inefficient methods all the time. Isn't Minorization-Maximization academic ? Minorization-Maximization is great, really.

Ah, no, I know for sure that efficient approximations of a posterior mode are definitely academic research. I just meant their paper doesn't discuss any because, at least according to Shane, JASA cares more about the model and the full posterior than about approximations to it. A paper discussing approximations would go somewhere else (Applied Statistics?). Unless it's like Hunter's paper which unifies the theory and / or a paper that presents a novel approximation approach. Plus, your typical stats Bayesian always wants the full posterior, although I'm guessing your average CS guy assumes this isn't practical.

Although, I did run MCMC on several subsets of my own data involving around 500 players and 500 matches each. I got some nicely formed posteriors, which you don't get when just approximating the mode, etc. Well, unless you assume some distributionapproximation like using the posterior mode as as the mean and the negative inverse of the Hessian as the variance of a multivariate normal, etc. I really like MCMC for that, although I know I could never run it real time. MCMC took several hours per set.

I don't understand why you seem to consider MM so different from stochastic gradient descent. MM is a form of gradient descent. It is an updating approach. The only difference I see between MM and gradient descent is that MM is tremendously more efficient than traditional generic forms of gradient descent, as Hunter shows in his paper.

Hmm, by updating I mean after each match you apply a single update to each parameter (player strengths, means, etc.), instead of rerunning the entire algorithm on all of the data.

I mean an algorithm that doesn't need to track any history but the last match. For my application, I can't store that information.

Maybe what you mean is that bayeselo does not deal with players of time-varying strength. Bayeselo was indeed designed for the evaluation of computer-programs, so time-varying strength was not an issue that I considered.

But the question of MM vs stochastic gradient descent has nothing to do with that. It is possible to design statistical models that deal with players of time-varying strength, and that can be solved with the MM algorithm. In fact, I am working on one right now.

Yes, if you add time to your original model such that a more recent game carries a higher weight. The nice thing about methods like Elo's original approach is that you get that for free. Every time you update a player's rating, you are essentially weighting their old rating by some amount and their current performance by another, and combining them. This allows you to naturally track increasing (or decreasing) performance.

I am sure a similar approach can be based on Minorization-Maximization, with the Luce model to deal with multiplayer games. As I said previously, I am currently working on something along these lines. I'll give more information as soon as I have results to show.

Again, with expectation propagation you have a very fast, memory-less, and naturally time-varying algorithm. The nice thing with TrueSkill is the variance is estimated explicitly instead of calculated afterwards.

Thanks for your response. I've been banging my head against this for a bit now. If I can get MM to work for me, I'll be happy. For now I'm going to try Spall's method.

I'd really like to get variance out of it as well. I guess you can evaluate the Hessian at the mode you get out of MM to do this.

Thanks again.

by **Josh Menke** » 02 May 2006, 23:08

R?mi Coulom wrote:I don't understand why you seem to consider MM so different from stochastic gradient descent. MM is a form of gradient descent. It is an updating approach. The only difference I see between MM and gradient descent is that MM is tremendously more efficient than traditional generic forms of gradient descent, as Hunter shows in his paper.

Just a quick comment here. Hunter shows an example where a standard deterministic Newton-Raphson method is slower than MM. This is largely because the entire Hessian is calculated at each step.

The stochastic Newton-Raphson that Spall proposes is on the same order as MM---just as efficient.

by **Rémi Coulom** » 03 May 2006, 09:44

josh wrote:Well, this is a model detail and is different for them than you. ELO scores are on the exponential scale, whereas Huang et al.'s are not. They don't exponentiate the difference between scores. It's simply \frac{\lambda_i}{\
lambda_i + \lambda_j} where the \lambdas are the Bradley-Terry strengths of individuals (or groups) i and j. Since they don't exponentiate their scores, they add instead of multiply. Multiplying exponentiated scores is the same as adding the exponents, etc.

Elos are the logarithms of gammas, not the reverse. The gammas are on a n exponential scale, and should be multiplied. This works like the home-field advantage in Hunter's paper: it is multiplied by the strength of the player, not added to it.

I'm not sure what exactly I would need to store for an MM approach in my situation. I have two teams competing and they are almost NEVER composed of the same players. The players mix between the teams, etc. I need to construct the team strengths from the player strengths, and then in turn infer the player strengths from team play. The "number of times Team A beat Team B" is always 0 or 1, never > 1. Hmm. I'll have to think about this. If it's obvious to you, I'm listening.

In the model I have in mind, the gamma of a team is the product of the gamma of its member. MM allows to find the maximum-likelihood gamma for every individual player. It is not a problem at all if a particular team plays very few matches.

I don't see how using a Gaussian prior will hurt MM, at least not from Hunter's paper. Maybe I missed it. Seems like it just slightly changes the log-likelihood function. At least, that's what it did for my current approach.

It adds a quadratic term to the log likelihood. So the maximization phase of MM has to solve a second-order polynomial. This means you'll have to compute a square root at each iteration, instead of a simple division.

Again, with expectation propagation you have a very fast, memory-less, and naturally time-varying algorithm. The nice thing with TrueSkill is the variance is estimated explicitly instead of calculated afterwards.

The fundamental flaw of this approach is that it is purely forward in time. Models that keep the history of games are better. Some have been used in practice on large-scale game servers (the KGS game server, as well as IGS, use such a method, as far as I know).

R?mi

Winboard Forum

ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

new version with prediction of round-robin tournaments.

los?

Re: los?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Re: ELOStat algorithm ?

Who is online