Winboard Forum

by **José Carlos** » 23 Dec 2004, 13:38

Hi,
I have a question for the probability experts here.
The way I use ELOStat is like this:
1. Suppose I have a rating list where the programs' ratings are more or less stable.
2. I want to try a new program.
3. I guess its rating belongs in positions 50-60
4. I match the program in gauntlet fashion against those 11 programs, with white and black
5. After 22 games, the rating is below expected, and it gets position 71 in the list
6. Then I match it against programs in the 65-75 band
7. Repeat this process until the rating gets stable

The underlying idea is: try to make opponent mean rating ~ my rating.
Is this technique sound from a probabilistic point of view? Or does it make more sense to match every program against the entire pool?

by **Ulysses Omycron** » 24 Dec 2004, 00:40

José Carlos wrote:Hi,
I have a question for the probability experts here.
The way I use ELOStat is like this:
1. Suppose I have a rating list where the programs' ratings are more or less stable.
2. I want to try a new program.
3. I guess its rating belongs in positions 50-60
4. I match the program in gauntlet fashion against those 11 programs, with white and black
5. After 22 games, the rating is below expected, and it gets position 71 in the list
6. Then I match it against programs in the 65-75 band
7. Repeat this process until the rating gets stable

The underlying idea is: try to make opponent mean rating ~ my rating.
Is this technique sound from a probabilistic point of view? Or does it make more sense to match every program against the entire pool?

The best way to do it is to match it against the most established programs (the ones that have played the most games). If all the programs have played the same number of games, it wouldn't matter (get the starting rating and then match it against the 11 programs around its rating). Still, I'd recommend a mix of the most important programs from all groups, such as a gauntlet against the programs ranked 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101 and 111 (adjust according to the number of programs on the list). Playing only the first 11 on the list would make it underrated, playing only the last 11 would make it overrated, and playing around the middle (or around a guessed starting point) would overrate or underrate other programs. So your best bet is to match it against "the elite of the pool": not the strongest overall, but the strongest of each league.

I hope I helped.

by **Rémi Coulom** » 28 Dec 2004, 23:04

Here is a new version of my program:
http://remi.coulom.free.fr/Bayesian-Elo/
I have understood and implemented the Minorization-Maximization algorithm.

So, it is now possible to apply the program to big PGN databases. I have tried it on Leo's WBEC database (WBEC1to7 + WBEC8). I edited the database slightly, so that "The CrazyBishop 0050" and "TheCrazyBishop 0050" have the same name. I also removed games by "King of Kings 1.95", because it won all of its games, which is a problem for the MM algorithm. Here is the resulting rating list:
http://remi.coulom.free.fr/Bayesian-Elo/WBEC1to8.txt

I used this database to tune the parameters of the likelihood distribution. On the webpage of my program (see above), you will find plots that show how well the model fits the data.
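For readers unfamiliar with Minorization-Maximization, here is a minimal sketch of the classic MM update for the plain Bradley-Terry model. It is an illustration only, not the bayeselo code: it assumes no draws and no first-move advantage, and the function name `mm_elo` and the three-player data are hypothetical. It also shows why a player who won every game is a problem: with no losses, the update pushes that player's strength upward without bound, so there is no finite maximum-likelihood rating.

```python
import math

def mm_elo(wins, iterations=500):
    """Fit Bradley-Terry strengths with the MM update and return Elo-scaled ratings.

    wins[i][j] is the number of games player i won against player j.
    Draws and first-move advantage are ignored in this sketch.
    """
    n = len(wins)
    gamma = [1.0] * n  # strength parameters; Elo = 400 * log10(gamma)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (games played) / (gamma_i + gamma_j).
            denom = sum((wins[i][j] + wins[j][i]) / (gamma[i] + gamma[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom)
        # Strengths are only defined up to a common factor:
        # fix the geometric mean at 1 (average Elo of 0).
        log_mean = sum(math.log(g) for g in updated) / n
        scale = math.exp(log_mean)
        gamma = [g / scale for g in updated]
    return [400.0 * math.log10(g) for g in gamma]

# Hypothetical three-player round robin, 10 games per pairing.
example_wins = [
    [0, 6, 8],
    [4, 0, 7],
    [2, 3, 0],
]
example_ratings = mm_elo(example_wins)
for name, rating in zip("ABC", example_ratings):
    print(f"{name}: {rating:+.0f}")
```

At convergence, each player's expected number of wins under the fitted strengths equals the actual number of wins, which is the maximum-likelihood condition for this model.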

Rémi

by **Ulysses Omycron** » 29 Dec 2004, 14:06

I hope you have read my suggestions on the first page... They took a while to write and... Well... If you have already read them and they are not worth answering, then I should just stop posting my "senseless" stuff (Volker said something like that).

Note: I will try BayesELO once it gets a GUI (or instructions on how to use it), but I'm sure it's going to be good. (If I sounded melodramatic somewhere, I was just joking of course 8-) ).

by **Rémi Coulom** » 29 Dec 2004, 14:29

Ulysses Omycron wrote:I hope you have read my suggestions on the first page... They took a while to write and... Well... If you have already read them and they are not worth answering, then I should just stop posting my "senseless" stuff (Volker said something like that).

Note: I will try BayesELO once it gets a GUI (or instructions on how to use it), but I'm sure it's going to be good. (If I sounded melodramatic somewhere, I was just joking of course 8-) ).

Yes, I have read your message. Thanks for your suggestions.

It seems to be a fundamental problem of the BayesELO/ELOStat approach that players who have a very high score against weak opponents become overrated, especially when the number of games is small. An approach based on incremental updates does not have this problem. This is the reason why Leo Dijksman stopped using ELOStat for his tournaments. ELOStat works well for a single round-robin tournament, but has big problems when two groups of players are linked to each other by only a few games.

I am not sure I will find a satisfactory solution to this problem. I have a few ideas to experiment with. Using non-uniform priors seems to be an interesting approach. But I will have to experiment a little before I can tell.

Thanks for your interest,

Rémi

by **Tim Foden** » 29 Dec 2004, 18:39

Rémi Coulom wrote:Here is a new version of my program:
http://remi.coulom.free.fr/Bayesian-Elo/
I have understood and implemented the Minorization-Maximization algorithm.

I tried it on my 14MB database of 40/5 games, and it had this result:

D:\Chess\Beans>bayeselo g.pgn
version 0052, Copyright (C) 1997-2004 Remi Coulom
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
Warning: unknown game result set to draw
EloRating>mm
00:00:00,00
EloRating>exactdist
00:00:06,30
EloRating>sort
EloRating>players
Rank Num. Name Elo + - games score
1 20 Aristarch 4.21 (c) 2003 Stefan Zipproth 1600 -1601 -1400 852 535.5
2 5 wcrafty-17-11 1597 -1597 -1404 1500 934
3 17 Delfi 4.1 1579 -1580 -1421 1300 766.5
4 27 Green-Light-3.00.3.4-Graz 1576 29 28 550 340.5
5 31 Green-Light-3.01.2.2 1562 31 32 400 230.5
6 6 Yace 0.99.50 1561 -1562 -1439 1500 866
7 30 Green-Light-3.01.2.1 1561 29 31 450 266
8 0 lg2000v3 1560 -1560 -1441 1552 894.5
9 15 Green-Light-2.99b-Default 1553 25 26 600 359
10 28 Green-Light-3.01.1.1 1547 28 26 550 321
11 29 Green-Light-3.01.1.2 1546 26 28 550 320
12 7 Pharaon 2.62 1535 -1535 -1466 1500 812
13 16 Green-Light-3.00-Mogens 1532 28 29 500 290.5
14 18 Green-Light-3.08-Default 1528 29 28 500 287.5
15 19 Green-Light-3.00-Ev030427 1524 30 27 499 287
16 25 Green-Light-3.00-Default 1523 28 26 550 304
17 22 Green-Light-3.30-Default 1518 27 27 550 301
18 23 Green-Light-3.31-Default 1517 28 29 500 282.5
19 12 Green-Light-2.18df-Pesce 1513 26 25 550 312.5
20 24 Green-Light-3.32-Default 1510 29 31 450 236
21 13 Green-Light-2.18df-Larson 1500 30 27 500 282.5
22 14 Green-Light-2.18df-Ochoa 1499 28 26 550 300
23 21 Green-Light-3.00-Ev030427b 1497 27 27 550 286
24 11 Green-Light-2.18-Pesce 1477 26 28 500 268
25 1 comet_B23-3 1440 -1441 -1560 1500 630.5
26 10 Green-Light-2.13-Pesce 1436 28 29 501 241
27 26 Green-Light-3.33-Default 1431 90 96 50 17
28 3 mad 1414 -1415 -1586 750 323.5
29 2 phalanxf 1402 -1402 -1599 1500 560.5
30 8 tcb 1383 -1384 -1617 1451 503.5
31 4 arasanx 1319 -1320 -1681 1500 409.5
32 9 exchess 1260 -1261 -1740 1349 283.5

The Elos look fine, but the +/- margins don't look so good. Elostat gives this (2400 average):

Program Elo + - Games Score Av.Op. Draws

1 wcrafty-17-11 : 2498 15 16 1500 62.3 % 2411 27.6 %
2 Aristarch 4.21 (c) : 2497 19 22 852 62.9 % 2406 25.9 %
3 Delfi 4.1 : 2480 17 17 1300 59.0 % 2417 26.1 %
4 Green-Light-3.00.3.4-Graz : 2475 24 28 550 61.9 % 2391 23.5 %
5 Green-Light-3.01.2.2 : 2466 31 30 400 57.6 % 2413 24.8 %
6 Yace 0.99.50 : 2466 16 15 1500 57.7 % 2412 26.9 %
7 Green-Light-3.01.2.1 : 2465 28 29 450 59.1 % 2401 26.2 %
8 lg2000v3 : 2463 15 16 1552 57.6 % 2410 24.3 %
9 Green-Light-2.99b-Default : 2456 24 25 600 59.8 % 2386 25.3 %
10 Green-Light-3.01.1.1 : 2449 26 26 550 58.4 % 2391 25.5 %
11 Green-Light-3.01.1.2 : 2448 26 26 550 58.2 % 2391 25.1 %
12 Pharaon 2.62 : 2441 17 14 1500 54.1 % 2413 27.2 %
13 Green-Light-3.00-Mogens : 2437 27 27 500 58.1 % 2380 25.8 %
14 Green-Light-3.00-Ev030427 : 2433 27 28 499 57.5 % 2380 22.8 %
15 Green-Light-3.08-Default : 2432 27 28 500 57.5 % 2380 22.6 %
16 Green-Light-3.00-Default : 2427 27 26 550 55.3 % 2391 22.5 %
17 Green-Light-3.31-Default : 2425 28 28 500 56.5 % 2380 21.8 %
18 Green-Light-2.18df-Pesce : 2424 26 25 550 56.8 % 2376 27.1 %
19 Green-Light-3.30-Default : 2424 27 25 550 54.7 % 2391 25.1 %
20 Green-Light-3.32-Default : 2418 31 27 450 52.4 % 2401 22.2 %
21 Green-Light-2.18df-Larson : 2411 28 27 500 56.5 % 2365 22.6 %
22 Green-Light-2.18df-Ochoa : 2408 27 25 550 54.5 % 2376 24.0 %
23 Green-Light-3.00-Ev030427b : 2405 29 25 550 52.0 % 2391 21.5 %
24 Green-Light-2.18-Pesce : 2390 29 24 500 53.6 % 2365 30.0 %
25 comet_B23-3 : 2359 17 16 1500 42.0 % 2415 20.9 %
26 Green-Light-2.13-Pesce : 2352 26 30 500 48.1 % 2365 21.8 %
27 mad : 2337 22 23 750 43.1 % 2385 22.8 %
28 phalanxf : 2327 19 14 1500 37.4 % 2416 18.7 %
29 tcb : 2305 19 14 1450 34.7 % 2415 22.5 %
30 arasanx : 2249 24 13 1500 27.3 % 2419 18.1 %
31 exchess : 2188 29 12 1349 21.0 % 2418 18.8 %

If you like, I'll send you the zipped, lgc'ed version of the database (1.2MB)

D:\Chess\Beans>dir g.*
Volume in drive D is WORK
Volume Serial Number is CCD6-F2F3

Directory of D:\Chess\Beans

29/12/2004 18:37 2,452,948 G.LGC
29/12/2004 18:38 1,285,050 g.lgc.zip
25/07/2004 08:25 14,085,266 g.pgn
29/12/2004 18:33 4,469,370 g.zip
4 File(s) 22,292,634 bytes
0 Dir(s) 2,338,598,912 bytes free

Cheers, Tim.

by **Rémi Coulom** » 29 Dec 2004, 19:20

timfoden wrote:I tried it on my 14MB database of 40/5 games, and it had this result:

Thanks Tim for reporting this problem. It is a numerical problem that arises in the estimation of likelihood distributions when the number of games is high: the likelihood is so close to zero that it becomes equal to zero because floating point numbers cannot handle such small values.

That should be easy to fix by using log-likelihoods instead, and I will do it in the next version.
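The underflow can be reproduced in a few lines. This is a minimal sketch, not the bayeselo code: it assumes a plain logistic Elo curve with no draws, and the function names and the 1500-game data set are hypothetical. Multiplying 1500 probabilities of 0.5 drops below the smallest double-precision number, while the sum of their logarithms stays perfectly representable.

```python
import math

def expected_score(rating, opponent):
    """Logistic Elo win probability (draws ignored in this sketch)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def likelihood(rating, results):
    """Product of per-game probabilities; results is a list of (opponent, won)."""
    value = 1.0
    for opponent, won in results:
        p = expected_score(rating, opponent)
        value *= p if won else 1.0 - p
    return value

def log_likelihood(rating, results):
    """Same quantity on a log scale: a sum of logs instead of a product."""
    total = 0.0
    for opponent, won in results:
        p = expected_score(rating, opponent)
        total += math.log(p if won else 1.0 - p)
    return total

# Hypothetical data: 1500 games against an equally rated opponent, half of them won.
many_games = [(1500.0, k % 2 == 0) for k in range(1500)]

print(likelihood(1500.0, many_games))      # underflows to 0.0
print(log_likelihood(1500.0, many_games))  # finite: 1500 * log(0.5)
```

When a normalized distribution is needed, a common approach is to subtract the maximum log-likelihood before exponentiating, so that the largest value becomes exactly 1.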

Rémi

by **Rémi Coulom** » 31 Dec 2004, 15:44

José Carlos wrote:The underlying idea is: try to make opponent mean rating ~ my rating.
Is this technique sound from a probabilistic point of view? Or does it make more sense to match every program against the entire pool?

You will get a reliable rating more quickly if you play against opponents of similar strength, indeed. But it is also important to play against a variety of playing styles (because the Elo assumption that the strength of a program can be captured by a single number is wrong).

With the current number of freely available programs, it is not difficult to find many programs of similar strength. I usually prefer to play against many opponents, rather than play many games against the same opponent.

Rémi

by **José Carlos** » 31 Dec 2004, 16:25

Rémi Coulom wrote:You will get a reliable rating more quickly if you play against opponents of similar strength, indeed. But it is also important to play against a variety of playing styles (because the Elo assumption that the strength of a program can be captured by a single number is wrong).

With the current number of freely available programs, it is not difficult to find many programs of similar strength. I usually prefer to play against many opponents, rather than play many games against the same opponent.

Rémi

Thanks for answering. So the idea is sound, I see.
Now let's suppose we have a certain number of games to play, this limit coming from time, computer availability, etc. So let's say we can do 100 games to get a rating for a new program.
The first decision to make is: a) play against 100 programs, 1 game each; b) play against 50 programs with white and black; c) play against fewer programs, with more games each.
From your answer, a) or b) are best.
Now let's suppose we have an unlimited number of available opponents covering the whole strength spectrum, with ratings from +infinity down to zero. Let us estimate the new program's strength at 2500.
We have to decide which 100 opponents to play against.
One obvious possibility is from 2451 to 2550, to get a mean of 2500.
Another possibility: 2401 to 2450 and 2551 to 2600; this way we don't play against opponents of the same strength, but we keep the average.
We can also choose 2301 to 2350 and 2651 to 2700.
I don't think these possibilities are equivalent, considering the shape of the probability curve. Then again, I don't have enough mathematics background to make any calculations.
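One way to put numbers on this question is the Fisher information of the logistic Elo curve. The sketch below is an illustration under stated assumptions, not anything from ELOStat or bayeselo: it ignores draws, treats the opponents' ratings as exactly known, and assumes the new program's true rating really is 2500. Under that model a single game against an opponent with expected score p contributes (ln 10 / 400)^2 * p * (1 - p) of information, which is largest at p = 0.5. The function names are hypothetical.

```python
import math

K = math.log(10) / 400.0  # slope factor of the logistic Elo curve

def expected_score(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def fisher_information(rating, opponents):
    """Total information about `rating` from one game against each opponent."""
    total = 0.0
    for opponent in opponents:
        p = expected_score(rating, opponent)
        total += K * K * p * (1.0 - p)
    return total

def standard_error(rating, opponents):
    """Approximate standard error (in Elo) of the rating estimate."""
    return 1.0 / math.sqrt(fisher_information(rating, opponents))

# The three opponent sets from the post, one game against each rating.
bands = {
    "2451-2550": list(range(2451, 2551)),
    "2401-2450 and 2551-2600": list(range(2401, 2451)) + list(range(2551, 2601)),
    "2301-2350 and 2651-2700": list(range(2301, 2351)) + list(range(2651, 2701)),
}
errors = {name: standard_error(2500.0, opps) for name, opps in bands.items()}
for name, err in errors.items():
    print(f"{name}: standard error about {err:.1f} Elo")
```

Under these assumptions the three choices are indeed not equivalent: the closer the opponents are to the program's own strength, the smaller the standard error after 100 games.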

by **Rémi Coulom** » 31 Dec 2004, 19:43

José Carlos wrote:Now let's suppose we have an unlimited number of available opponents covering the whole strength spectrum, with ratings from +infinity down to zero. Let us estimate the new program's strength at 2500.
We have to decide which 100 opponents to play against.
One obvious possibility is from 2451 to 2550, to get a mean of 2500.
Another possibility: 2401 to 2450 and 2551 to 2600; this way we don't play against opponents of the same strength, but we keep the average.
We can also choose 2301 to 2350 and 2651 to 2700.
I don't think these possibilities are equivalent, considering the shape of the probability curve. Then again, I don't have enough mathematics background to make any calculations.

I have made a small experiment to illustrate this. If we consider these two cases:

1. One win against a 1000-rated player, and one loss against a 2000-rated player
2. One win against a 1500-rated player, and one loss against a 1500-rated player

The figure below shows the likelihood distributions obtained for these two cases, using the Bayesian inference method of bayeselo (case 1 in green, and case 2 in red):

So, you can see that case 1 results in more uncertainty. Note that ELOStat would consider the two cases as equivalent. That's why I believe the Bayesian approach is better.
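The comparison can be reproduced approximately with a few lines of code. This is a rough sketch, not the bayeselo model: it assumes a plain logistic curve with no draws and no first-move advantage, a flat prior, and opponents whose ratings are exactly known. It evaluates the likelihood of each case on a 1-Elo grid and reports the mean and standard deviation of the normalized curve. The function names and grid limits are hypothetical choices.

```python
import math

def win_probability(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def likelihood(rating, results):
    """results is a list of (opponent_rating, won)."""
    value = 1.0
    for opponent, won in results:
        p = win_probability(rating, opponent)
        value *= p if won else 1.0 - p
    return value

def mean_and_sd(results, low=-2000, high=5000):
    """Normalize the likelihood on a 1-Elo grid (flat prior); return mean and sd."""
    grid = range(low, high + 1)
    weights = [likelihood(r, results) for r in grid]
    total = sum(weights)
    mean = sum(r * w for r, w in zip(grid, weights)) / total
    var = sum((r - mean) ** 2 * w for r, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

case_1 = [(1000, True), (2000, False)]  # win vs 1000, loss vs 2000
case_2 = [(1500, True), (1500, False)]  # win vs 1500, loss vs 1500

mean_1, sd_1 = mean_and_sd(case_1)
mean_2, sd_2 = mean_and_sd(case_2)
print(f"case 1: mean {mean_1:.0f}, sd {sd_1:.0f}")
print(f"case 2: mean {mean_2:.0f}, sd {sd_2:.0f}")
```

Both curves are centered on 1500, but the curve for case 1 is flatter between 1000 and 2000 and therefore wider, in line with the extra uncertainty described above.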

Rémi

by **José Carlos** » 01 Jan 2005, 01:52

Thanks for the explanation. It's really interesting.

by **Joachim Rang** » 01 Jan 2005, 11:41

Hi Remi,

very interesting program. I have problems using it ;-). Which parameters do I have to specify to get a rating? When do you plan to change the treatment of unknown results (I usually have lots of aborted games in my PGNs, and they should be discarded)? Is there a readme?

regards Joachim

by **Rémi Coulom** » 02 Jan 2005, 16:00

Joachim Rang wrote:Hi Remi,

very interesting program. I have problems using it ;-). Which parameters do I have to specify to get a rating? When do you plan to change the treatment of unknown results (I usually have lots of aborted games in my PGNs, and they should be discarded)? Is there a readme?

regards Joachim

Hi Joachim

Right now, there is very little documentation. I am currently making a lot of changes to my program. I will write usage documentation later, when the program has reached some stability.

If you wish to use the program to get a rating list, simply follow the example given: Open a console, cd to the directory where you have your PGN file, and run bayeselo.exe there, with the name of the PGN file on the command line. Then type "mm", "exactdist", "sort", and "players". You can redirect the output to a file with "players >file.txt".

Rémi

by **Joachim Rang** » 02 Jan 2005, 17:04

Rémi Coulom wrote:
If you wish to use the program to get a rating list, simply follow the example given: Open a console, cd to the directory where you have your PGN file, and run bayeselo.exe there, with the name of the PGN file on the command line. Then type "mm", "exactdist", "sort", and "players". You can redirect the output to a file with "players >file.txt".

Rémi

thank you for your answer. I figured it out while reading the posts in this thread. I'm looking forward to your final version of this very useful tool.

regards Joachim

by **Rémi Coulom** » 04 Jan 2005, 00:34

Hello everyone,

I have just had an exciting new idea.

A problematic situation, that was described earlier, is this one: A has played many games against B (say, 1000), so their rating difference is rather well established. C has also played many games against D (1000), so their rating difference is also well established. But the two groups {A,B} and {C,D} have played very few games against each other. For instance, the only game is a draw between B and C.

The consequence of such a situation is that, whatever the number of games every player has played, the maximum-likelihood Elo is very uncertain, and all the confidence intervals should be very wide.

ELOStat is completely helpless in such a situation. Since every player has played a lot of games, it will give narrow uncertainty margins.

Currently, bayeselo manages to handle this situation correctly when it uses the joint distribution. But the "exactdist" command, which assumes that the Elos of opponents are their true Elos, falls into the same trap as ELOStat. This is a problem, because computing the joint distribution is not practical for more than a few players (4-5): the cost is exponential.

I think I have a solution to this problem:

1. Compute the maximum likelihood Elos
2. Compute the Hessian of the likelihood around the maximum
3. Assume the likelihood is Gaussian, and calculate the covariance matrix from the Hessian.
4. The variance of every rating can be easily calculated with the covariance matrix.

At the price of the approximation that the likelihood has the shape of a Gaussian (which does not seem to be a bad approximation), we get very good confidence intervals that have no problem with the tricky situation described above!

This solution might not look very satisfactory at first, because after having played 2001 games, we end up with very wide confidence intervals for everybody. So it seems that a lot of valuable information has been lost in the rating list by linking these two groups of independently established players by a single draw game (or a few games).

In fact, no information is lost, because the covariance matrix "knows" that the relative difference between A and B is well established. So, from the same data, it is possible to generate accurate tables of "likelihood that A is stronger than B".
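The four-player situation can be worked through numerically. This is a sketch under several assumptions that are not in the post: a plain logistic model where a draw counts as half a win, results balanced so that the maximum-likelihood Elos are all equal, and the Hessian taken of the negative log-likelihood rather than of the likelihood itself. Because adding a constant to every rating changes nothing, that Hessian is singular, so the sketch pins A's rating and inverts the remaining 3x3 block. The helper names are hypothetical.

```python
import math

K = math.log(10) / 400.0  # slope factor of the logistic Elo curve

def expected_score(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def neg_log_likelihood_hessian(ratings, pairings):
    """Hessian of -log(likelihood); pairings is a list of (i, j, games).

    For the logistic model it depends only on the number of games and the
    expected score at the given ratings, not on the individual results.
    """
    n = len(ratings)
    h = [[0.0] * n for _ in range(n)]
    for i, j, games in pairings:
        p = expected_score(ratings[i], ratings[j])
        w = games * K * K * p * (1.0 - p)
        h[i][i] += w
        h[j][j] += w
        h[i][j] -= w
        h[j][i] -= w
    return h

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting, for small matrices."""
    n = len(matrix)
    a = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

# Players A, B, C, D are indices 0..3.
ml_ratings = [0.0, 0.0, 0.0, 0.0]  # assumed ML solution (balanced results)
pairings = [(0, 1, 1000), (1, 2, 1), (2, 3, 1000)]  # A-B, B-C (one draw), C-D

hessian = neg_log_likelihood_hessian(ml_ratings, pairings)
reduced = [row[1:] for row in hessian[1:]]  # pin A: drop its row and column
cov = invert(reduced)  # covariance of (B, C, D), measured relative to A

sd_b_vs_a = math.sqrt(cov[0][0])
sd_c_vs_a = math.sqrt(cov[1][1])
sd_d_vs_c = math.sqrt(cov[1][1] + cov[2][2] - 2.0 * cov[1][2])
print(f"sd(B - A) = {sd_b_vs_a:.1f}")
print(f"sd(C - A) = {sd_c_vs_a:.1f}")
print(f"sd(D - C) = {sd_d_vs_c:.1f}")
```

With these numbers the differences inside each group come out with a standard deviation on the order of 10 Elo, while the difference across the groups has a standard deviation of several hundred Elo. The off-diagonal covariance terms are what keep the within-group differences tight.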

I will try to implement this idea soon. Any feedback is welcome.

Rémi

by **Rémi Coulom** » 04 Jan 2005, 09:28

Rémi Coulom wrote:
The variance of every rating can be easily calculated with the covariance matrix.

Well, the covariance matrix has to be diagonalized. I do not know diagonalization algorithms well. I have just taken a look at "Numerical Recipes", and they look a little complicated. So, implementing all this may take some time. Good opportunity to learn something interesting, at least.

Rémi

by **Dann Corbit** » 04 Jan 2005, 20:27

Rémi Coulom wrote:
Rémi Coulom wrote:
The variance of every rating can be easily calculated with the covariance matrix.

Well, the covariance matrix has to be diagonalized. I do not know diagonalization algorithms well. I have just taken a look at "Numerical Recipes", and they look a little complicated. So, implementing all this may take some time. Good opportunity to learn something interesting, at least.

Rémi

Look here:
ftp://cap.connx.com/pub/chess-engines/n ... test_fpu.c

In particular, the routine rgaussi() found at the bottom of the file is very well done, and yet quite easy to read. Not at all like that bletcherous NR code.

Here is the output of the program, showing Dieter's function to be twice as fast as the others:

Results for 01/10/98 revision using TEST_FPU.C

Gauss 1000 x 2 inverts = 0.8 sec.
Accuracy of 2 computed numbers
Original = 0.487075411236915 0.402447584459975
Computed = 0.487075411236917 0.402447584459976
Avg Err. = 0.000000000000002

Crout 1000 x 2 inverts = 0.8 sec.
Accuracy of 2 computed numbers
Original = 0.487075411236915 0.402447584459975
Computed = 0.487075411236915 0.402447584459975
Avg Err. = 0.000000000000002

Dieter 1000 x 2 inverts = 0.4 sec.
Accuracy of 2 computed numbers
Original = 0.487075411236915 0.402447584459975
Computed = 0.487075411236915 0.402447584459976
Avg Err. = 0.000000000000001

by **Rémi Coulom** » 04 Jan 2005, 20:45

Dann Corbit wrote:Look here:
ftp://cap.connx.com/pub/chess-engines/n ... test_fpu.c

In particular, the routine rgaussi() found at the bottom of the file is very well done, and yet quite easy to read. Not at all like that bletcherous NR code.

Hi Dann,

Thanks, but your code is for matrix inversion. Diagonalisation is completely different.

I had to look up "bletcherous" in my dictionary, but did not find it there. Thanks to Google:
http://www.jargon.net/jargonfile/b/bletcherous.html

Nice to learn something every day!

Rémi

by **Dann Corbit** » 04 Jan 2005, 20:54

Rémi Coulom wrote:
Dann Corbit wrote:Look here:
ftp://cap.connx.com/pub/chess-engines/n ... test_fpu.c

In particular, the routine rgaussi() found at the bottom of the file is very well done, and yet quite easy to read. Not at all like that bletcherous NR code.

Hi Dann,

Thanks, but your code is for matrix inversion. Diagonalisation is completely different.

I had to look up "bletcherous" in my dictionary, but did not find it there. Thanks to Google:
http://www.jargon.net/jargonfile/b/bletcherous.html

Nice to learn something every day!

Rémi

All three routines diagonalize the matrix in order to solve it.

Nobody actually inverts a matrix, since it is a waste of time.

by **Rémi Coulom** » 04 Jan 2005, 21:19

Dann Corbit wrote:All three routines diagonalize the matrix in order to solve it.

Nobody actually inverts a matrix, since it is a waste of time.

They transform it into a diagonal form, or a product of triangular matrices in the case of the LU decomposition. But this is not what is called diagonalization. Diagonalization is about finding eigenvectors and eigenvalues. I would be extremely surprised if this can be obtained easily from the LU decomposition, or by using the Gauss pivot method. They would say it in Numerical Recipes.
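The distinction is easy to see on a tiny example. The sketch below uses a hypothetical symmetric 2x2 matrix and hand-written helpers (it is not taken from Numerical Recipes or from test_fpu.c). It compares the pivots produced by LU elimination with the eigenvalues obtained from the characteristic polynomial: the two sets of numbers differ, although their products both equal the determinant.

```python
import math

def lu_2x2(m):
    """Doolittle LU factorization of a 2x2 matrix without pivoting: m = L * U."""
    (a, b), (c, d) = m
    factor = c / a
    lower = [[1.0, 0.0], [factor, 1.0]]
    upper = [[a, b], [0.0, d - factor * b]]
    return lower, upper

def eigenvalues_symmetric_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix from its characteristic polynomial."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius

matrix = [[2.0, 1.0], [1.0, 2.0]]

lower, upper = lu_2x2(matrix)
pivots = (upper[0][0], upper[1][1])
eigen = eigenvalues_symmetric_2x2(matrix)

print(pivots)  # (2.0, 1.5): diagonal of U after elimination
print(eigen)   # (1.0, 3.0): what diagonalization would return
```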

Anyway, I am not so sure anymore I will need to diagonalize the matrix. I wrote this in a hurry without writing the math. The math problems I have to solve are:

How to transform the Hessian into the covariance matrix
Given the covariance matrix, how to get the variance of a single variable.

This should be rather easy to do, but I have not had the time to explore the details. Maybe I'll simply have to calculate the inverse of the Hessian after all, in which case your routines could be useful !
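For reference, the standard Gaussian (Laplace) approximation answers both questions without any diagonalization. The notation is introduced here for illustration and is not from the post: r is the vector of ratings, r-hat the maximum-likelihood ratings, H the Hessian of the negative log-likelihood at r-hat, and Sigma the covariance matrix. The inverse only exists once the common-shift freedom of the ratings has been removed, for instance by fixing one rating or by adding a prior.

$$
\log L(r) \approx \log L(\hat r) - \tfrac{1}{2}\,(r - \hat r)^{\top} H \,(r - \hat r),
\qquad H = -\nabla^{2} \log L(\hat r)
$$

$$
\Sigma = H^{-1},
\qquad \operatorname{Var}(r_i) = \Sigma_{ii},
\qquad \operatorname{Var}(r_i - r_j) = \Sigma_{ii} + \Sigma_{jj} - 2\,\Sigma_{ij}
$$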

Rémi
