Moderator: Andres Valverde
Changes from version 1.2 to 1.3:
- maximum number of different players/programs increased to 1500
- algorithm for calculating the confidence intervals completely changed (now uses the so-called nonparametric ABC (approximate bootstrap confidence) method of Efron and Tibshirani). Many thanks to Dr. Jeff Lischer (US) for drawing my attention to this fantastic method and to all users who pointed out the shortcomings of the old method.
- some minor bugs in individual statistics output removed
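To give a feel for bootstrap-style error margins, here is a minimal sketch. It is not the ABC method named in the changelog and not ELOStat's code: it is the simpler percentile bootstrap, and it assumes the logistic Elo curve. The function names and the win/draw/loss split (39/12/5, which matches 45 points and 21.4 % draws from 56 games) are only for illustration.

```python
import math
import random


def elo_diff(score, n_games):
    """Logistic Elo difference implied by a score fraction (assumed model)."""
    eps = 0.5 / n_games  # clamp so that a 0% or 100% score stays finite
    score = min(max(score, eps), 1.0 - eps)
    return 400.0 * math.log10(score / (1.0 - score))


def bootstrap_interval(wins, draws, losses, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the Elo difference (not ABC)."""
    rng = random.Random(seed)
    games = [1.0] * wins + [0.5] * draws + [0.0] * losses
    n = len(games)
    diffs = []
    for _ in range(n_boot):
        # Resample the individual game results with replacement.
        sample = rng.choices(games, k=n)
        diffs.append(elo_diff(sum(sample) / n, n))
    diffs.sort()
    lower = diffs[int((1.0 - level) / 2.0 * n_boot)]
    upper = diffs[int((1.0 + level) / 2.0 * n_boot) - 1]
    return elo_diff(sum(games) / n, n), lower, upper


point, lower, upper = bootstrap_interval(39, 12, 5)
print(f"Elo difference {point:.0f}, interval [{lower:.0f}, {upper:.0f}]")
```

The interval comes out asymmetric around the point estimate because the score-to-Elo curve is nonlinear. The ABC method approximates a refined bootstrap interval analytically instead of resampling.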
Absolutely,
Rémi Coulom wrote: I believe the Bayesian approach is still much better.
Rémi
This was discussed earlier in this thread. There is a CSS paper explaining the past versions of ELOStat. Dieter gave a summary (first page, dated Thu Dec 16, 2004 2:43 pm; I do not know how to link to a message).
Robert Allgeuer wrote: A question to the statistics experts:
If I have understood correctly, EloStat makes various assumptions and simplifications about the distribution of the error for the calculation of the error margins.
What are these assumptions?
One is that it assumes that the error margins against the whole set of opponents are the same as against one opponent with the average rating.
But what kind of distribution function does it assume? Any other assumptions?
Also, is there a paper on its method available somewhere?
Thanks
Robert
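The "one opponent with the average rating" simplification from the question can be sketched as follows. This is an illustration under the assumption of the standard logistic Elo curve. The thread does not state the exact formula ELOStat uses, and the function name is made up for this example.

```python
import math


def performance_rating(avg_opponent, points, games):
    """Rating implied by a score against a single opponent whose rating
    is the average of all opponents (assumed logistic Elo curve)."""
    score = points / games
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


# 45 points from 56 games against an average opponent rating of 2383
print(round(performance_rating(2383, 45, 56)))  # prints 2628
```

With these inputs the sketch gives 2628, the Elo value listed for that row in the tables with an Av.Op. column in this thread. That is consistent with the simplification but does not prove that ELOStat computes it this way.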
Rank Name Elo + - games score ratio
1 Jonny 2.70 2654 103 90 56 45 80%
2 Spike 0.8a 2577 94 86 56 40 71%
3 DanChess 1.07 2576 92 86 56 39 70%
4 TRACE 1.33 2482 87 85 52 30 58%
5 E.T.Chess 071204 2471 84 80 56 37 66%
6 Pseudo 0.6h 2453 84 82 56 32 57%
7 Petir 2.0 2443 88 86 52 29.5 57%
8 Snitch 1.0.8 2416 82 81 56 29.5 53%
9 Muse 0.899b uci 2408 83 81 55 30.5 55%
10 TheCrazyBishop 0052 2408 83 83 56 29 52%
11 Terra 3.3b11 2402 87 87 56 27 48%
12 Leila 0.53h 2396 83 83 56 27 48%
13 Cerebro 1.30 2391 92 93 52 25 48%
14 Frenzee 199 2386 86 87 56 26 46%
15 Amateur 2.80 2375 81 83 56 24.5 44%
16 Scidlet 3.6 2359 84 86 52 22.5 43%
17 Bruja 1.8 2356 83 85 52 22.5 43%
18 Queen 2.45 2338 85 86 52 23 44%
19 Chezzz 1.0.3 2329 85 89 56 21.5 38%
20 Resp 0.19 2307 87 91 52 20 38%
21 Knightx 1.86 2301 84 87 56 22 39%
22 Djinn 0.870f 2281 84 88 56 21 38%
23 Butcher 1.53 2263 87 94 55 16.5 30%
24 Esc 1.16 2224 93 102 52 15 29%
Program Elo + - Games Score Av.Op. Draws
1 Jonny 2.70 : 2628 66 124 56 80.4 % 2383 21.4 %
2 Spike 0.8a : 2551 72 105 56 71.4 % 2391 21.4 %
3 DanChess 1.07 : 2549 74 102 56 69.6 % 2405 21.4 %
4 E.T.Chess 071204 : 2473 77 82 56 66.1 % 2358 32.1 %
5 TRACE 1.33 : 2469 90 82 52 57.7 % 2415 26.9 %
6 Pseudo 0.6h : 2440 87 80 56 57.1 % 2390 25.0 %
7 Petir 2.0 : 2434 91 87 52 56.7 % 2387 21.2 %
8 Snitch 1.0.8 : 2415 92 71 56 52.7 % 2396 30.4 %
9 Muse 0.899b uci : 2407 90 74 55 55.5 % 2369 30.9 %
10 TheCrazyBishop 0052 : 2402 94 75 56 51.8 % 2390 25.0 %
11 Terra 3.3b11 : 2396 81 94 56 48.2 % 2408 17.9 %
12 Cerebro 1.30 : 2394 91 97 52 48.1 % 2407 11.5 %
13 Leila 0.53h : 2393 69 94 56 48.2 % 2405 32.1 %
14 Frenzee 199 : 2390 83 91 56 46.4 % 2414 17.9 %
15 Amateur 2.80 : 2372 74 88 56 43.8 % 2415 30.4 %
16 Bruja 1.8 : 2364 75 91 52 43.3 % 2411 32.7 %
17 Scidlet 3.6 : 2363 79 91 52 43.3 % 2410 28.8 %
18 Queen 2.45 : 2342 80 92 52 44.2 % 2382 26.9 %
19 Chezzz 1.0.3 : 2332 91 82 56 38.4 % 2415 19.6 %
20 Resp 0.19 : 2320 91 85 52 38.5 % 2402 23.1 %
21 Knightx 1.86 : 2318 88 83 56 39.3 % 2394 21.4 %
22 Djinn 0.870f : 2297 91 81 56 37.5 % 2385 21.4 %
23 Butcher 1.53 : 2280 105 74 55 30.0 % 2427 20.0 %
24 Esc 1.16 : 2252 125 76 52 28.8 % 2409 11.5 %
Program Elo + - Games Score Av.Op. Draws
1 Jonny 2.70 : 2628 98 93 56 80.4 % 2383 21.4 %
2 Spike 0.8a : 2551 90 87 56 71.4 % 2391 21.4 %
3 DanChess 1.07 : 2549 89 86 56 69.6 % 2405 21.4 %
4 E.T.Chess 071204 : 2473 79 77 56 66.1 % 2358 32.1 %
5 TRACE 1.33 : 2469 83 82 52 57.7 % 2415 26.9 %
6 Pseudo 0.6h : 2440 81 80 56 57.1 % 2390 25.0 %
7 Petir 2.0 : 2434 87 86 52 56.7 % 2387 21.2 %
8 Snitch 1.0.8 : 2415 77 77 56 52.7 % 2396 30.4 %
9 Muse 0.899b uci : 2407 78 78 55 55.5 % 2369 30.9 %
10 TheCrazyBishop 0052 : 2402 80 80 56 51.8 % 2390 25.0 %
11 Terra 3.3b11 : 2396 84 84 56 48.2 % 2408 17.9 %
12 Cerebro 1.30 : 2394 91 91 52 48.1 % 2407 11.5 %
13 Leila 0.53h : 2393 76 76 56 48.2 % 2405 32.1 %
14 Frenzee 199 : 2390 84 85 56 46.4 % 2414 17.9 %
15 Amateur 2.80 : 2372 77 78 56 43.8 % 2415 30.4 %
16 Bruja 1.8 : 2364 79 80 52 43.3 % 2411 32.7 %
17 Scidlet 3.6 : 2363 81 82 52 43.3 % 2410 28.8 %
18 Queen 2.45 : 2342 82 83 52 44.2 % 2382 26.9 %
19 Chezzz 1.0.3 : 2332 84 86 56 38.4 % 2415 19.6 %
20 Resp 0.19 : 2320 85 87 52 38.5 % 2402 23.1 %
21 Knightx 1.86 : 2318 83 84 56 39.3 % 2394 21.4 %
22 Djinn 0.870f : 2297 83 85 56 37.5 % 2385 21.4 %
23 Butcher 1.53 : 2280 88 91 55 30.0 % 2427 20.0 %
24 Esc 1.16 : 2252 97 102 52 28.8 % 2409 11.5 %
Rank Name Elo + - games score draws
1 A 146 267 267 32 75% 25%
2 C 16 253 253 33 74% 27%
3 B -16 253 253 33 25% 27%
4 D -146 267 267 32 25% 25%
      A    C    B    D
A        689  996  861
C   310       551  996
B     3  448       689
D   138    3  310
Program Elo + - Games Score Av.Op. Draws
1 A : 210 121 114 32 75.0 % 19 25.0 %
2 B : 19 109 115 33 25.8 % 203 27.3 %
3 C : -19 115 109 33 74.2 % -203 27.3 %
4 D : -210 114 121 32 25.0 % -19 25.0 %
Dann Corbit wrote:You can find an open source version of erfc() in the Cephes collection by Moshier at Netlib.
Dann Corbit wrote:I am sure that I will enjoy looking at what you did very much.
I am hoping that eventually, I will even understand it!
;-)
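The thread does not say what erfc() is needed for. One common use in rating statistics, shown here purely as an assumption, is to turn a rating difference and its standard deviation into a "probability that i is stronger than j" under a Gaussian approximation. Python's built-in math.erfc stands in for the Cephes routine, and the function name is hypothetical.

```python
import math


def prob_superior(elo_diff, sigma):
    """P(true difference > 0) when the estimated difference is assumed
    to be normally distributed with mean elo_diff and std. dev. sigma."""
    return 0.5 * math.erfc(-elo_diff / (sigma * math.sqrt(2.0)))


print(prob_superior(0.0, 50.0))    # equal estimates give 0.5
print(prob_superior(100.0, 50.0))  # two standard deviations ahead
```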
Rémi Coulom wrote: Right now, my plans are:
- Implement Metropolis-Hastings. This will provide a very accurate matrix of "likelihood that i is stronger than j". This will also allow computing the expected ratings, instead of the maximum-likelihood ratings, and make Peter happy.
I have spent this afternoon implementing this, just to find out that it does not work. Well, it works for a small number of players, but not for a large number. The cost is still exponential with the number of players, so it is not practical for more than 4-5. I naively thought that using the Gaussian as a helper would be the magic trick that allows good high-dimensional sampling. I could have saved an afternoon if I had thought about this a little more. I am feeling a little stupid now.
Thanks Dann for the erfc code. I will probably not use it because I am happy with gcc, but I may find a use for it, sometime.
Rémi
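For readers unfamiliar with the idea, here is a minimal random-walk Metropolis sketch for the easy case of two players. It is not Rémi's implementation. It assumes a logistic Elo curve, ignores draws, and uses one virtual win and one virtual loss as a prior so that the posterior is proper. All names and parameters are made up for this example.

```python
import math
import random


def log_posterior(d, wins, losses):
    """Log of f(d)^(wins+1) * (1-f(d))^(losses+1), f = logistic Elo curve.
    The "+1" terms are the assumed prior (one virtual win, one virtual loss)."""
    log_f = -math.log1p(10.0 ** (-d / 400.0))
    log_1mf = -math.log1p(10.0 ** (d / 400.0))
    return (wins + 1) * log_f + (losses + 1) * log_1mf


def prob_stronger(wins, losses, n_samples=20000, step=100.0, burn_in=1000, seed=1):
    """Estimate P(rating difference > 0) by sampling the posterior of d."""
    rng = random.Random(seed)
    d = 0.0
    lp = log_posterior(d, wins, losses)
    above = 0
    for i in range(n_samples + burn_in):
        cand = d + rng.gauss(0.0, step)  # symmetric random-walk proposal
        lp_cand = log_posterior(cand, wins, losses)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            d, lp = cand, lp_cand
        if i >= burn_in and d > 0.0:
            above += 1
    return above / n_samples


print(prob_stronger(7, 3))
```

In one dimension this mixes quickly. The difficulty described in the post concerns sampling the joint ratings of many players at once, which this sketch does not attempt.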
ResultSet-EloRating>prior 0.1
0.1
With this prior, you will get the following Elo differences:
1-0 : +475.016
2-0 : +590.27
3-0 : +658.945
4-0 : +708.031
5-0 : +746.258
ResultSet-EloRating>prior 1
1
With this prior, you will get the following Elo differences:
1-0 : +150.156
2-0 : +232.215
3-0 : +288.12
4-0 : +330.435
5-0 : +364.46
ResultSet-EloRating>prior 2
2
With this prior, you will get the following Elo differences:
1-0 : +89.2563
2-0 : +150.156
3-0 : +195.835
4-0 : +232.215
5-0 : +262.379
Rémi Coulom wrote: {snip}
I am very convinced that this new version produces much better Elo ratings, especially when the number of games is small. You can get it there:
http://remi.coulom.free.fr/Bayesian-Elo/
Rémi