ELOStat algorithm ?

Programming Topics (Computer Chess) and technical aspects such as test techniques, book building, program tuning, etc.

Moderator: Andres Valverde

Re: ELOStat algorithm ?

Postby Dann Corbit » 22 Sep 2005, 01:52

Kirill Kryukov wrote:Great! Thanks, Rémi!

Is there a chance a future bayeselo may run in a fully non-interactive mode? I'd like to run it from another program to automatically process the PGN collections from my tournaments. So it would be very nice if you could run "bayeselo -readpgn ... -elo -mm -exactdist -ratings" (or whatever we have to type now) and it would just print its output to stdout?


Oooh! Oooh! Meee Tooo!!!
{that would be a nice feature}
Dann Corbit
 

Re: Bayeselo now with a proper prior

Postby Rémi Coulom » 22 Sep 2005, 08:30

Dann Corbit wrote:Can I get a source tarball?

I have just updated my web page with a new version of bayeselo that fixes a minor bug that could cause a segmentation fault. Also I have put a link to the new source.
http://remi.coulom.free.fr/Bayesian-Elo/

Rémi
Last edited by Rémi Coulom on 22 Sep 2005, 08:39, edited 1 time in total.
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Rémi Coulom » 22 Sep 2005, 08:38

Dann Corbit wrote:
Kirill Kryukov wrote:Great! Thanks, Rémi!

Is there a chance a future bayeselo may run in a fully non-interactive mode? I'd like to run it from another program to automatically process the PGN collections from my tournaments. So it would be very nice if you could run "bayeselo -readpgn ... -elo -mm -exactdist -ratings" (or whatever we have to type now) and it would just print its output to stdout?


Oooh! Oooh! Meee Tooo!!!
{that would be a nice feature}


I will think about adding this feature. But I will probably not have time to do it rapidly.

If you wish to run bayeselo from another program, you can send commands through a pipe like this:
Code: Select all
echo "readpgn whatever.pgn
elo
mm
exactdist
ratings >ratings.dat
" | bayeselo


You could also create a file called "script" that contains your commands and do
Code: Select all
bayeselo<script


Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Kirill Kryukov » 22 Sep 2005, 10:21

If you wish to run bayeselo from another program, you can send commands through a pipe like this:

*snip*

Yeah, this is approximately what I'm doing. I am calling bayeselo from a Perl script using open2(), to avoid having any extra files. It would just be nicer to have bayeselo as a proper command-line app... :)
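
For reference, a rough sketch of the same pipe-driving approach in Python (games.pgn is just a placeholder, and the command list simply mirrors Rémi's pipe example above):
Code: Select all
import subprocess

# The commands mirror the pipe example above; adjust file names to your setup.
commands = """readpgn games.pgn
elo
mm
exactdist
ratings
"""

# Feed the command script to bayeselo's stdin and capture everything it prints.
result = subprocess.run(["bayeselo"], input=commands,
                        capture_output=True, text=True)

# The rating table comes back mixed with the interactive prompts,
# so some filtering of result.stdout is still needed.
print(result.stdout)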

Hmm, one question... I know WBEC is good, but are you planning to test Bayeselo's predictions on some other tourneys? I mean, perhaps you can tune it for a particular tournament, but will it be as good for another one? Particularly one with a different draw rate (maybe because of a shorter time control, or something)?

I use Bayeselo for my engine tourney. It's still not ready for discussion, but the rating table is there... For now I included both old and new Bayeselo ratings. From what I see it's just a bit compressed compared to the old one, but I trust you are doing your best science to make it right. :)
Kirill Kryukov
 
Posts: 127
Joined: 21 Sep 2005, 09:56

Re: ELOStat algorithm ?

Postby Dann Corbit » 22 Sep 2005, 18:52

Kirill Kryukov wrote:
{snip}
I use Bayeselo for my engine tourney. It's still not ready for discussion, but the rating table is there... For now I included both old and new Bayeselo ratings. From what I see it's just a bit compressed compared to the old one, but I trust you are doing your best science to make it right. :)


You have a very interesting web page.
Thanks for your interesting efforts
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Dann Corbit » 22 Sep 2005, 19:30

Here is an MS VC++ build:
http://cap.connx.com/chess-engines/new- ... yeselo.exe

Here is the source and projects that I used:
http://cap.connx.com/chess-engines/new- ... yeselo.zip

I added tag names to the enums that were missing them, to make the Doxygen documentation more understandable to me. I also added the Cephes version of the erf() stuff so it will compile under Windows.

It is a magnificent collection of chess parsing stuff, besides being an Elo calculation engine.
Dann Corbit
 

Re: ELOStat algorithm ?

Postby Rémi Coulom » 22 Sep 2005, 20:07

Kirill Kryukov wrote:From what I see it's just a bit compressed compared to the old one, but I trust you are doing your best science to make it right. :)


If you feel it is not compressed enough, you can make it more compressed by increasing the prior.
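
For example (a sketch only; I am assuming the prior command is given at the rating prompt before mm, and the exact placement may differ between versions), the earlier pipe example becomes:
Code: Select all
echo "readpgn whatever.pgn
elo
prior 2.6
mm
exactdist
ratings >ratings.dat
" | bayeselo

A larger prior pulls the ratings closer together; a smaller one spreads them out.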

I still have to run more experiments to precisely determine whether bayeselo predictions are better than those of ELOStat. I am not yet 100% certain that they are better.

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Salvo Spitaleri » 23 Sep 2005, 19:16

Hi Rémi,

what about the meanelo command?


Ciao
Salvo
Salvo Spitaleri
 
Posts: 10
Joined: 04 Oct 2004, 18:46
Location: Italy

Re: ELOStat algorithm ?

Postby Rémi Coulom » 23 Sep 2005, 21:19

Salvo Spitaleri wrote:Hi Rémi,

what about the meanelo command?


Ciao
Salvo
I suppose you are referring to something old. I am afraid I do not remember. I do not understand your question.

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Kirill Kryukov » 24 Sep 2005, 03:50

Thanks, Dann. Those efforts give me more questions than answers at the moment...

Rémi Coulom wrote:If you feel it is not compressed enough, you can make it more compressed by increasing the prior.

I have no idea how to tell if it's compressed enough or not. But if you'd like to test something, you can download my games. (Although there are still not so many of them at this point.)

I got curious about one thing. As I understand it, BayesELO is currently tuned for WBEC. As far as I know, WBEC is a very sparse tournament. It has big round-robins for the leagues, and the leagues are connected by smaller promotion round-robins. Do you think that tuning BayesELO to such sparse data will also give good ratings for a more concentrated tournament, like a single but big round-robin?

Another thing... For a single round-robin, BayesELO and ELOstat are not too different. But for sparse data like WBEC they give very different ratings. I tried it for the WBEC games, and the ratings are sometimes very different... (One table is here; just hit Esc after the rating table has opened, so as not to load the huge pairwise tables below it.) So I wonder, how can they be so different? I can understand a few percent difference, but 158 vs -342 (Kiwi 0.5a), or 283 vs -132 (Delphil 1.5b) is quite large...
Kirill Kryukov
 
Posts: 127
Joined: 21 Sep 2005, 09:56

Re: ELOStat algorithm ?

Postby Salvo Spitaleri » 24 Sep 2005, 09:49

Code: Select all
Rank Name               BayesElo   BayesElo     BayesElo  Chessbase Elostat
                        v0052.12   v0052.05     v0052.12             v1.3
                       
                        prior 2.6  meanelo 2572 prior 2.6                     +    -  games score draws
                                                +2574 Elo
 1  Hiarcs 10               570       3239       3144       3066     3097    167  160   10   60%   60%
 2  Zappa 2.0               517       3175       3091       3001     3027    134  127   24   68%   29%
 3  Hydra                   448       3105       3022       3044     3045    270  195   12   87%    8%
 4  Shredder 9-64           323       2900       2897       2869     2869    98   86    70   81%   11%
 5  Fruit WCCC'05           280       2855       2854       2845     2846    20   19   1169  73%   23%
 6  Shredder 9              251       2825       2825       2808     2810     9    9   6351  67%   23%
 7  Toga II 1.0             235       2809       2809       2797     2798    20   20   983   62%   25%
 8  Fruit 2.1               214       2790       2788       2782     2783    13   13   2481  63%   27%
 9  Shredder 8              213       2788       2787       2778     2779    13   14   2209  62%   29%
10  Shredder 7.04           206       2780       2780       2770     2772    11   11   3129  59%   31%
11  Thinker 5.0b-64         204       2785       2778       2772     2773    186  181   13   53%    0%
12  Deep Sjeng X2-64        202       2801       2776       2719     2781    163  147   18   69%   16%
13  Shredder 8 Gambit       189       2765       2763       2757     2759    33   33   342   56%   30%
14  Junior 9                185       2760       2759       2751     2752    11   11   3623  58%   26%
15  Scorpio 1.6             178       2779       2752         0      2755    215  199   9    61%   11%
16  Fritz 8                 176       2750       2750       2742     2743     8    7   7729  55%   28%
18  Shredder 7              162       2737       2736       2730     2732    19   19   1024  55%   31%
17  Toga II 0.93            162       2736       2736       2726     2727    29   29   497   56%   18%
19  Crafty 19.20            155       2734       2729         0      2722    219  221   8    50%   25%
20  Zappa 1.1-64            153       2763       2727         0      2732    228  208   8    62%    0%
21  Junior 8                147       2723       2721       2715     2717    12   12   2670  53%   29%
22  Chess Tiger 2004        140       2714       2714       2709     2711    24   24   628   53%   34%
23  Hiarcs 9                139       2714       2713       2710     2711    10    9   4404  52%   30%
24  Fritz 7                 138       2713       2712       2708     2710    13   13   2356  56%   32%
25  Hiarcs 8 Bareev         133       2707       2707       2703     2706    27   27   526   46%   30%
26  Gandalf 6               131       2705       2705       2701     2702    10   10   4289  53%   26%
27  The King 3.33           131       2705       2705       2702     2703     9    9   5059  51%   25%
28  Patriot 2.0             128       2705       2702       2700     2701    33   32   403   51%   17%
29  Spike 1.0a Mainz        128       2702       2702       2703     2704    23   23   784   50%   23%
30  Ktulu 7.0               127       2702       2701       2700     2700    13   13   2466  55%   24%
31  Chepla 0.64             126       2823       2700         0      2816    190  156   15   83%    6%
32  Ruffian Leiden          125       2699       2699       2699     2699    19   19   1060  54%   28%
33  Pro Deo 1.1             120       2695       2694       2693     2694    12   12   2891  52%   22%
34  Chess Tiger 15          118       2693       2692       2691     2692     9    9   4992  50%   34%
35  The King 3.23           116       2691       2690       2686     2688    11   11   4391  52%   32%
36  Shredder Classic 1.1    113       2687       2687       2677     2678    86   86    45   46%   48%
38  Fritz 9                 110                  2684         0      2198    383  521   1     0%    0%
37  Gambit Tiger 2          110       2685       2684       2684     2686    17   17   1188  55%   38%
39  Chess Tiger 14          107       2682       2681       2679     2681    17   17   1349  56%   34%
40  Ruffian 2.1.0           106       2681       2680       2681     2682    10   10   4037  50%   27%
42  Chess Tiger 15 Gambit   103       2676       2677       2667     2670    41   41   223   42%   32%
41  SmarThink 1.00          103       2686       2677       2680     2680    80   80    56   50%   39%
43  Pharaon 3.2-64          100       2676       2674       2673     2673    79   78    67   55%   17%
44  Shredder 6.02            99       2674       2673       2672     2674    23   23   733   54%   30%
46  Deep Fritz               95       2669       2669       2667     2669    21   21   877   56%   29%
45  Pro Deo 1.0              95       2671       2669       2669     2670    16   16   1484  53%   25%
47  List 5.12                92       2668       2666       2668     2669    11   11   3382  50%   26%
48  SlowChess Blitz WV2      91       2665       2665       2671     2671    33   34   357   47%   25%
49  Gambit Tiger 2 aggr      89       2664       2663       2656     2658    49   49   150   51%   34%
50  Junior 7                 82       2657       2656       2656     2658    16   16   1503  50%   28%
53  Fritz 6                  79       2654       2653       2651     2652    24   24   636   56%   30%
52  Rebel 12                 79       2652       2653       2653     2654    21   21   876   47%   32%
51  Ruffian 1.0.5            79       2654       2653       2657     2658    11   11   3127  52%   29%
54  Deep Sjeng 1.5           72       2647       2646       2648     2649    27   27   509   47%   29%
55  Shredder 6               71       2647       2645       2644     2646    25   25   611   49%   34%
56  Crafty 19.13-64          68       2645       2642       2655     2654    90   89    44   56%   40%
58  Kaissa 1.8a              65       2644       2639       2653     2650    88   84    59   65%   18%
57  List 5.04                65       2639       2639       2640     2642    20   20   907   49%   37%
60  Pseudo 0.7c              64       2638       2638       2644     2644    20   20   962   49%   30%
59  Spike 0.9a               64       2640       2638       2646     2646    14   14   2152  50%   25%
61  Maestro 1.09             63       2642       2637       2640     2640    74   74    74   50%   20%
62  Aristarch 4.50           60       2634       2634       2640     2640    10   11   3843  48%   23%
63  Fruit 2.0                59       2634       2633       2640     2640    13   13   2637  49%   20%
64  Deep Sjeng 1.6           56       2631       2630       2638     2639    16   16   1578  47%   25%
65  Junior 6                 55       2628       2629       2628     2630    26   26   553   52%   29%
66  SOS 5 for Arena          55       2629       2629       2636     2636    13   14   2197  45%   22%
67  Hiarcs 8                 54       2631       2628       2631     2632    17   17   1351  45%   32%
68  Zappa 1.1                51       2625       2625       2634     2634    41   41   229   44%   30%
69  SlowChess Blitz WV       49       2626       2623       2634     2634    15   15   1820  47%   27%
70  Crafty 19.20-64          43       2609       2617         0      2611    177  188   12   41%    0%
71  Glaurung 0.2.4           41       2613       2615       2628     2627    23   23   757   52%   27%
72  Maestro 1.08             40       2616       2614       2624     2623    64   63   106   57%   16%
74  Glaurung Mainz           39       2614       2613       2627     2626    29   29   473   48%   24%
73  Naum 1.82                39       2608       2613       2618     2619    77   83    71   28%   21%
75  Naum 1.8                 36       2610       2610       2626     2625    24   24   671   50%   30%
77  Jonny 2.82               35       2609       2609       2625     2623    33   33   369   52%   22%
76  Pharaon 3.3              35       2610       2609       2622     2622    19   19   1140  49%   27%
78  SOS 4 for Arena          31       2605       2605       2614     2613    18   18   1226  52%   22%
79  Scorpio 1.3              30       2605       2604       2620     2618    25   25   619   51%   25%
80  SmarThink 0.18a          29       2604       2603       2613     2612    35   35   318   49%   27%
81  The King 3.12d           27       2600       2601       2600     2602    20   20   930   46%   36%
82  Anaconda 2.0.1           24       2599       2598       2609     2609    18   18   1240  42%   26%
83  TRACE 1.35               24       2680       2598         0      2724    199  173   9    77%   22%
84  Aristarch 4.21           21       2597       2595       2601     2602    18   18   1236  45%   30%
85  Zappa 1.0                15       2587       2589       2600     2600    15   15   1694  46%   25%
87  Delfi 4.5 CIPS           14       2589       2588       2600     2600    12   12   2715  46%   26%
86  Glaurung 0.2.3           14       2589       2588       2602     2602    20   20   1023  45%   23%
88  Pharaon 3.1-64           13       2586       2587       2602     2602    85   86    56   48%   21%
89  SmarThink 0.17a          12       2587       2586       2596     2596    23   23   737   48%   27%
91  Crafty 19.12             10       2583       2584       2594     2594    31   31   402   49%   33%
90  Thinker 4.7a             10       2585       2584       2600     2599    14   14   1868  46%   29%
92  SOS 3 for Arena          8        2584       2582       2587     2588    33   33   339   44%   33%
93  WARP 0.58                7        2579       2581       2588     2587    52   52   151   50%   23%
95  DanChess CCT7            5        2578       2579       2592     2592    17   17   1358  46%   22%
94  Gandalf 5.1              5        2580       2579       2584     2585    23   23   720   43%   33%
97  Glaurung 0.2.2           5        2585       2579       2598     2596    36   36   334   52%   15%
96  Shredder 5.32            5        2580       2579       2585     2587    20   20   941   43%   30%
98  Gandalf 4.32h            4        2578       2578       2583     2584    19   20   1022  45%   27%
99  WBNimzo 2000b            2        2576       2576       2586     2585    59   58   122   55%   18%
100 Scorpio 1.4              -2       2571       2572       2586     2585    39   39   248   47%   30%
102 Petir 2.97               -5       2568       2569       2583     2584    134  134   20   50%   30%
101 Thinker 4.6c             -5       2566       2569       2576     2576    28   28   494   46%   31%
103 Little Goliath Evolution -6       2567       2568       2583     2582    25   25   640   45%   28%
104 Pharaon 3.2              -6       2566       2568       2581     2581    15   16   1706  39%   24%
105 WARP 0.37                -9       2580       2565       2586     2585    79   78    68   53%   16%
106 Ktulu 5.1               -10       2564       2564       2577     2578    19   19   1128  39%   23%
107 WARP 0.65               -10       2561       2564       2577     2578    50   50   174   43%   14%
108 Aristarch 4.4           -12       2559       2562       2566     2566    71   73    76   42%   23%
109 PostModernist 1010a     -14       2599       2560         0      2640    176  172   13   53%    0%
110 WildCat 5.0             -15       2559       2559       2581     2579    32   32   391   45%   27%
111 Delfi 4.4               -16       2557       2558       2570     2570    55   55   135   49%   21%
112 Nimzo 8                 -16       2558       2558       2564     2566    17   17   1347  39%   30%
114 Naum 1.7                -19       2554       2555       2567     2567    19   19   1079  41%   30%
113 Pharaon 3.1             -19       2555       2555       2574     2572    40   40   239   49%   25%
116 Hiarcs 7.32             -20       2552       2554       2557     2559    24   24   693   38%   29%
115 SlowChess Blitz 0.4     -20       2552       2554       2572     2571    32   32   400   46%   23%
117 Junior 5                -22       2549       2552       2557     2558    31   32   460   43%   25%
118 Naum 1.71               -24       2551       2550       2569     2566    87   85    56   57%   17%
119 Crafty Classic 2004     -25       2545       2549       2562     2562    95   98    44   39%   20%
121 Francesca M.A.D 0.10    -26       2608       2548         0      2606    217  217   7    50%   14%
120 Pharaon 3.0             -26       2545       2548       2560     2560    40   40   257   42%   23%
124 AnMon 5.53              -28       2544       2546       2564     2562    19   20   1058  46%   23%
123 Crafty Cito 1.2         -28       2545       2546       2564     2561    30   30   418   48%   28%
122 SmarThink 0.16b++       -28       2549       2546       2560     2560    56   56   125   47%   28%
125 The Baron 1.7.0         -31       2541       2543       2560     2564    47   48   185   40%   21%
126 The Baron 1.6.1         -32       2541       2542       2562     2560    25   25   669   50%   20%
127 Fritz 5.32              -33       2540       2541       2549     2550    26   26   595   39%   25%
128 Kaissa 1.7              -33       2542       2541       2564     2560    112  109   32   56%   18%
129 Green Light Chess 3.01.2-34       2538       2540       2559     2557    18   18   1311  46%   24%
130 AnMon 5.51              -36       2538       2538       2560     2559    17   17   1501  42%   22%
131 Crafty 19.15            -36       2536       2538       2559     2558    26   26   569   47%   25%
133 DanChess 1.08           -38       2543       2536       2566     2563    66   65    92   51%   25%
132 Tao 5.7                 -38       2535       2536       2554     2553    18   18   1258  43%   18%
134 Little Goliath Nemesis  -40       2533       2534       2539     2540    25   26   634   32%   28%
136 Amyan 1.595             -45       2529       2529       2547     2547    20   20   1069  38%   24%
135 Patriot 1.3.0           -45       2527       2529       2547     2547    23   24   783   34%   20%
137 Ufim 7.01               -47       2525       2527       2546     2544    29   29   472   42%   26%
138 Crafty 19.06            -49       2531       2525       2548     2548    101  103   38   44%   21%
139 SlowChess Blitz         -50       2524       2524       2544     2541    86   85    50   53%   34%
140 Zarkov 4.75             -50       2521       2524       2543     2542    30   31   425   41%   26%
142 Crafty 19.19            -53       2519       2521       2534     2534    27   27   555   38%   25%
141 Tao 5.6                 -53       2512       2521       2534     2532    77   78    72   45%   13%
144 Anaconda 1.6.2          -54       2518       2520       2526     2527    43   44   213   34%   30%
143 Green Light Chess 3.00.3-54       2519       2520       2540     2539    32   33   384   42%   23%
145 Arasan 8.4              -56       2558       2518       2642     2592    192  171   11   68%    9%
146 Delfi 4.2 CIPS          -56       2512       2518       2524     2523    62   63   108   42%   20%
148 Scorpio 1.5             -57       2515       2517       2536     2535    59   61   110   39%   25%
147 The Baron 1.5.0         -57       2514       2517       2537     2536    27   27   581   44%   19%
149 Crafty Cito 1.4         -58       2508       2516       2534     2533    40   41   246   41%   23%
150 SOS 2 for Arena         -60       2512       2514       2518     2519    52   54   145   31%   28%
152 Fruit 1.5               -62       2508       2512       2533     2531    34   34   356   43%   24%
151 Little Goliath Revival  -62       2512       2512       2534     2532    22   22   840   44%   20%
153 Green Light Chess 3.00  -64       2511       2510       2534     2533    44   45   212   44%   20%
154 Crafty 18.15            -65       2509       2509       2516     2518    22   22   878   29%   28%
155 Gothmog 1.0 beta 10     -65       2508       2509       2534     2533    25   25   683   44%   19%
156 WildCat 4.0             -65       2508       2509       2533     2531    22   22   832   45%   17%
157 Movei 00.8.306          -66       2505       2508       2530     2529    62   64   105   37%   20%
158 Spike 0.8a              -67       2504       2507       2527     2526    66   68    97   39%   16%
159 Yace 0.99.87            -67       2506       2507       2527     2526    16   15   1737  37%   22%
161 DanChess 1.07           -68       2508       2506       2534     2531    40   40   245   49%   22%
160 Movei 00.8.310          -68       2506       2506       2529     2527    22   22   819   38%   25%
164 El Chinito 3.25         -70       2497       2504       2525     2524    181  187   13   46%    0%
162 Petir 2.75              -70       2504       2504       2524     2522    33   33   352   40%   30%
163 Yace 0.99.79 Paderborn  -70       2505       2504       2517     2518    20   21   974   34%   27%
165 Pharaon 2.62            -71       2502       2503       2517     2517    27   27   536   38%   27%
166 WildCat 3.0             -71       2503       2503       2524     2521    143  142   21   52%    9%
167 Zarkov 4.5              -77       2494       2497       2515     2514    51   51   158   43%   18%
168 Quark 2.35 Paderborn    -80       2492       2494       2520     2519    23   23   790   40%   16%
169 Ufim 6.00               -81       2492       2493       2517     2515    19   19   1121  39%   23%
170 Naum 1.6                -83       2492       2491       2516     2514    32   32   401   41%   25%
171 Patriot 1.2.3           -84       2487       2490       2504     2503    64   66    95   35%   28%
172 Zarkov 4.67             -84       2490       2490       2514     2511    47   47   185   46%   20%
173 Naum 1.4                -87       2487       2487       2513     2511    38   39   276   41%   25%
174 Snitch 1.0.8            -88       2475       2486       2507     2548    178  165   12   62%    8%
175 SpiderChess 3.61        -88       2478       2486       2504     2501    86   86    53   48%   24%
176 Jonny 2.75              -90       2484       2484       2512     2509    39   40   276   43%   14%
179 Ceng 2.53.6b            -91       2477       2483         0      2554    202  194   9    55%    0%
177 Pepito 1.59             -91       2482       2483       2508     2506    21   22   913   39%   21%
180 Quark 2.55              -91       2484       2483       2511     2509    63   65   103   43%   16%
178 SlowChess 2.94          -91       2486       2483       2515     2511    47   47   182   48%   21%
181 Amyan 1.597             -92       2482       2482       2509     2506    24   24   693   39%   25%
182 Amyan 1.594             -93       2477       2481       2511     2508    33   34   377   44%   16%
183 Petir 2.5c              -98       2474       2476       2500     2497    30   30   435   39%   25%
184 AnMon 5.40              -101      2471       2473       2506     2503    49   49   164   46%   22%
185 Amateur 2.86            -102      2502       2472       2569     2552    152  155   13   46%   30%
186 Crafty Cito 1.4.1       -102      2451       2472         0      2475    235  252   7    42%    0%
187 Gromit 3.8.2            -104      2465       2470       2492     2491    34   35   379   40%   11%
188 Movei 00.8.295          -105      2464       2469       2495     2493    26   26   609   40%   19%
190 E.T.Chess 15.04.05      -111      2460       2463       2495     2492    70   71    84   44%   17%
189 SmarThink 0.15b         -111      2457       2463       2495     2495    90   95    53   35%   11%
191 Gosu 0.11               -112      2461       2462         0      2506    214  205   7    57%   28%
192 Movei 00.8.317          -113      2451       2461       2490     2488    131  138   23   39%   17%
193 Capture R01             -115      2442       2459       2492     2540    156  152   15   56%   20%
194 Abrok 5.0               -118      2452       2456       2487     2484    32   32   416   42%   15%
195 Comet B.68              -127      2445       2447       2476     2474    35   36   342   39%   18%
196 Nejmet 3.07             -132      2438       2442       2475     2472    30   31   475   37%   17%
197 AnMon 5.32              -135      2437       2439       2467     2466    43   44   224   34%   21%
198 King Of Kings 2.57      -135      2436       2439       2476     2473    27   28   590   38%   16%
199 Leila 0.53h             -138      2427       2436       2451     2471    127  129   24   47%   12%
200 Cerebro 2.01            -139      2418       2435       2448     2478    120  127   25   38%   20%
201 Dragon 4.7.5            -142      2426       2432       2511     2472    118  120   28   46%   14%
202 Frenzee 200             -144      2412       2430       2419     2470    170  179   12   41%   16%
203 Patzer 3.62             -154      2418       2420       2461     2458    34   34   386   38%   15%
204 Movei 00.8.263          -158      2423       2416       2460     2456    64   66    92   41%   26%
205 KnightDreamer 3.3       -159      2412       2415       2455     2453    29   29   543   34%   17%
206 Muse 0.899b             -167      2384       2407         0      2483    145  154   16   37%   25%
207 Hermann 1.5             -169      2300       2405         0      2439    236  293   5    20%    0%
208 Chezzz 1.0.3            -199      2334       2375         0      2425    175  185   10   40%   20%
209 Phalanx XXII            -208      2301       2366         0      2435    187  215   11   27%    0%
210 Jonny 2.82-64           -215      2378       2359         0      2398    323  289   3    66%    0%
211 SpiderChess 3.87        -237      2271       2337         0      2328    229  278   6    25%   16%
212 The Crazy Bishop 0052   -242      2254       2332       2406     2374    152  173   18   22%   11%
213 Booot 4.75              -243      2281       2331       2403     2371    172  185   11   36%   18%
214 Terra 3.4               -265      2271       2309       2358     2353    132  147   22   29%   13%
215 Bruja 1.9               -268      2203       2306       2262     2323    176  210   12   25%    0%
216 EXchess 4.03            -330      2182       2244       2241     2241    160  223   21    9%    9%
217 Patzer 3.71             -342      2087       2232         0      2181    218  280   6    16%    0%
218 Faile 1.4.4             -483      1945       2091       2062     2063    186  330   28    3%    0%


Hello Rémi,
IMO, your tool is the best one for rating calculation, but I would like an output like that of BayesElo 0052.05, or like that of the third column, which I obtained by adding +2574 Elo to the output of version 0052.12.
Another nice feature would be the ability to add the Elo to the tags of the games.

Ciao
Salvo
Salvo Spitaleri
 
Posts: 10
Joined: 04 Oct 2004, 18:46
Location: Italy

Re: ELOStat algorithm ?

Postby Rémi Coulom » 24 Sep 2005, 10:18

Kirill Kryukov wrote:I got curious about one thing. As I understand it, BayesELO is currently tuned for WBEC. As far as I know, WBEC is a very sparse tournament. It has big round-robins for the leagues, and the leagues are connected by smaller promotion round-robins. Do you think that tuning BayesELO to such sparse data will also give good ratings for a more concentrated tournament, like a single but big round-robin?

The sparsity of the tournament is not really what determines the best value of the prior. The prior indicates how close in strength we expect players to be. In a tournament where very weak players may play against very strong players, it might be better to use a smaller prior. In tournaments where most of the games are between players that are close in strength, a larger prior might be better.

Also, changing the prior should not change the order of players much. The effect of increasing the prior is mainly to reduce the scale of rating differences, as you have already noticed.

Kirill Kryukov wrote:Another thing... For a single round-robin, BayesELO and ELOstat are not too different. But for sparse data like WBEC they give very different ratings. I tried it for the WBEC games, and the ratings are sometimes very different... (One table is here; just hit Esc after the rating table has opened, so as not to load the huge pairwise tables below it.) So I wonder, how can they be so different? I can understand a few percent difference, but 158 vs -342 (Kiwi 0.5a), or 283 vs -132 (Delphil 1.5b) is quite large...

This is a very interesting example. The big rating differences that you noticed revolve around "Promo D" of WBEC 10. Let us take the striking example of Natwarlal 0.12 and NullMover 0.25. Natwarlal 0.12 finished at the top of division 4, and won the promotion tournament. NullMover 0.25 was at the bottom of division 3, and performed poorly in the promotion tournament. Here are the ratings that we get:
  • Natwarlal 0.12: 210 (bayeselo) and -267 (elostat)
  • NullMover 0.25: 74 (bayeselo) and -148 (elostat)
I have a strong feeling that the ratings produced by bayeselo are much better than those produced by elostat in this situation. A fundamental problem of elostat is that it makes the assumption that when a program gets a winning percentage against a variety of opponents, it is equivalent to the same winning percentage against one single opponent, whose rating is equal to the average of opponents. This assumption is very wrong, and fails badly in this particular situation.
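
A small numeric sketch of the problem (using the standard logistic Elo expectancy; the ratings 0, 0 and 400 are made up purely for illustration):
Code: Select all
# Expected score of a player rated r against an opponent rated r_opp
# under the usual logistic Elo curve.
def expected(r, r_opp):
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r) / 400.0))

# A player rated 0 plays one game against an opponent rated 0
# and one against an opponent rated 400:
real = (expected(0, 0) + expected(0, 400)) / 2   # about 0.295
# The shortcut: one "average" opponent rated 200:
shortcut = expected(0, 200)                      # about 0.240
print(real, shortcut)

The two expected scores already differ here, and the gap grows as the spread of opponent strengths grows, which matters most when an engine meets both much stronger and much weaker opponents, as in the promotion groups.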

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Rémi Coulom » 24 Sep 2005, 10:28

Salvo Spitaleri wrote:Hello Rémi,
IMO, your tool is the best one for rating calculation, but I would like an output like that of BayesElo 0052.05, or like that of the third column, which I obtained by adding +2574 Elo to the output of version 0052.12.
Another nice feature would be the ability to add the Elo to the tags of the games.

Ciao
Salvo

I think I understand your question now: meanelo was replaced by offset. offset adds a constant to elo ratings.
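
For example (a sketch; I am assuming offset takes the constant as its argument and is accepted at the rating prompt before ratings):
Code: Select all
echo "readpgn whatever.pgn
elo
mm
offset 2574
exactdist
ratings >ratings.dat
" | bayeselo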

Regarding adding tags to the PGN file, this would not be extremely difficult to do. I will put it in the TODO list, but implementing this kind of feature has a very low priority for me. I prefer to focus my efforts on trying to find better rating evaluations when I have time to work on bayeselo.

Thanks for your interest,

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Salvo Spitaleri » 24 Sep 2005, 10:39

Regarding adding tags to the PGN file, this would not be extremely difficult to do. I will put it in the TODO list, but implementing this kind of feature has a very low priority for me.


Thank you very much!

Bests
Salvo
Salvo Spitaleri
 
Posts: 10
Joined: 04 Oct 2004, 18:46
Location: Italy

Re: ELOStat algorithm ?

Postby Rémi Coulom » 24 Sep 2005, 22:29

Rémi Coulom wrote:
Kirill Kryukov wrote:Another thing... For a single round-robin, BayesELO and ELOstat are not too different. But for sparse data like WBEC they give very different ratings. I tried it for the WBEC games, and the ratings are sometimes very different... (One table is here; just hit Esc after the rating table has opened, so as not to load the huge pairwise tables below it.) So I wonder, how can they be so different? I can understand a few percent difference, but 158 vs -342 (Kiwi 0.5a), or 283 vs -132 (Delphil 1.5b) is quite large...

This is a very interesting example. The big rating differences that you noticed revolve around "Promo D" of WBEC 10. Let us take the striking example of Natwarlal 0.12 and NullMover 0.25. Natwarlal 0.12 finished at the top of division 4, and won the promotion tournament. NullMover 0.25 was at the bottom of division 3, and performed poorly in the promotion tournament. Here are the ratings that we get:
  • Natwarlal 0.12: 210 (bayeselo) and -267 (elostat)
  • NullMover 0.25: 74 (bayeselo) and -148 (elostat)
I have a strong feeling that the ratings produced by bayeselo are much better than those produced by elostat in this situation. A fundamental problem of elostat is that it makes the assumption that when a program gets a winning percentage against a variety of opponents, it is equivalent to the same winning percentage against one single opponent, whose rating is equal to the average of opponents. This assumption is very wrong, and fails badly in this particular situation.

Rémi


I have run more experiments with this data. When running ELOstat on this database it produces this output:
Code: Select all
Calculating Elo ratings...
Iteration failed - degenerate database
1001 iterations

So, I thought that maybe ELOstat does not produce good ratings because it fails to converge. In order to test this, I implemented the ELOstat algorithm inside bayeselo. It took 1529 iterations to converge. The resulting ratings were -247 for NullMover, and -459 for Natwarlal, which really makes no sense.

The new version with the ELOstat algorithm built-in and other minor improvements is now available on my web page:
http://remi.coulom.free.fr/Bayesian-Elo/
Note that my implementation of ELOstat does not always produce results that are perfectly identical to ELOstat. There are sometimes differences of one or two points.
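
For completeness, here is a sketch of how the built-in ELOstat computation could be driven through a pipe; note that the command name elostat below is my assumption (the thread does not name it), so check the help output of your version:
Code: Select all
echo "readpgn wbec.pgn
elo
elostat
ratings >elostat-ratings.dat
" | bayeselo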

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Kirill Kryukov » 27 Sep 2005, 04:16

Rémi, thank you for the explanation and the new version of Bayeselo!

Rémi Coulom wrote:A fundamental problem of elostat is that it makes the assumption that when a program gets a winning percentage against a variety of opponents, it is equivalent to the same winning percentage against one single opponent, whose rating is equal to the average of opponents. This assumption is very wrong, and fails badly in this particular situation.

I still don't understand completely. Suppose the engine A vs B score is 5-20, and the A vs C score is 10-20 (so A sucks). Now how is this different from an imaginary A vs D with a score of 15-40, with D having the average rating of B and C? A still sucks, no? Even if the rating of A is a little different in the second case, how can it be as different as in your examples and mine above?
Kirill Kryukov
 
Posts: 127
Joined: 21 Sep 2005, 09:56

Re: ELOStat algorithm ?

Postby Rémi Coulom » 27 Sep 2005, 19:40

Kirill Kryukov wrote:Rémi, thank you for the explanation and the new version of Bayeselo!

Rémi Coulom wrote:A fundamental problem of elostat is that it makes the assumption that when a program gets a winning percentage against a variety of opponents, it is equivalent to the same winning percentage against one single opponent, whose rating is equal to the average of opponents. This assumption is very wrong, and fails badly in this particular situation.

I still don't understand completely. Suppose the engine A vs B score is 5-20, and the A vs C score is 10-20 (so A sucks). Now how is this different from an imaginary A vs D with a score of 15-40, with D having the average rating of B and C? A still sucks, no? Even if the rating of A is a little different in the second case, how can it be as different as in your examples and mine above?

Big differences arise in other kinds of situations. Suppose that A beats B, B draws C, and C beats D. In this kind of situation, one would suppose that B and C are of similar strength, A is strongest, and D weakest. ELOstat will say that B is significantly stronger than C. More precisely, here are the outputs of the two programs in this situation:

bayeselo:
Code: Select all
Rank Name   Elo    +    - games score draws
   1 A      104  355  274     1  100%    0%
   2 C        8  211  194     2   75%   50%
   3 B       -8  194  211     2   25%   50%
   4 D     -104  274  355     1    0%    0%

C is rated 16 points ahead of B, because it drew B with the black pieces.

ELOstat:
Code: Select all
    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 A                              :  709    0   0     1   100.0 %    109    0.0 %
  2 B                              :  109  259 409     2    25.0 %    300   50.0 %
  3 C                              : -109  409 259     2    75.0 %   -300   50.0 %
  4 D                              : -709    0   0     1     0.0 %   -109    0.0 %


As you can see, ELOstat is completely confused. I suppose this is similar to what happens in the WBEC case, with A being the upper division, D the lower division, and B-C the qualification tournament between them.
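
A quick numeric check of where ELOstat's numbers come from (assuming each rating is the average opponent rating plus the standard performance offset for the score percentage; the logistic curve is used below, and the 100%/0% rows look simply capped, which is my guess):
Code: Select all
import math

# Performance offset for a score fraction p under the logistic Elo curve.
def perf(p):
    return 400.0 * math.log10(p / (1.0 - p))

# B scored 25% against A (709) and C (-109): average opponent = 300.
print(300 + perf(0.25))    # about 109, ELOstat's rating for B
# C scored 75% against B (109) and D (-709): average opponent = -300.
print(-300 + perf(0.75))   # about -109, ELOstat's rating for C

Each rating is defined only through the other ratings and the raw percentages, so the chain A-B-C-D can settle into a self-consistent but wrongly ordered solution, while bayeselo instead maximizes the likelihood of the individual game results.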

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

bug fixed

Postby Rémi Coulom » 27 Sep 2005, 19:47

The results provided in my previous reply were obtained with a fixed bayeselo, which you can download here:
http://remi.coulom.free.fr/Bayesian-Elo/
There was a bug in the prior calculation, which is now fixed.

Rémi
Rémi Coulom
 
Posts: 96
Joined: 12 Nov 2004, 13:47
Location: Lille, France

Re: ELOStat algorithm ?

Postby Kirill Kryukov » 29 Sep 2005, 04:21

Thank you for the detailed explanation and for the new version of BayesELO! It makes sense to me now. :) I am now thinking of switching to BayesELO for computing the ELOstat ratings (instead of using ELOstat itself), to simplify my processing script.

Rémi Coulom wrote:Note that my implementation of ELOstat does not always produce results that are perfectly identical to ELOstat. There are sometimes differences of one or two points.

Could be a precision/rounding issue. 1-2 points is close enough!
Kirill Kryukov
 
Posts: 127
Joined: 21 Sep 2005, 09:56

Re: ELOStat algorithm ?

Postby Kirill Kryukov » 29 Sep 2005, 06:17

Rémi, I compared the ELOstat ratings produced by Bayeselo with the real ELOstat ratings. You can see my table on this page. Two things to notice: first, the difference is often more than 1-2 points, sometimes going up to 10 points. And second, the uncertainty estimates are quite different, especially at both ends of the table. Do you have a guess why? ;)

[update]
I updated the CEGT statistics page to include the ELOstat ratings computed by both Bayeselo and ELOstat. They look very similar, except for some anomalies like -5037 for "Shredder 9 UCI" and "Fritz 8".
Kirill Kryukov
 
Posts: 127
Joined: 21 Sep 2005, 09:56
