Differences at different time controls

Postby Volker Pittlik » 26 Jul 2007, 16:47

I've made a test with the same engines and the same playing conditions, but with different time controls. I compared games at 1 minute initial time with a 1-second increment against games at 10 minutes initial time with a 10-second increment, because I wanted to see whether the results differ much. They seem to be more or less the same, although there are some differences. I'm a bit surprised that a rating can change so much while the rank stays the same. I had expected bigger differences in the ranks, if any.

1-1
---
                                                            Rank   Rating outside
Rank Name                 Elo   +  - games  maxElo minElo   diff   error margin?
 1   Fruit (Toga) 1.2.1a  193  42 40  220    235    153       0         no
 2   Glaurung2 -e4 perf   125  40 39  220    165     86       1         no
 3   Spike 1.2 Turin       97  39 38  220    136     59      -1         no
 4   Ruffian 2.1.0         75  37 36  220    112     39       2         yes
 5   Shredder Classic 1.3  58  38 38  220     96     20       0         no
 6   Scorpio 1.91          49  37 37  220     86     12      -2         yes
 7   Crafty-21.5          -46  37 37  220     -9    -83       2         no
 8   Jonny 2.83           -47  37 38  220    -10    -85      -1         no
 9   Zappa 1.1            -62  37 38  220    -25   -100      -1         no
10   Yace Paderborn       -65  37 38  220    -28   -103       0         no
11   Arasan 9.5           -98  37 38  220    -61   -136       0         yes
12   Hermann 2.0         -280  46 50  220   -234   -330       0         yes


10-10
-----
Rank Name                 Elo   +  - games  maxElo minElo
 1   Fruit (Toga) 1.2.1a  187  41 39  220    228    148
 2   Spike 1.2 Turin      143  39 38  220    182    105
 3   Glaurung2 e4 perf    105  38 37  220    143     68
 4   Scorpio 1.91          92  38 37  220    130     55
 5   Shredder Classic 1.3  39  37 37  220     76      2
 6   Ruffian 2.1.0         21  37 37  220     58    -16
 7   Jonny 2.83           -33  37 37  220      4    -70
 8   Zappa 1.1            -52  37 37  220    -15    -89
 9   Crafty-21.5          -60  37 37  220    -23    -97
10   Yace Paderborn       -81  37 38  220    -44   -119
11   Arasan 9.5          -152  39 40  220   -113   -192
12   Hermann 2.0         -210  41 44  220   -169   -254

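The last two columns of the 1-1 table can be recomputed from the numbers in both tables. Below is a minimal Python sketch; it assumes (my reading, not stated in the post) that "Rank diff" is the 10-10 rank minus the 1-1 rank, and that "Rating outside error margin?" asks whether the 10-10 Elo lies outside the 1-1 minElo to maxElo range. The names `ONE_ONE`, `TEN_TEN`, `rank_of` and `compare` are made up for illustration.

```python
# Per-engine numbers copied from the two tables.
# 1-1 time control: (name, Elo, maxElo, minElo)
ONE_ONE = [
    ("Fruit (Toga) 1.2.1a", 193, 235, 153),
    ("Glaurung2 e4 perf", 125, 165, 86),
    ("Spike 1.2 Turin", 97, 136, 59),
    ("Ruffian 2.1.0", 75, 112, 39),
    ("Shredder Classic 1.3", 58, 96, 20),
    ("Scorpio 1.91", 49, 86, 12),
    ("Crafty-21.5", -46, -9, -83),
    ("Jonny 2.83", -47, -10, -85),
    ("Zappa 1.1", -62, -25, -100),
    ("Yace Paderborn", -65, -28, -103),
    ("Arasan 9.5", -98, -61, -136),
    ("Hermann 2.0", -280, -234, -330),
]
# 10-10 time control: Elo only. Glaurung is listed as "-e4 perf" at 1-1 and
# "e4 perf" at 10-10; it is treated as the same engine here.
TEN_TEN = {
    "Fruit (Toga) 1.2.1a": 187, "Spike 1.2 Turin": 143,
    "Glaurung2 e4 perf": 105, "Scorpio 1.91": 92,
    "Shredder Classic 1.3": 39, "Ruffian 2.1.0": 21,
    "Jonny 2.83": -33, "Zappa 1.1": -52, "Crafty-21.5": -60,
    "Yace Paderborn": -81, "Arasan 9.5": -152, "Hermann 2.0": -210,
}


def rank_of(ratings: dict[str, int]) -> dict[str, int]:
    """Map engine name -> 1-based rank, highest Elo first."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def compare() -> list[tuple[str, int, bool]]:
    """Return (name, rank change 1-1 -> 10-10, 10-10 Elo outside 1-1 range)."""
    elo_11 = {name: elo for name, elo, _, _ in ONE_ONE}
    rank_11, rank_1010 = rank_of(elo_11), rank_of(TEN_TEN)
    rows = []
    for name, _, max_elo, min_elo in ONE_ONE:
        new_elo = TEN_TEN[name]
        outside = not (min_elo <= new_elo <= max_elo)
        rows.append((name, rank_1010[name] - rank_11[name], outside))
    return rows


if __name__ == "__main__":
    for name, diff, outside in compare():
        print(f"{name:22s} {diff:+d} {'yes' if outside else 'no'}")
```

Under that reading the rank differences match the table, while the strict range check also flags Spike 1.2 Turin, whose 10-10 rating of 143 lies just above its 1-1 maxElo of 136.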

Volker



P.S. Other conditions:

Processor: Intel Core2Duo E6300
OS: Linux 2.6.18.8-0.5 SMP i686 (SuSE 10.2, 32 bit)
Xboard: 4.2.7
Polyglot: 1.4
Ponder: off
Learning: off
Hash: approximately 32 MB if adjustable, else the engine default; swapping not tolerated
Books: own books if available, else self-created generic books; no manual tuning
TBs: tablebases and other endgame data up to 4 pieces, in RAM disks
RAM: 1 GB
Volker Pittlik
Posts: 1031
Joined: 24 Sep 2004, 10:14
Location: Murten / Morat, Switzerland

Postby H.G.Muller » 30 Jul 2007, 09:26

Nothing unusual here.

In the first place, the error bars on the rating tell you where the true rating is supposed to lie (with 68% confidence). If you re-measure the rating, the new measurement has its own error bars, and it will therefore on average differ more from the first measurement than the true rating would.

The two error bars should be added in root-mean-square fashion (in quadrature: the square root of the sum of their squares). For two equal error bars that makes the combined bar about 40% larger, a factor √2 ≈ 1.41.
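A small Python sketch of that rule, assuming the two measurements are independent; `combined_error` is a hypothetical helper name. It uses Fruit's upper error margins from the two tables as an example.

```python
import math


def combined_error(e1: float, e2: float) -> float:
    """Combine two independent error bars in quadrature: sqrt(e1**2 + e2**2)."""
    return math.hypot(e1, e2)


if __name__ == "__main__":
    # Two equal error bars: the combined bar is sqrt(2), about 1.41, times larger.
    print(combined_error(40, 40) / 40)
    # Fruit's '+' margins from the two tables: 42 at 1-1 and 41 at 10-10.
    print(round(combined_error(42, 41)))
```

With margins of 42 and 41 the combined bar comes out at about 59 Elo, roughly √2 times either one.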

Then there is a second effect: the re-measurement will only lie within these enlarged error bars in 68% of the cases, so it is expected to lie outside them in 32% of the cases. As you test 12 engines here, 0.32 × 12 ≈ 3.8, so it is quite normal that about 4 of them fall outside the given error bars even after enlarging those by 40%. That would happen even if you had tested under exactly the same conditions. (This assumes the randomness in the engines is large enough to consider the two runs independent tests.)
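The arithmetic can be checked with a short Python sketch, assuming Gaussian rating errors, equal standard errors in both runs, and independent runs. `two_sided_tail`, `N_ENGINES` and the `expected_*` variables are illustrative names. It also shows the case where the second rating is compared with the first run's own, un-enlarged bar, which is closer to the check in the last column of the first table.

```python
import math


def two_sided_tail(z: float) -> float:
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))


N_ENGINES = 12

# Outside the sqrt(2)-enlarged 1-sigma bar (the case described above).
expected_enlarged = N_ENGINES * two_sided_tail(1.0)
# Outside the original, un-enlarged 1-sigma bar of the first run:
# |difference| > sigma is 1/sqrt(2) standard errors of the difference.
expected_given = N_ENGINES * two_sided_tail(1.0 / math.sqrt(2))

if __name__ == "__main__":
    print(f"{expected_enlarged:.1f}")
    print(f"{expected_given:.1f}")
```

The first number comes out near 3.8, matching the estimate of about 4; against the un-enlarged bar roughly half of the engines (about 5.8) would be expected outside.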
H.G.Muller
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Postby Greg Simpson » 01 Aug 2007, 10:58

I know the default confidence level in Bayeselo is 95% (it can be changed). I thought that was the standard in all rating programs. Am I wrong?
Greg Simpson
Posts: 29
Joined: 05 Oct 2004, 06:07
Location: Irvine, CA, USA

Postby H.G.Muller » 02 Aug 2007, 08:03

Oh, you might be right. I just assumed it was the standard error, as that is what is usually quoted in statistics. The 95% confidence interval is 1.96 times the standard error.

So the fraction of programs expected to lie outside the confidence interval would be somewhat smaller. But you still expect several to lie outside it on average, and often you will observe a number larger than the average.

This is not really a strong indication that the ratings are actually different. (Although they might of course be, as this is a different time control.)
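For 95% bars the same reasoning can be sketched in Python, again assuming Gaussian errors, equal standard errors and independent runs, and comparing the 10-10 rating with the first run's given 95% bar. `two_sided_tail` and `prob_at_least` are hypothetical helpers; the latter is a plain binomial tail.

```python
import math


def two_sided_tail(z: float) -> float:
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_ENGINES = 12
Z95 = 1.96  # a 95% interval spans 1.96 standard errors on each side

# The 10-10 rating minus the 1-1 rating has standard error sigma*sqrt(2),
# so leaving the given 95% bar (|diff| > 1.96 sigma) means exceeding
# 1.96/sqrt(2) standard errors of the difference.
p_out = two_sided_tail(Z95 / math.sqrt(2))

if __name__ == "__main__":
    print(f"P(outside) per engine: {p_out:.3f}")
    print(f"expected engines outside: {N_ENGINES * p_out:.1f}")
    print(f"P(4 or more outside): {prob_at_least(4, N_ENGINES, p_out):.3f}")
```

This gives a per-engine probability of roughly 17%, i.e. about 2 of the 12 engines expected outside, and a chance of about one in eight of seeing 4 or more.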