AEGT gauntlets 40/40 and rating experiment statistics

Archive of the old Parsimony forum. Some messages couldn't be restored.

Post by Heinz van Kempen » 23 Sep 2004, 17:11

Posted by: Heinz van Kempen at 23 September 2004 18:11:12:

Hi all,
warning, this is a verbose and longish post full of numbers. So please close it immediately.
After the Pharaon 3.00b gauntlet, two more K,Q,R,(B) gauntlets with AnMon 5.40 and Jonny 2.70 at 40/40 adapted to a 2 GHz CPU are finished.
The goal is not to give a realistic rating for those two, which is not possible with only 74 games each and should be done in AEGT 2, but to improve the overall AEGT rating list.
Uri was right to assume that the ratings of the groups were too far apart, and I also expected that this would change with more gauntlets to combine the "pools".
First the results:
AnMon 5.40 scored 11.5 points out of 24 (47.92%) against King Class
____________________________________________________________________



Aristarch 4.50            1.0 - 1.0 
List 512                  0.0 - 2.0 
Delfi 4.5                 1.0 - 1.0 
Quark 2.35                1.5 - 0.5 
Thinker 4.6c              0.5 - 1.5 
Crafty 19.15              1.0 - 1.0 
Gothmog 1.0 B10           1.5 - 0.5 
Tao 5.7 b04               1.5 - 0.5 
SmarThink 0.17a           1.0 - 1.0 
AnMon 5.32                1.0 - 1.0 
Yace 0.99.87              0.5 - 1.5 
Ruffian 1.0.5             1.0 - 1.0 

AnMon 5.40 scored 12 points out of 22 (54.55%) against Queen Class
____________________________________________________________________

SlowChess 2.93a           0.5 - 1.5 
Amyan 1.593b              0.5 - 1.5 
GLC 3.00.3.4              1.0 - 1.0 
Ufim 5.01                 1.5 - 0.5 
Fruit 1.5t                0.0 - 2.0 
WildCat 4                 1.0 - 1.0 
Movei 00.8.247s           1.5 - 0.5 
Jonny 2.64                1.5 - 0.5 
Amy 0.8.7b                1.5 - 0.5 
Dragon 4.5 CF             1.5 - 0.5 
King of Kings 2.56        1.5 - 0.5 

AnMon 5.40 scored 12.5 points out of 24 (52.08%) against Rook Class
____________________________________________________________________

Arasan 7.4                0.0 - 2.0 
Pepito v1.59              0.5 - 1.5 
The Baron 1.4.0 b2        0.5 - 1.5 
Naum 1.2                  0.5 - 1.5 
Amateur 2.80              0.5 - 1.5 
The Crazy Bishop 0052     1.5 - 0.5 
KnightDreamer 3.3         1.5 - 0.5 
Frenzee 159               2.0 - 0.0 
PostModernist 1010a       1.5 - 0.5 
Comet B.68                1.5 - 0.5 
Pharaon 2.62              1.5 - 0.5 
Terra 3.3B11              1.0 - 1.0 
and against the promoted engines:
Spike 0.6                 2.0 - 0.0 
DanChess 1.0.6 DC         1.5 - 0.5 

So not much difference in score against the three classes. AnMon 5.40 even scored worse against Rook Class than against Queen Class.
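
For anyone who wants to check the percentages, here is a minimal Python sketch that recomputes them from the points and games posted above. The function name `score_percent` and the dictionary layout are only illustrative and not part of any AEGT tool.

```python
def score_percent(points, games):
    """Return the score as a percentage of the games played, rounded to 2 places."""
    return round(100.0 * points / games, 2)

# AnMon 5.40 totals as posted above: (points, games) per class
anmon = {"King": (11.5, 24), "Queen": (12.0, 22), "Rook": (12.5, 24)}

anmon_percent = {cls: score_percent(p, g) for cls, (p, g) in anmon.items()}
for cls, pct in anmon_percent.items():
    print(f"{cls} Class: {pct:.2f}%")
```

The same function can be applied to the Jonny 2.70 totals further down.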

Jonny 2.70 scored 7 points out of 24 (29.17%) against King Class
________________________________________________________________

Aristarch 4.50            1.5 - 0.5 
Ruffian 1.0.5             0.0 - 2.0 
List 512                  0.0 - 2.0 
Thinker 4.6c              0.5 - 1.5 
Crafty 19.15              0.0 - 2.0 
SmarThink 0.17a           0.5 - 1.5 
Tao 5.7 b04               1.0 - 1.0 
Quark 2.35                1.5 - 0.5 
Delfi 4.5                 0.0 - 2.0 
Gothmog 1.0 beta 10       1.0 - 1.0 
Yace 0.99.87              0.0 - 2.0 
AnMon 5.32                1.0 - 1.0 

Jonny 2.70 scored 8 points out of 22 (36.36%) against Queen Class
__________________________________________________________________

Ufim 5.01                 1.5 - 0.5 
King of Kings 2.56        1.0 - 1.0 
Amy 0.8.7b                0.0 - 2.0 
GLC 3.00.3.4              0.5 - 1.5 
WildCat 4                 0.0 - 2.0 
Amyan 1.593b              0.0 - 2.0 
SlowChess 2.93a           1.0 - 1.0 
Fruit 1.5t                1.0 - 1.0 
Movei 00.8.247s           1.5 - 0.5 
Jonny 2.64                1.0 - 1.0 
Dragon 4.5 CF             0.5 - 1.5 

Jonny 2.70 scored 12.5 points out of 24 (52.08%) against Rook Class
____________________________________________________________________

Arasan 7.4                0.5 - 1.5 
KnightDreamer 3.3         1.0 - 1.0 
Pepito v1.59              1.0 - 1.0 
Terra 3.3B11              1.0 - 1.0 
PostModernist 1010a       1.0 - 1.0 
Frenzee 159               1.0 - 1.0 
Naum 1.2                  2.0 - 0.0 
Pharaon 2.62              0.0 - 2.0 
Comet B.68                2.0 - 0.0 
The Baron 1.4.0 beta 2    1.5 - 0.5 
Amateur 2.80              0.5 - 1.5 
The Crazy Bishop 0052     1.0 - 1.0 
and against the promoted engines:
DanChess 1.0.6 DC         1.0 - 1.0 
Spike 0.6                 1.0 - 1.0 

So here not much difference between King and Queen Class.
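
To put such scores on a rating scale, one can use the standard logistic Elo formula, which turns a score fraction into an implied rating difference. The sketch below is only that textbook formula applied to the Jonny 2.70 totals; it is an assumption for illustration and not necessarily how the AEGT rating list itself is calculated. The name `elo_diff` is made up.

```python
import math

def elo_diff(score):
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Jonny 2.70 totals as posted above: (points, games) per class
jonny = {"King": (7.0, 24), "Queen": (8.0, 22), "Rook": (12.5, 24)}

# Implied difference to the average opponent of each class, in whole points
jonny_elo = {cls: round(elo_diff(p / g)) for cls, (p, g) in jonny.items()}
for cls, diff in jonny_elo.items():
    print(f"{cls} Class: {diff:+d}")
```

This comes out at roughly -154, -97 and +14 points against King, Queen and Rook Class. With only 22 to 24 games per class such figures carry a wide margin of error.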
All games are available for download on my website. The rating list there can be examined in several steps, day after day. 
Now the more interesting data (only for hardcore lovers of numbers :-) ):
The letters and numbers stand for the following:
K 1 - average rating from all King Class engines after only four gauntlets each between the upper classes
Q 1 - average rating from all Queen Class engines after only four gauntlets each between the upper classes
R 1 - average rating from all Rook Class engines after only four gauntlets each between the upper classes
KQ 1 - difference in average rating for King vs. Queen after only four gauntlets each between the upper classes
QR 1 - difference in average rating for Queen vs. Rook after only four gauntlets each between the upper classes

K 2 to QR 2 are the corresponding values after the Pharaon 3.00b gauntlet.
K 3 to QR 3 are the corresponding values after the next gauntlet with AnMon 5.40.
K 4 to QR 4 are the corresponding values after the next gauntlet with Jonny 2.70.

Okay, hope this is not too complicated, here it comes:
K 1 - 2786
K 2 - 2732
K 3 - 2700
K 4 - 2691
Q 1 - 2604
Q 2 - 2581
Q 3 - 2572
Q 4 - 2575
R 1 - 2458
R 2 - 2479
R 3 - 2493
R 4 - 2493
KQ 1 - 182
KQ 2 - 151
KQ 3 - 128
KQ 4 - 116
QR 1 - 146
QR 2 - 102
QR 3 -  79
QR 4 -  82
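
The KQ and QR values are simply the class averages subtracted from each other. A short Python sketch that reproduces them from the K, Q and R numbers above (the variable and function names are only illustrative):

```python
# Average class ratings K 1..4, Q 1..4 and R 1..4 as posted above
avg = {
    "K": [2786, 2732, 2700, 2691],
    "Q": [2604, 2581, 2572, 2575],
    "R": [2458, 2479, 2493, 2493],
}

def gaps(upper, lower):
    """Difference in average rating between two classes at each step."""
    return [u - l for u, l in zip(upper, lower)]

kq = gaps(avg["K"], avg["Q"])
qr = gaps(avg["Q"], avg["R"])
print("KQ:", kq)  # [182, 151, 128, 116]
print("QR:", qr)  # [146, 102, 79, 82]
```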

In plain words: the difference between King and Queen Class shrank with each additional gauntlet. For example, Green Light Chess initially ranked below all King Class engines although it had won the Queen Class convincingly; after the latest gauntlet it is ranked above the King Class engines Quark, Yace, AnMon and Gothmog. I suspect that the KQ difference is still too high, and therefore I will run more gauntlets here next, with SlowChess 2.94 and Amyan 1.594.

The difference between Queen and Rook Class also decreased, except for the last gauntlet, where it was almost stable. Ufim, with a modest performance in Queen Class and initially very close to the top Rook Class engines, has dropped below a lot of Rook Class engines as gauntlets were added.
The Queen-Rook gauntlets in progress with Arasan 8.1 and Pseudo 0.6b indicate so far that the difference between Queen and Rook Class will shrink a bit more.
To underline this with numbers that are easier to understand, here are some individual standings from gauntlet to gauntlet. The first rating is from before the Pharaon gauntlet and the last rating is from after the Jonny gauntlet. You will notice that the King Class engines dropped dramatically.

Quark 2.35 (King Class)         (2747-2692-2657-2646) 
AnMon 5.32 (King Class)         (2733-2674-2641-2632)
Ufim 5.01 (Queen Class)         (2489-2464-2455-2456)
The Baron 1.4.0b2 (Rook Class)  (2526-2546-2561-2559)
Amateur 2.80 (Rook Class)       (2401-2423-2439-2442)
DanChess 1.0.6 DC (Bishop Class)(2429-2459-2468-2469)  
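
The total movement of each engine can be read off by subtracting the first rating from the last. A small Python sketch using the six rating series above (the dictionary names are only illustrative):

```python
# Rating after each step: before Pharaon, after Pharaon, after AnMon, after Jonny
ratings = {
    "Quark 2.35": [2747, 2692, 2657, 2646],
    "AnMon 5.32": [2733, 2674, 2641, 2632],
    "Ufim 5.01": [2489, 2464, 2455, 2456],
    "The Baron 1.4.0b2": [2526, 2546, 2561, 2559],
    "Amateur 2.80": [2401, 2423, 2439, 2442],
    "DanChess 1.0.6 DC": [2429, 2459, 2468, 2469],
}

# Net change from the first to the last rating
change = {name: series[-1] - series[0] for name, series in ratings.items()}
for name, delta in change.items():
    print(f"{name:<20}{delta:+d}")
```

Both King Class engines lose 101 points over the three gauntlets, while the Rook and Bishop Class engines gain between 33 and 41.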


[C:\HTML\Nunn\home.htm]
Why am I so exhausted now?
Best Regards
Heinz
Heinz van Kempen
 

correct link and note to daniel

Post by Heinz van Kempen » 23 Sep 2004, 18:18

Posted by: Heinz van Kempen at 23 September 2004 19:18:51:
In reply to: AEGT gauntlets 40/40 and rating experiment statistics, posted by Heinz van Kempen at 23 September 2004 18:11:12:

Hi all,
the correct link is of course:
http://www.husvankempen.de/nunn/.
And it has to be: AnMon even scored worse against Rook Class engines than against Queen Class engines.
To Daniel:
DanChess 1.07c runs fine and I will run a gauntlet with AEGT time control against Queen and Rook engines.
Did you receive Tinker 4.6.5 for testing? I am not sure, because you did not confirm as requested, and the file, at more than 6 MB, was maybe too big for your account.
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT gauntlets 40/40 and rating experiment statistics

Post by Robert Allgeuer » 23 Sep 2004, 19:17

Posted by: Robert Allgeuer at 23 September 2004 20:17:39:
In reply to: AEGT gauntlets 40/40 and rating experiment statistics, posted by Heinz van Kempen at 23 September 2004 18:11:12:


Interesting post to read, thanks.
In my view you should continue the gauntlets across classes until the average difference between all classes remains more or less constant for several gauntlets in a row. Then one can be pretty sure that the right balance has been achieved.
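
As a rough illustration of such a stopping rule, here is a Python sketch that calls a class gap "stable" once its last few values stay within a small band. The window of 3 gauntlets and the tolerance of 5 points are arbitrary assumptions chosen for the example, not values anyone in this thread proposed, and `is_stable` is a made-up name.

```python
def is_stable(gaps, window=3, tolerance=5):
    """True if the last `window` gap values differ by at most `tolerance` points."""
    if len(gaps) < window:
        return False
    recent = gaps[-window:]
    return max(recent) - min(recent) <= tolerance

# KQ 1..4 and QR 1..4 from the first post of this thread
kq = [182, 151, 128, 116]
qr = [146, 102, 79, 82]

print("KQ stable:", is_stable(kq))
print("QR stable:", is_stable(qr))
```

With these settings neither gap counts as stable yet, which fits the plan to keep running gauntlets across the classes.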
Robert
Robert Allgeuer
 

Re: AEGT gauntlets 40/40 and rating experiment statistics

Post by Heinz van Kempen » 23 Sep 2004, 19:22

Posted by: Heinz van Kempen at 23 September 2004 20:22:34:
In reply to: Re: AEGT gauntlets 40/40 and rating experiment statistics, posted by Robert Allgeuer at 23 September 2004 20:17:39:
Hello Robert,
that is exactly what I thought: continue until the difference is constant. In any case, in AEGT 2 all classes will be mixed, with a lot of versions unchanged, and then we will have the same effect.
Thanks for your interest.
Best Regards
Heinz
Heinz van Kempen
 

Re: correct link and note to daniel

Post by Daniel Shawul » 24 Sep 2004, 04:42

Posted by: Daniel Shawul at 24 September 2004 05:42:34:
In reply to: correct link and note to daniel, posted by Heinz van Kempen at 23 September 2004 19:18:51:
Hi Heinz,
I have received Tinker and will download all the rest of the
Knight Class engines. Or you can send them to my Yahoo account,
no problem with Yahoo [100 MB storage available].
About the new version: it has a pondering bug :(
but works fine with ponder off.
I have fixed that now. I will send it to you tonight.

best wishes
daniel
Daniel Shawul
 

Re: correct link and note to daniel

Post by Daniel Shawul » 24 Sep 2004, 05:16

Posted by: Daniel Shawul at 24 September 2004 06:16:04:
In reply to: Re: correct link and note to daniel, posted by Daniel Shawul at 24 September 2004 05:42:34:
I see that I only need to update Cerebro in the Bishop Class,
remove the promoted Spike and then run the Tinker gauntlet, right?
Sorry, I wasn't following the AEGT discussions closely.
Daniel Shawul
 

To Daniel and last request for new testers in AEGT 2

Post by Heinz van Kempen » 24 Sep 2004, 09:09

Posted by: Heinz van Kempen at 24 September 2004 10:09:22:
In reply to: Re: correct link and note to daniel, posted by Daniel Shawul at 24 September 2004 06:16:04:
Hello Daniel,
no problem. We voted to form the classes from the latest available AEGT rating list (with exceptions for the promoted engines, where the rating is not important), and this rating list is still being improved with more gauntlets these days. So I can only give the final groups on September 29th. I will then tell you exactly how to test and which engines.
And here is a last request for new testers. We still need testers for all classes, also for King Class. Although it might seem that there are already a lot of us, it would be good to play more games for all engines. Testers can choose the class in which they want to run a double round robin, and it is their choice whether they take four, six or eight weeks to do it. There is no pressure to deliver a certain number of games; missing games will be played by others if a tester is unable to continue.
If you are interested in helping, please write to one of the AEGT testers or to me.
Best Regards
Heinz
Heinz van Kempen
 

