AEGT Pharaon 3.00b against K, Q, R 40/40

Posted by: Heinz van Kempen at 21 September 2004 08:44:16

Hi all,
at the repeating 40/40 time control too (adapted to a 2 GHz CPU), the new Pharaon beta proves to be a monster: it scored no less than 52.5 points out of 74 games against all engines from the King, Queen and Rook Classes and the two promoters DanChess and Spike. This puts the new Pharaon at rank 7 in the AEGT rating list, clearly at King Class level.
The games are available on my site under downloads and King/Queen gauntlets.
http://www.husvankempen.de/nunn/
Here are the results:



Pharaon against:
King Class (14.5 points out of 24 games = 60.4%)
__________
Ruffian 1.0.5               1.5 - 0.5 
List 512                    1.0 - 1.0 
Aristarch 4.50              1.0 - 1.0 
Gothmog 1.0 B10             2.0 - 0.0 
Thinker 4.6c                1.0 - 1.0 
Crafty 19.15                1.0 - 1.0 
SmarThink 0.17a             1.5 - 0.5 
Tao 5.7 b04                 0.5 - 1.5 
Quark 2.35                  1.0 - 1.0 
Yace 0.99.87                1.0 - 1.0 
Delfi 4.5                   1.0 - 1.0 
AnMon 5.32                  2.0 - 0.0 

Queen Class (17 points out of 22 games = 77.3%)
___________

SlowChess 2.93a             1.5 - 0.5 
Movei 00.8.247s             1.5 - 0.5 
Amyan 1.593b                1.5 - 0.5 
Fruit 1.5t                  0.5 - 1.5 
Dragon 4.5 CF               1.5 - 0.5 
King of Kings 2.56          2.0 - 0.0 
Ufim 5.01                   2.0 - 0.0 
GLC 3.00.3.4                1.0 - 1.0 
WildCat 4                   2.0 - 0.0 
Amy 0.8.7b                  1.5 - 0.5 
Jonny 2.64                  2.0 - 0.0 

Rook Class (19 points out of 24 games = 79.2%)
__________
Comet B.68                  1.5 - 0.5 
Arasan 7.4                  1.5 - 0.5 
Terra 3.3B11                2.0 - 0.0 
Amateur 2.80                1.5 - 0.5 
The Crazy Bishop 0052       1.0 - 1.0 
Pepito v1.59                1.5 - 0.5 
Frenzee 159                 2.0 - 0.0 
PostModernist 1010a         2.0 - 0.0 
KnightDreamer 3.3           2.0 - 0.0 
The Baron 1.4.0 b2          1.5 - 0.5 
Naum 1.2                    0.5 - 1.5 
Pharaon 2.62                2.0 - 0.0 
and finally against the promoters
DanChess 1.0.6 DC           0.5 - 1.5 
Spike 0.6                   1.5 - 0.5 
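
As a cross-check of the class totals quoted above, the per-opponent scores can be re-added with a few lines of Python. This is only a sketch: the numbers are copied from the lists above, and the names `results`, `GAMES_PER_OPPONENT` and `class_summary` are made up for this example.

```python
# Pharaon 3.00b's score in each two-game mini-match, copied from the lists above.
results = {
    "King": [1.5, 1.0, 1.0, 2.0, 1.0, 1.0, 1.5, 0.5, 1.0, 1.0, 1.0, 2.0],
    "Queen": [1.5, 1.5, 1.5, 0.5, 1.5, 2.0, 2.0, 1.0, 2.0, 1.5, 2.0],
    "Rook": [1.5, 1.5, 2.0, 1.5, 1.0, 1.5, 2.0, 2.0, 2.0, 1.5, 0.5, 2.0],
    "Promoters": [0.5, 1.5],
}
GAMES_PER_OPPONENT = 2  # every listed result adds up to two points


def class_summary(scores):
    """Return (points, games, percentage) for one class."""
    points = sum(scores)
    games = GAMES_PER_OPPONENT * len(scores)
    return points, games, 100.0 * points / games


for name, scores in results.items():
    points, games, percent = class_summary(scores)
    print(f"{name:<10} {points:>5} / {games:<3} = {percent:.1f}%")

total_points = sum(sum(scores) for scores in results.values())
total_games = sum(GAMES_PER_OPPONENT * len(scores) for scores in results.values())
print(f"Total      {total_points:>5} / {total_games:<3} = {100.0 * total_points / total_games:.1f}%")
```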

The similar performance against Queen and Rook Class engines has brought those two classes closer together in the AEGT rating list, and these ratings already seem quite realistic, if you keep in mind that the engines finishing at the bottom of Rook Class really are considerably weaker than those in Queen Class.
What I am still not happy with is the rating difference between King and Queen Class, because King Class engines are still rated too highly. I will try to explain why:
Before this one we could run only four gauntlets spanning King and Queen Class to give ELOStat connections between them. In those, the results were clearly in favour of King Class engines:
Pro Deo versus King = 15 points
Pro Deo versus Queen = 19.5 points
SOS versus King = 10.5 points
SOS versus Queen = 13.5 points
LambChop versus King = 8.5 points
LambChop versus Queen = 12 points
and especially
Little Goliath 1.0.0.14 against King = 9.5 points
Little Goliath 1.0.0.14 against Queen = 15 points
This leads to differences like these:
Quark 2.35                     : 2692   49  54   142    44.0 %   2734
Movei 00.8.247s                : 2574   44  56   150    48.3 %   2586 
After AEGT 1 we know that Quark is much stronger at long time controls than in blitz (in fact, for me it is the engine where this difference can be seen most clearly), but even so it will not be 118 points stronger than Movei with this strong beta.
or:
Pharaon 3.00b                  : 2746   62  78    74    70.9 %   2591 
Pharaon 2.62                   : 2499   45  46   182    58.5 %   2439
The difference will be large, but not that large, as AEGT 2 will show.
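
To get a feeling for what the two rating gaps above would mean over the board, they can be converted into expected scores. The sketch below assumes the common logistic Elo curve; ELOStat's internal model may differ, and the names `expected_score` and `gaps` are only illustrative.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Rating gaps taken from the two pairs of ELOStat lines quoted above.
gaps = {
    "Quark 2.35 vs Movei 00.8.247s": 2692 - 2574,
    "Pharaon 3.00b vs Pharaon 2.62": 2746 - 2499,
}

for pairing, gap in gaps.items():
    print(f"{pairing}: {gap} Elo -> expected score {expected_score(gap):.1%}")
```

If those expected scores look too one-sided for the engines involved, that supports the suspicion that the gaps are inflated by the thin connection between the classes.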
You cannot blame ELOStat, in my opinion. You have to give such a program enough data for comparison to calculate correctly, and the number of games in the connecting gauntlets is by far not sufficient yet. So we will give ELOStat more data by running another 8 gauntlets. In progress is AnMon 5.40, playing against the same engines as Pharaon (results tomorrow).
One thing patient people should wait for is AEGT 2, where all the engines are mixed up and the ratings will finally be realistic.

AEGT rating list after the Pharaon 3.00b gauntlet, September 21st.
  Program                          Elo    +   -   Games   Score   Av.Op.  Draws
  1 Pro Deo 1.0                    : 2821   79 118    48    71.9 %   2658   18.8 %
  2 Ruffian 1.0.5                  : 2814   49  53   142    62.7 %   2724   26.8 %
  3 Aristarch 4.50                 : 2804   50  52   142    61.3 %   2725   26.8 %
  4 Delfi 4.5                      : 2774   53  45   142    56.7 %   2727   33.1 %
  5 List 512                       : 2760   55  43   142    54.6 %   2729   34.5 %
  6 Thinker 4.6c                   : 2754   56  41   142    53.5 %   2729   38.0 %
  7 Pharaon 3.00b                  : 2746   62  78    74    70.9 %   2591   31.1 %
  8 Crafty 19.15                   : 2731   46  46   142    50.0 %   2731   39.4 %
  9 Tao 5.7 b04                    : 2722   46  58   142    48.6 %   2732   25.4 %
 10 SmarThink 0.17a                : 2711   42  56   142    46.8 %   2733   35.9 %
 11 Quark 2.35                     : 2692   49  54   142    44.0 %   2734   26.1 %
 12 AnMon 5.32                     : 2674   49  51   142    41.2 %   2736   28.9 %
 13 Yace 0.99.87                   : 2674   44  51   142    41.2 %   2736   37.3 %
 14 Gothmog 1.0 B10                : 2672   52  51   142    40.8 %   2736   23.9 %
 15 LG Revival 1.00.1.4            : 2665  103  68    48    51.0 %   2658   39.6 %
 16 GLC 3.00.3.4                   : 2664   48  44   150    62.0 %   2579   37.3 %
 17 SOS 4                          : 2658   87  87    48    50.0 %   2658   29.2 %
 18 ElChinito 3.25                 : 2631   51  46   148    57.4 %   2579   31.1 %
 19 WildCat 4                      : 2624   52  48   150    56.0 %   2582   25.3 %
 20 Fruit 1.5t                     : 2615   53  46   150    54.7 %   2583   26.7 %
 21 Patzer 3.61                    : 2613   89 100    48    61.5 %   2531   18.8 %
 22 Amyan 1.593b                   : 2609   54  45   150    53.7 %   2583   27.3 %
 23 LambChop 10.99                 : 2607   80  94    48    42.7 %   2658   31.2 %
 24 SlowChess 2.93a                : 2589   57  39   150    50.7 %   2585   37.3 %
 25 Movei 00.8.247s                : 2574   44  56   150    48.3 %   2586   27.3 %
 26 Jonny 2.64                     : 2564   46  54   150    46.7 %   2587   25.3 %
 27 Abrok 5.0                      : 2560   98  88    48    54.2 %   2531   20.8 %
 28 Amy 0.8.7b                     : 2559   46  54   150    46.0 %   2587   26.7 %
 29 Dragon 4.5 CF                  : 2551   41  53   150    44.7 %   2588   37.3 %
 30 The Baron 1.4.0 b2             : 2546   41  53   182    65.4 %   2436   22.0 %
 31 Pepito v1.59                   : 2534   42  51   182    63.7 %   2436   23.1 %
 32 King of Kings 2.56             : 2527   47  50   150    41.0 %   2590   30.0 %
 33 Naum 1.2                       : 2523   43  46   182    62.1 %   2437   27.5 %
 34 Comet B.68                     : 2508   44  48   182    59.9 %   2438   22.0 %
 35 Pharaon 2.62                   : 2499   45  46   182    58.5 %   2439   23.6 %
 36 Nejmet 3.07                    : 2488   82  95    48    43.8 %   2531   29.2 %
 37 Spike 0.6                      : 2484   3&
Heinz van Kempen
 

AEGT 2 deadline September 28th

Posted by: Heinz van Kempen at 21 September 2004 08:46:57
In reply to: AEGT Pharaon 3.00b against K, Q, R 40/40, posted by Heinz van Kempen at 21 September 2004 08:44:16

Hi all,
the deadline for releasing or sending new versions to us for AEGT 2 is September 28th, because we first need to test new versions for stability etc.
Best Regards
AEGT group
Heinz van Kempen
 

Re: AEGT Pharaon 3.00b against K, Q, R 40/40

Posted by: Uri Blass at 21 September 2004 12:02:33
In reply to: AEGT Pharaon 3.00b against K, Q, R 40/40, posted by Heinz van Kempen at 21 September 2004 08:44:16
I think that the difference in Elo based only on these results should be smaller.
Total results of the four gauntlets (Pro Deo, SOS, LambChop, Little Goliath):
60/96 against Queen Class
43.5/96 against King Class
I use a simple linear formula that is used in Israel for performance ratings, in which a 100% score means being 400 Elo better.
In other words, 96/96 against the Queen Class means being 400 Elo better than the Queen Class, 0/96 means being 400 Elo weaker, and everything in between is linear (it is not a good formula for extreme cases, but we have no results here that are close to 100%).
Under this formula the difference between 96/96 and 48/96 is 400 Elo.
Based on the results, the difference in Elo between the classes should be 137.5, because 400*(60-43.5)/48=137.5.
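
The linear formula described above can be written out as a short Python sketch. The function name `linear_performance` and the variable names are invented for this example; the point totals are the four gauntlet results from the opening post, and the 96 games per class are taken as stated in this post.

```python
def linear_performance(points, games, span=400.0):
    """Elo offset relative to the opposition: 50% -> 0, 100% -> +span, 0% -> -span."""
    half = games / 2.0
    return span * (points - half) / half


# Pro Deo, SOS, LambChop and Little Goliath, in that order.
vs_queen = [19.5, 13.5, 12.0, 15.0]
vs_king = [15.0, 10.5, 8.5, 9.5]
GAMES = 96  # games against each class, as stated in the post

queen_offset = linear_performance(sum(vs_queen), GAMES)
king_offset = linear_performance(sum(vs_king), GAMES)
class_gap = queen_offset - king_offset

print(f"vs Queen Class: {sum(vs_queen)}/{GAMES} -> {queen_offset:+.1f} Elo")
print(f"vs King Class:  {sum(vs_king)}/{GAMES} -> {king_offset:+.1f} Elo")
print(f"Implied King/Queen gap: {class_gap:.1f} Elo")
```

Because the formula is linear, the gap depends only on the difference in points, which is why the one-line calculation 400*(60-43.5)/48 gives the same number.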
Let us compare the King and Queen Classes, rank by rank:
King
1)Ruffian 2814
2)Aristarch 2804
3)Delfi 2774
4)List 2760
5)Thinker 2754
6)Crafty 2731
7)Tao 2722
8)Smarthink 2711
9)Quark 2692
10)Anmon 2674
11)Yace 2674
12)Gothmog 2672

Queen
1)GreenLight 2664(-150 relative to Ruffian)
2)Elchinito 2631(-173 relative to Aristarch)
3)WildCat 2624(-150)
4)Fruit 2615(-145)
5)Amyan 2609(-145)
6)Slowchess 2589(-142)
7)Movei 2574(-148)
8)Jonny 2564(-147)
9)Amy 2559(-133)
10)Dragon 2551(-123)
11)KingofKings 2527(-147)
12)Ufim 2464(-208)
The average difference is bigger than 137.5: it is slightly more than 150 Elo.
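
The rank-by-rank comparison can be reproduced with the sketch below. The gaps are recomputed directly from the two rating columns rather than copied from the figures in parentheses; the list names are made up for this example.

```python
# Ratings in rank order, as listed in the King and Queen columns above.
king_ratings = [2814, 2804, 2774, 2760, 2754, 2731, 2722, 2711, 2692, 2674, 2674, 2672]
queen_ratings = [2664, 2631, 2624, 2615, 2609, 2589, 2574, 2564, 2559, 2551, 2527, 2464]

# Gap between the engines holding the same rank in each class.
gaps = [king - queen for king, queen in zip(king_ratings, queen_ratings)]
average_gap = sum(gaps) / len(gaps)

for rank, gap in enumerate(gaps, start=1):
    print(f"rank {rank:>2}: King Class engine is {gap} Elo ahead")
print(f"average gap: {average_gap:.1f} Elo")
```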
I also think that Pharaon 2.62 is still too low in the rating list.
Pharaon 2.62 is probably at a similar level to Movei; after all, it is a premier division program while Movei is not, so Movei cannot be significantly better.
I guess that the Rook Class is still too low in the rating list.

Uri
Uri Blass
 

Re: AEGT Pharaon 3.00b against K, Q, R 40/40

Posted by: Heinz van Kempen at 21 September 2004 13:45:40
In reply to: Re: AEGT Pharaon 3.00b against K, Q, R 40/40, posted by Uri Blass at 21 September 2004 12:02:33

Hello Uri,
okay, you may be right, but as you see we have added only one more gauntlet so far and the ratings are already evolving in the right direction. We will run the other gauntlets. Each one needs a day to finish on four computers, so not a lot of patience is needed. I predict that after all the gauntlets the ratings of the Rook Class engines and Pharaon 2.62 will be higher relative to Queen Class, and the ratings of Pharaon 3.00b and the other King Class engines will be lower. I also predict that our ratings after AEGT 2 will be comparable to other important rating lists with long time controls, like Leo's (if you keep in mind the usual statistical errors of all lists based on fewer than 1000 games per engine).
For Pharaon 2.62 there might be a statistical error, since its rating is based on only 182 games, which is quite normal. No tournament will always deliver exactly the statistically most probable results.
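
How large such a statistical error can be with 182 games may be estimated roughly as follows. This is only a back-of-the-envelope sketch: it assumes independent games, a normal approximation and the logistic Elo curve, and it is not claimed to be the method ELOStat uses. The function names are invented; the inputs (182 games, 58.5 % score, 23.6 % draws) are Pharaon 2.62's line in the rating list.

```python
import math


def elo_from_score(score):
    """Elo offset implied by a score fraction under the logistic curve."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def rough_elo_margin(games, score, draw_rate, z=1.96):
    """Approximate (lower, upper) Elo margins around the measured performance.

    Assumes independent games and a normal approximation of the mean score.
    """
    win_rate = score - draw_rate / 2.0
    # Variance of a single game result (win = 1, draw = 0.5, loss = 0).
    variance = win_rate + draw_rate / 4.0 - score ** 2
    std_error = math.sqrt(variance / games)
    centre = elo_from_score(score)
    lower = elo_from_score(score - z * std_error) - centre
    upper = elo_from_score(score + z * std_error) - centre
    return lower, upper


lower, upper = rough_elo_margin(games=182, score=0.585, draw_rate=0.236)
print(f"Pharaon 2.62: roughly {lower:+.0f} / {upper:+.0f} Elo around its performance")
```

The margin shrinks only with the square root of the number of games, so quadrupling the games roughly halves it.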
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT 2 deadline September 28th

Posted by: Pallav Nawani at 21 September 2004 20:03:37
In reply to: AEGT 2 deadline September 28th, posted by Heinz van Kempen at 21 September 2004 08:46:57
Hello,
Apart from the engines you have already shortlisted (K,Q,R,B,N classes), are you going to play newer engines also?
Regds,
Pallav
Pallav Nawani
 

answer to Pallav and AEGT 2 decisions so far

Posted by: Heinz van Kempen at 21 September 2004 20:44:56
In reply to: Re: AEGT 2 deadline September 28th, posted by Pallav Nawani at 21 September 2004 20:03:37
Hello Pallav,
we ran 29 gauntlets with newer and somewhat older engines to give them the chance to qualify for AEGT 2. On my site you will find all these gauntlets, their results and who qualified.
Roger Brown is testing the new engines, which these weeks seem to be released almost every day, to make a preselection for a promotion tournament that some testers will run after AEGT 2. Engines that would fit into this promotion tournament are, for example, CyberPagno 2.1, the new Matacz, Gosu, Petir and Kiwi (all coming too late for AEGT 2), those that only narrowly missed promotion last time, and those that will still be released in the next weeks.
Please give Roger a hint as soon as you think that Natwarlal is strong enough to be included in such a promotion tournament. But with this time control we will not be able to test all engines, of which there will soon be 300.
To give more chances to the newer engines under development there are additional rules, like taking out older versions. I received complaints from Olivier and Ralf concerning stability and time management problems in LambChop that were never fixed. I had a look at LambChop's exe file and saw that it is almost two years old. LambChop now seems to be private under the name Warp, which might also be a totally new approach, and I doubt that the author has any interest in games of his old LambChop in AEGT. This is another thing we want: to have engines in whose games the authors are interested.
So LambChop will not play. It also did not play in AEGT 1 except for a gauntlet, as it received no votes. Unless there is a storm of protest from other AEGT testers I will leave LambChop out. Better to have the new strong Pharaon in the class it belongs to.
These things are currently being voted on. I have already received a lot of replies to my questions. So far everyone wants the rating list to be made more precise with new gauntlets for the upper classes, which is currently in progress. Everyone also wants the latest rating list before the start of AEGT 2 to be used, so that the strongest versions are in the classes. So apart from Pharaon in King Class, Arasan might qualify with its new version, for Rook Class at least. Arasan 8.1 is the next gauntlet after AnMon 5.40 and Jonny 2.70.
Another vote where there is unity so far is that four engines will be promoted and demoted from each class. For weaker engines there will be a promotion tournament for AEGT 3 (run by only a few testers). For a stronger, newly released engine we will run a gauntlet, and it then qualifies for the class corresponding to the Elo gained, provided that the engine that has to go down from this class also has a weaker rating.
For the decisions above I will let you know if the majorities change, which is already improbable.
More technical matters, like adjudication of games by testers and GUI, will be explained by Igor before AEGT 2, I think. We voted partly for such adjudications because we lose a lot of time that could go into more games when engines do not resign and do not support tablebases. Igor also keeps the draft on his site up to date.

Best Regards
AEGT group
Heinz van Kempen
 

Pseudo !!!???

Posted by: Heinz van Kempen at 21 September 2004 21:24:29
In reply to: answer to Pallav and AEGT 2 decisions so far, posted by Heinz van Kempen at 21 September 2004 20:44:56

Hi all,
I just saw the test results from Leo and so we will also have to run a gauntlet for Pseudo. Hope the engine is not what one could understand by its name....

Best Regards
Heinz
Heinz van Kempen
 

