something is clearly wrong in AEGT rating list

Archive of the old Parsimony forum.

something is clearly wrong in AEGT rating list

Post by Uri Blass » 19 Sep 2004, 09:53

Posted by: Uri Blass at 19 September 2004 10:53:22:

http://www.husvankempen.de/nunn/aegtrating.htm
I see that Ufim, which finished last in the Queen Class, has a better rating than all the programs of the Rook Class.
That is absurd considering that the Queen Class and the Rook Class are almost at the same level.
Movei has a rating of 2630 while Pharaon 2.62 has only 2460.
That is also absurd, considering that the old Pharaon plays in Leo's premier division while Movei does not.
I do not know how the rating was calculated, but it seems that a difference of nearly 200 Elo between the class averages was assumed. It would be better to assume a difference of 100 Elo between the class averages, which is more realistic, and the difference between the Queen Class and the Rook Class is less than 100 Elo.
Here are some ratings from WBEC:
King class:
Aristarch      2658
Ruffian        2713
Delfi          2559
List           ----
Thinker        2583
Crafty         2648
Smarthink      2612
Tao            2602
Quark          2555
Anmon          2441
Gothmog        2518
Yace           2530
average almost 2600
Queen class:
GreenLight     2554
Elchinito      ----
Wildcat        2528
Fruit          ----
Amyan          2555
Slowchess      2451
Movei          2538
Jonny          ----
Amy            2429
Dragon         2503
Kingofkings    2492
Ufim           2313
average almost 2500
Rook class:
Baron          2425
Pepito         2494
Naum           ----
Pharaon        2527
KnightDreamer  2411
Comet          2472
Arasan         2386
Terra          2347
Postmodernist  2404
Frenzee        2310
Amateur        2425
Crazybishop    2423
average > 2400
Bishop class:
DanChess       2444
Spike          2455
Snitch         2428
Bruja          ----
Cerebro        2324
Boot           2468
Trace          2198
Djinn          2358
Harmann        2243
Alarm          2270
BlackBishop    2208
BigLion        2142
average > 2300
Uri Blass
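
The averages quoted in the post can be checked with a few lines of C. This is only a minimal sketch: the function name `mean_rating` and the array names are made up for illustration, and engines listed with `----` (no WBEC rating given) are simply left out.

```c
#include <stddef.h>

/* WBEC ratings as listed in the post; engines shown as "----" are omitted. */
static const int king_class[]   = {2658, 2713, 2559, 2583, 2648, 2612, 2602, 2555, 2441, 2518, 2530};
static const int queen_class[]  = {2554, 2528, 2555, 2451, 2538, 2429, 2503, 2492, 2313};
static const int rook_class[]   = {2425, 2494, 2527, 2411, 2472, 2386, 2347, 2404, 2310, 2425, 2423};
static const int bishop_class[] = {2444, 2455, 2428, 2324, 2468, 2198, 2358, 2243, 2270, 2208, 2142};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Arithmetic mean of n ratings (n must be greater than 0). */
double mean_rating(const int *ratings, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += ratings[i];
    return (double)sum / (double)n;
}
```

With these numbers the averages come out at roughly 2584, 2485, 2420 and 2322. The King/Queen and Rook/Bishop gaps are therefore close to 100 Elo, while the Queen/Rook gap is only about 64 Elo, which fits the claim that these two classes are closer together.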
 

Re: something is clearly wrong in AEGT rating list

Post by Heinz van Kempen » 19 Sep 2004, 10:16

Posted by: Heinz van Kempen at 19 September 2004 11:16:46:
In reply to: something is clearly wrong in AEGT rating list, posted by: Uri Blass at 19 September 2004 10:53:22:
Hello Uri,
you might have noticed that I wrote (when publishing the list) that the rating list will become more exact once AEGT 2 is in progress and the list is updated while many engine versions remain unchanged. This first rating list can't be exact because there are not enough connections between the upper classes. Although a lot of gauntlets were run between Rook and Bishop Class, only very few were run between King and Queen Class and between Queen and Rook Class. In AEGT 2 the classes will be mixed up a bit, so we will get better values. A bit of patience is needed here.
Best Regards
Heinz
Heinz van Kempen
 

another Q/R gauntlet and effect on rating

Post by Heinz van Kempen » 19 Sep 2004, 11:24

Posted by: Heinz van Kempen at 19 September 2004 12:24:17:
In reply to: something is clearly wrong in AEGT rating list, posted by: Uri Blass at 19 September 2004 10:53:22:

Hi Uri,
another Queen/Rook Class gauntlet from Roger and me was finished yesterday. Roger could not send his games earlier because of the terrible hurricane that hit Jamaica, and he also had to run the games at 40 moves in 80 minutes repeated, which really must be torture. Here you can see the effect of just one more gauntlet on the rating.
Abrok scored 26 points in total out of 48 games, 13 points each against the Queen Class and the Rook Class engines.


Rook Class opponents (Abrok's score listed first):
Arasan 7.4                      1.0 - 1.0
Pepito v1.59                    1.0 - 1.0
The Crazy Bishop 0052           0.5 - 1.5
Terra 3.3B11                    2.0 - 0.0
Pharaon 2.62                    1.0 - 1.0
Frenzee 159                     0.0 - 2.0
PostModernist 1010a             1.0 - 1.0
Amateur 2.80                    2.0 - 0.0
Naum 1.2                        1.0 - 1.0
The Baron 1.4.0 b2              1.5 - 0.5
Comet B.68                      1.0 - 1.0
KnightDreamer 3.3               1.0 - 1.0

Queen Class opponents (Abrok's score listed first):
Amy 0.8.7                       2.0 - 0.0
Amyan 1.593                     0.5 - 1.5
El Chinito 3.25                 0.5 - 1.5
Green Light Chess 3.00.3.4      1.0 - 1.0
Jonny 2.64                      1.0 - 1.0
King of Kings 2.56              1.0 - 1.0
Ufim 5.01                       1.5 - 0.5
WildCat 4.0                     1.0 - 1.0
SlowChess 2.89b                 1.5 - 0.5
Dragon 4.5                      0.5 - 1.5
Movei 00.8.247s                 2.0 - 0.0
Fruit 1.5t                      0.5 - 1.5

And now the effect on the rating list (by the way, as mentioned before, calculated with EloStat and a start Elo of 2500, which is surely too high):

    Program                           Elo    +   - Games    Score  Av.Op.    Draws
 16 GLC 3.00.3.4                   : 2686   48  44   148    62.2 %   2600   37.8 %
 17 ElChinito 3.25                 : 2655   51  46   148    57.4 %   2603   31.1 %
 18 WildCat 4                      : 2650   52  48   148    56.8 %   2603   25.7 %
 19 LambChop 10.99                 : 2646   80  94    48    42.7 %   2697   31.2 %
 20 Fruit 1.5t                     : 2635   54  47   148    54.4 %   2604   26.4 %
 21 Amyan 1.593b                   : 2633   54  46   148    54.1 %   2604   27.0 %
 22 Patzer 3.61                    : 2614   89 100    48    61.5 %   2533   18.8 %
 23 SlowChess 2.93a                : 2613   57  39   148    51.0 %   2606   37.2 %
 24 Movei 00.8.247s                : 2598   44  57   148    48.6 %   2607   27.0 %
 25 Jonny 2.64                     : 2589   46  55   148    47.3 %   2608   25.7 %
 26 Amy 0.8.7b                     : 2582   46  54   148    46.3 %   2608   26.4 %
 27 Dragon 4.5 CF                  : 2574   41  53   148    44.9 %   2609   37.2 %
 28 Abrok 5.0                      : 2562   98  88    48    54.2 %   2533   20.8 %
 29 King of Kings 2.56             : 2552   47  51   148    41.6 %   2611   30.4 %
 30 The Baron 1.4.0 b2             : 2525   41  54   180    65.8 %   2411   21.7 %
 31 Pepito v1.59                   : 2513   42  51   180    64.2 %   2412   22.8 %
 32 Naum 1.2                       : 2498   43  47   180    61.9 %   2413   27.2 %
 33 Nejmet 3.07                    : 2489   82  95    48    43.8 %   2533   29.2 %
 34 Ufim 5.01                      : 2488   58  45   148    32.4 %   2616   25.7 %
 35 Comet B.68                     : 2486   44  49   180    60.3 %   2414   21.7 %
 36 Pharaon 2.62                   : 2479   45  47   180    59.2 %   2414   23.9 %
 37 Spike 0.6                      : 2459   39  62   172    72.4 %   2291   22.7 %
 38 KnightDreamer 3.3              : 2455   48  42   180    55.6 %   2416   27.8 %
 39 Terra 3.3B11                   : 2449   48  44   180    54.7 %   2416   23.9 %
 40 PostModernist 1010a            : 2442   49  40   180    53.6 %   2417   30.6 %
 41 Arasan 7.4                     : 2435   50  40   180    52.5 %   2417   28.3 %
 42 DanChess 1.0.6 DC              : 2429   41  58   172    68.6 %   2293   22.1 %
 43 Frenzee 159                    : 2409   41  51   180    48.6 %   2419   25.0 %
 44 The Crazy Bishop 0052          : 2402   42  50   180    47.5 %   2420   23.9 %
 45 Amateur 2.80                   : 2400   44  50   180    47.2 %   2420   21.1 %


As you can see, the two losses of Movei against Abrok alone cost Movei 32 rating points. Due to the similar performance of Queen and Rook Class engines against Abrok, the ratings are now closer: Ufim, for example, dropped below The Baron, Pepito and Naum, and Pharaon gained 19 points although it only played 1:1 against Abrok.
When AEGT 2 is in progress, the situation will be better for rating calculation. Pepito, for example, will probably play with an unchanged version in Queen Class, and engines going down to Rook Class, like Amy and Ufim, will give more valuable data for the calculation. The same is valid for the other classes.
I thought it should go without saying that a new event like AEGT has to run for a few rounds to give better data.
The Abrok gauntlet is available for download.
http://www.husvankempen.de/nunn/
Best Regards
Heinz
Heinz van Kempen
 

AEGT Bishop Class qualify - updated rating list

Post by Heinz van Kempen » 19 Sep 2004, 15:52

Posted by: Heinz van Kempen at 19 September 2004 16:52:10:
In reply to: another Q/R gauntlet and effect on rating, posted by: Heinz van Kempen at 19 September 2004 12:24:17:

Hi all ,
the Bishop Class qualifier between Muse 0.998 and Averno 0.70 ended 6:3 in favour of Muse. So Muse will play in Bishop Class and Averno in Knight Class.
The games were 40/40 adapted to 2 GHz and are included in the updated rating list, as is the Queen/Rook Class gauntlet with Abrok 5.0.
With every double round robin in AEGT 2 the rating list will become more precise.
http://www.husvankempen.de/nunn/
Best Regards
Heinz
Heinz van Kempen
 

and more gauntlets

Post by Heinz van Kempen » 19 Sep 2004, 16:12

Posted by: Heinz van Kempen at 19 September 2004 17:12:51:
In reply to: AEGT Bishop Class qualify - updated rating list, posted by: Heinz van Kempen at 19 September 2004 16:52:10:

Hi all,
to get a better rating calculation between King, Queen and Rook Class, I will play 48-game gauntlets until the start of AEGT 2 with the following engines, which are excluded from AEGT 2:
Ktulu 4.2
Patriot 0.172 light
Gandalf 4.32
and another two between Queen and Rook Class with Fafis 1.5 and Francesca MAD 0.09.
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Uri Blass » 19 Sep 2004, 16:13

Posted by: Uri Blass at 19 September 2004 17:13:25:
In reply to: AEGT Bishop Class qualify - updated rating list, posted by: Heinz van Kempen at 19 September 2004 16:52:10:
Did EloStat use all the games?
I do not see a big difference between the Rook and the Queen Class.
Abrok scored 13 against each class.
Patzer:
14.5 against the Rook Class
15 against the Queen Class
Nejmet:
12.5 against the Rook Class
8.5 against the Queen Class
Betsy:
6.5 against the Rook Class
4 against the Queen Class
Total result against the Queen Class:
13+15+8.5+4=40.5/96
Total result against the Rook Class:
13+14.5+12.5+6.5=46.5/96
The difference between the classes suggested by these results is clearly less than 100 Elo.
I do not understand how EloStat can get a different result, and it is better never to use that stupid program.
I think it is better not to have a rating list at all than to support that stupid Elo program by publishing the results it produces.
Uri
Uri Blass
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Heinz van Kempen » 19 Sep 2004, 16:24

Posted by: Heinz van Kempen at 19 September 2004 17:24:58:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Uri Blass at 19 September 2004 17:13:25:
Hello all,
I do not think that EloStat is a stupid program, although it has weaknesses. A program can only deliver good ratings when the data provides connections. We had four different classes in AEGT 1. Rook and Bishop Class are well connected by 20 gauntlets giving 480 additional games. King and Queen Class are not well connected, because few engines fit into this gap and only four gauntlets could be run here. The same holds for Queen and Rook Class, where we had only another 4 gauntlets with only 96 additional games. So what did you expect? Why not wait for AEGT 2, when promotion and demotion will finally make it possible to mix up all these engines? Or is this only meant to discredit an event that is just about to start?
Okay, any constructive comments from others concerning rating calculation and EloStat are appreciated. Maybe we can learn something from the rating experts here. But please keep in mind the nature of the first AEGT: there were not many possibilities to combine the classes, apart from a few gauntlets for the upper classes.
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Uri Blass » 19 Sep 2004, 16:55

Posted by: Uri Blass at 19 September 2004 17:55:05:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Heinz van Kempen at 19 September 2004 17:24:58:
I know there are not enough games, but the error that I find is not a statistical error.
Based on the results, the ratings should be different.
I could easily write the mathematical part of calculating ratings; the main problem for me is writing a program that simply gets the results from a PGN file.
If somebody has a program in C that can read a PGN file and put the results into some array, then I guess I can extend it into a better program that analyzes the results.
I also need a small function that calculates the expected result between programs based on their rating difference; after that the task seems easy.
Uri
Uri Blass
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Heinz van Kempen » 19 Sep 2004, 17:03

Posted by: Heinz van Kempen at 19 September 2004 18:03:26:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Uri Blass at 19 September 2004 17:55:05:

Hello Uri,
I would like it if you could do that in some way and give us better values. Of course I am not happy with the rating list so far either, but based on my experience with 70 000 games and more for rating calculation, as for my Nunn Blitz rating list, it will get better with more games and connections.
Anyway, it is of course not fair to compare this first small rating list with WBEC, because Leo has been running his tournaments for years, with many engine versions unchanged over the past years and with promotion and demotion. I know that Leo calculates ratings in a different way. It would be interesting to know more about that.
Best Regards
Heinz
Heinz van Kempen
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Uri Blass » 19 Sep 2004, 17:03

Posted by: Uri Blass at 19 September 2004 18:03:28:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Uri Blass at 19 September 2004 17:55:05:
I can add that the arrays I want can assume a maximum of 1000 programs, so I need only
one 1000*1000 array:
int result[1000][1000]
result[i][j] = the result that program i got against program j, in half points.
If i beat j 7.5-3.5 then result[i][j]=15 and result[j][i]=7.
If you give me a program that fills this array from a PGN file, plus an array expected_result[10000] that gives the expected result for every rating difference smaller than 10000, then I guess it will be easy to extend it into a program that calculates ratings correctly.
Uri
Uri Blass
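
A minimal sketch of the PGN reader requested here, filling `result[i][j]` in half points. It assumes each game carries the standard `White`, `Black` and `Result` tag pairs on their own lines and ignores everything else (moves, comments, other tags). All function and variable names are made up for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PLAYERS 1000
#define MAX_NAME 64

static char names[MAX_PLAYERS][MAX_NAME];
static int n_players = 0;
/* result[i][j] = half points program i scored against program j */
static int result[MAX_PLAYERS][MAX_PLAYERS];
/* Tag values of the game currently being read. */
static char cur_white[MAX_NAME], cur_black[MAX_NAME], cur_result[16];

/* Index of a program name, adding it on first sight; -1 if the table is full. */
int player_index(const char *name)
{
    for (int i = 0; i < n_players; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    if (n_players == MAX_PLAYERS)
        return -1;
    strncpy(names[n_players], name, MAX_NAME - 1);
    names[n_players][MAX_NAME - 1] = '\0';
    return n_players++;
}

/* If line is a tag pair such as [White "Abrok 5.0"] for the given tag,
   copy the quoted value into out and return 1; otherwise return 0. */
int tag_value(const char *line, const char *tag, char *out, size_t cap)
{
    size_t len = strlen(tag);
    if (line[0] != '[' || strncmp(line + 1, tag, len) != 0 || line[1 + len] != ' ')
        return 0;
    const char *start = strchr(line, '"');
    const char *end = start ? strchr(start + 1, '"') : NULL;
    if (end == NULL)
        return 0;
    size_t n = (size_t)(end - start - 1);
    if (n >= cap)
        n = cap - 1;
    memcpy(out, start + 1, n);
    out[n] = '\0';
    return 1;
}

/* Feed one PGN line; returns 1 when a finished game was added to result[][]. */
int pgn_feed_line(const char *line)
{
    int counted = 0;
    if (!tag_value(line, "White", cur_white, sizeof cur_white) &&
        !tag_value(line, "Black", cur_black, sizeof cur_black) &&
        !tag_value(line, "Result", cur_result, sizeof cur_result))
        return 0;                       /* not one of the three tags we need */
    if (!cur_white[0] || !cur_black[0] || !cur_result[0])
        return 0;                       /* game header not complete yet */
    int w = player_index(cur_white), b = player_index(cur_black);
    if (w >= 0 && b >= 0) {
        counted = 1;
        if (strcmp(cur_result, "1-0") == 0)
            result[w][b] += 2;          /* win = two half points */
        else if (strcmp(cur_result, "0-1") == 0)
            result[b][w] += 2;
        else if (strcmp(cur_result, "1/2-1/2") == 0) {
            result[w][b] += 1;
            result[b][w] += 1;
        } else
            counted = 0;                /* "*" or unknown: unfinished, skip */
    }
    cur_white[0] = cur_black[0] = cur_result[0] = '\0';
    return counted;
}

/* Read a whole PGN stream; returns the number of games counted. */
int read_pgn_results(FILE *fp)
{
    char line[1024];
    int games = 0;
    while (fgets(line, sizeof line, fp) != NULL)
        games += pgn_feed_line(line);
    return games;
}
```

A caller would open the PGN file with `fopen`, pass it to `read_pgn_results`, and then read `result[i][j]` together with `names[i]`. Program names are matched by exact string, so "Pharaon 2.62" and "Pharaon 3.00b" count as different programs.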
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Günther Simon » 19 Sep 2004, 17:16

Posted by: Günther Simon at 19 September 2004 18:16:44:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Heinz van Kempen at 19 September 2004 18:03:26:
I can add that 'connections' between pools are of course more important
than a large number of games. If two pools have no connection, it is simply
illogical to compare anything.
Best regards,
Günther
P.S. I hope Class A will be broadcast again tomorrow evening ;)
Günther Simon
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Uri Blass » 19 Sep 2004, 17:34

Posted by: Uri Blass at 19 September 2004 18:34:11:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Günther Simon at 19 September 2004 18:16:44:
Of course the connection between pools is important, but a missing connection can be detected by the program, and when there is no connection between pools a program should not give one rating list but more than one.
The problem is what to do when the connection between pools is weak.
An extreme case is when there is only a single game between the programs in team A and the programs in team B, and no indirect connection between them through games that both played against programs in some team C.
If that game is not drawn, it is impossible to evaluate team A relative to team B
(if the result happened again and again, the difference would be infinite), and even if the game is a draw, there is a big statistical error in the assumption that the two programs are equal.
Not rating programs that won all their games (marking them as too good to be rated) and not rating programs that lost all their games (marking them as too bad) is trivial, but it is not enough: even after repeating the process there may still be a team A that always beat team B with no indirect connection between the teams (not in AEGT, but a program should be general enough to analyze any set of game results).
Uri
Uri Blass
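
Detecting completely unconnected pools, as described in this post, is a connected-components problem. Below is a minimal union-find sketch over a `result[i][j]` matrix in half points; the function names are hypothetical. It only finds pools with no games at all between them. The harder one-sided case (team A always beat team B) is not covered and would need an additional check.

```c
#define MAX_PLAYERS 1000

static int parent[MAX_PLAYERS];

/* Root of the set containing x, shortening the path on the way. */
static int find_root(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving keeps trees shallow */
        x = parent[x];
    }
    return x;
}

/* result[i][j] = half points program i scored against program j.
   Fills pool[i] with a pool label for each of the n programs and
   returns the number of separate pools. */
int find_pools(int n, int result[][MAX_PLAYERS], int pool[])
{
    int count = 0;
    for (int i = 0; i < n; i++)
        parent[i] = i;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i; j++)
            if (result[i][j] > 0 || result[j][i] > 0)   /* i and j met at least once */
                parent[find_root(i)] = find_root(j);
    for (int i = 0; i < n; i++) {
        pool[i] = find_root(i);
        if (pool[i] == i)
            count++;
    }
    return count;
}
```

If `find_pools` returns more than 1, a rating program could print one list per pool label instead of a single combined list. The test for "met at least once" works because every finished game gives at least one half point to one of the two programs.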
 

Re: AEGT Bishop Class qualify - updated rating list

Post by Günther Simon » 19 Sep 2004, 17:43

Posted by: Günther Simon at 19 September 2004 18:43:18:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Uri Blass at 19 September 2004 18:34:11:
Well, EloStat warns you if there is no connection between the pools!
The problem IMHO is that EloStat is good for producing a starting rating list after
lots of games, but once you have a good starting list, you should use
another program for permanently updating the rating list.
But if you just see (like I do) every list generated by EloStat as
a _current performance_ list, which should be seen as a single
event (no comparison to 'previous performances'), there is not much
wrong with it.
Regards,
Günther
Günther Simon
 

proposal for verifying

Post by Heinz van Kempen » 19 Sep 2004, 17:45

Posted by: Heinz van Kempen at 19 September 2004 18:45:40:
In reply to: Re: AEGT Bishop Class qualify - updated rating list, posted by: Uri Blass at 19 September 2004 18:34:11:

Hello Uri,
I propose the following: I will interrupt my Nunn Blitz tournaments in progress and provide connections between King, Queen and Rook Class by running 48-game gauntlets at 40/40 between the classes until October 1st, using the engines I mentioned before, plus Queen/Rook gauntlets with Anaconda, Pharaon 3.00b, The Baron 1.4.1b, Spike 0.7 and whatever I have already received for AEGT 2. We will then see whether combining the pools gives better data. If not, we can forget EloStat :-).
Best Regards
Heinz
Heinz van Kempen
 

AEGT rating calculation experiment

Post by Heinz van Kempen » 19 Sep 2004, 21:51

Posted by: Heinz van Kempen at 19 September 2004 22:51:37:
In reply to: proposal for verifying, posted by: Heinz van Kempen at 19 September 2004 18:45:40:

Hi all ,
Ralf and I have now started to run more gauntlets on four fast computers for a better AEGT rating calculation.
The first engine is Pharaon 3.00b, playing two games at 40/40 repeated against all King Class, Queen Class and Rook Class engines and the best two from Bishop Class, Spike and DanChess. More gauntlets will follow until October 1st. Our aim is to demonstrate that the ratings become more precise with a lot of connections between "pools" of engines, which should be the case, as Günther Simon already wrote.
AEGT testers who would enjoy helping, have the time, and can deliver at least 24 games before our start date of October 1st, please write to me. New testers are of course also welcome and can write to me as well.
http://www.husvankempen.de/nunn/
Best Regards
Heinz
Heinz van Kempen
 

