Winboard Forum

by **Igor Gorelikov** » 15 Sep 2004, 14:05

Posted by: Igor Gorelikov at 15 September 2004 15:05:09:

New Infinite Loop-6, B14
------------------------
The 29th long event of New Long IL-6 is concluded, as well as the second run
(B) of NIL-6.
The third run (C) is held for engines with 10 games and their neighbours.
The current rating list (which is updated after each event) can be found at
http://www.digichess.gr/infiniteloop/nil/NIL6_rat.txt
Time control: 80'+3"
Celeron 1GHz 256MB; hash size is up to 38MB in total for each engine;
resign is set to 6; 3-men, 4-men, and a few 5-men Nalimov TBs.
Galis WBTM 0.5.0 is used to arrange the event.

Results
P3 1GHz 256MB, 2004.09.13 - 2004.09.14
                      1  2  3  4  5  6  S   R    B
----------------------------------------------------
1 SOS.3 for Arena     X  =  1  1  1  0  3=  1    8.50
2 List 512 UCI        =  X  =  1  0  1  3   2    7.00
3 Movei 00.8.251s     0  =  X  =  1  =  2=  3-4  5.50
4 Ruffian 1.0.1       0  0  =  X  1  1  2=  3-4  4.75
5 Delfi 4.5           0  1  0  0  X  1  2   5    4.50
6 Tao 5.6             1  0  =  0  0  X  1=  6    4.75
----------------------------------------------------
Games 15/15, +6 -5 =4
After-event ratings
-------------------
Change in
places  Place  Program            Elo    +    -  Games   Score  Av.Op.   Draws
  -1      2    List 512 UCI     : 2703  218  275    10  80.0 %    2462  20.0 %
  -7      9    Movei 00.8.251s  : 2644  226  218    10  75.0 %    2453  30.0 %
  -3      6    Delfi 4.5        : 2658  244  279    10  65.0 %    2550  10.0 %
  +2      5    Ruffian 1.0.1    : 2690   41   61   159  71.1 %    2534  23.9 %
  +1      7    Tao 5.6          : 2654   91   87    50  58.0 %    2598  24.0 %
  +1      8    SOS.3 for Arena  : 2652   58   59   106  61.3 %    2572  28.3 %
Next long event
---------------
New IL-6, C1
239 Tinker 4.47
240 DelphiMax 2.9 UCI
243 RDChess 3.19
244 AICE 0.70
245 Matheus 2.3
246 RDChess 3.23

One selected game
[Event "New IL-6, B14"]
[Site "P3 1GHz 256MB"]
[Date "2004.09.14"]
[Round "2"]
[White "SOS.3 for Arena"]
[Black "Ruffian 1.0.1"]
[Result "1-0"]
[TimeControl "4800+3"]
[PairNo ""]
1. e4 c5 2. Nf3 d6 3. Nc3 Nc6 4. d4 cxd4 5. Nxd4 Nf6 6. Bc4 e6 7. Be3 a6 8.
Qe2 Be7 9. Rd1 O-O 10. O-O Nxd4 11. Rxd4 Ng4 12. Qxg4 e5 13. Qf3 exd4 14.
Bxd4 Bg5 15. Nd5 Be6 16. c3 Rc8 17. Bb3 Bh6 18. Nf6+ Kh8 19. Qh5 a5 20. g4
a4 21. Bd5 a3 22. bxa3 Bxd5 23. exd5 Rc4 24. Qf5 g6 25. Qd3 Rxd4 26. Qxd4
Bg7 27. g5 h6 28. h4 Re8 29. Rb1 Re5 30. f4 Rf5 31. Rxb7 Bxf6 32. gxf6 Qxf6
33. Rb4 Qxd4+ 34. Rxd4 Rh5 35. a4 Rxh4 36. c4 Kg7 37. c5 dxc5 38. d6 Rg4+
39. Kh2 cxd4 40. d7 h5 41. d8=Q Rxf4 42. Kg3 Rg4+ 43. Kf3 d3 44. a5 Ra4 45.
Qxd3 Kg8 46. a3 h4 47. Qc3 g5 48. Qb4 Rxb4 49. axb4
{Black resigns} 1-0
All the games of the events reported today can be found at
http://www.digichess.gr/infiniteloop/games/last.zip
====================================================
For games and more info on the Infinite Loop apply to
http://www.digichess.gr/infiniteloop/home.php

by **Uri Blass** » 15 Sep 2004, 15:04

Posted by: Uri Blass at 15 September 2004 16:04:54:
In reply to: New Infinite Loop-6, B14, posted by: Igor Gorelikov at 15 September 2004 15:05:09:

Movei had a better rating than Delfi 4.5 before the tournament.
In the tournament Movei scored half a point more than Delfi, and yet Delfi now has a better rating than Movei.
I do not claim that Movei is better than Delfi, and Delfi was probably unlucky, but these results prove that something is wrong with the system that calculates the ratings.
Uri

by **Igor Gorelikov** » 15 Sep 2004, 15:12

Posted by: Igor Gorelikov at 15 September 2004 16:12:57:
In reply to: Re: New Infinite Loop-6, B14, posted by: Uri Blass at 15 September 2004 16:04:54:

ELOStat 1.2 is used for rating calculations.
Igor

by **Michael Yee** » 15 Sep 2004, 15:14

Posted by: Michael Yee at 15 September 2004 16:14:32:
In reply to: Re: New Infinite Loop-6, B14, posted by: Uri Blass at 15 September 2004 16:04:54:

Here's a thought:
If the ratings are computed from the entire pool of games (e.g., with elostat) instead of incrementally, then maybe it could be the case that the set of opponents delfi defeated in the past performed better in the future than the set of opponents movei defeated in the past.
For example, perhaps
delfi beat engine x who had a 20% "score" at time T
movei beat engine y who had a 30% "score" at time T
then suppose just x and y play more games against others (but delfi and movei don't play at all) and at time T + delta
x now has a 40% "score"
y now has a 20% "score"
Then ratings of delfi and movei could change even though they didn't play.
Michael

by **Sven Schüle** » 15 Sep 2004, 15:43

Posted by: Sven Schüle at 15 September 2004 16:43:23:
In reply to: Re: New Infinite Loop-6, B14, posted by: Michael Yee at 15 September 2004 16:14:32:

I think Michael is right. While ratings for human chess players are best calculated incrementally, because playing strength changes significantly over time, ratings for chess engines are probably best calculated over the whole set of data, provided the following holds:
- An engine with a given identification, say "Delfi 4.5", is unique and never changes.
- Learning has no big long-term influence.
This also means that each new engine version, even if only one line of code has changed, must get a completely new rating, and the same applies if different settings are used ("personality").
As far as I know, ELOSTAT calculates new ratings from the whole set of games, including those from the past. This means that new ratings are simply based on more information than before, but they are not mainly based on the newest tournament(s), in contrast to incremental rating, where newer tournaments get a higher weight.
Cheers,
Sven

by **Uri Blass** » 16 Sep 2004, 07:36

Posted by: Uri Blass at 16 September 2004 08:36:33:
In reply to: Re: New Infinite Loop-6, B14, posted by: Michael Yee at 15 September 2004 16:14:32:

This cannot be the explanation.
I guess that Movei simply suffers from having beaten weak opponents, because of the strange way that ELOStat calculates ratings.
Movei had scored 100% before the games of this tournament.
Movei scored exactly 50% in this tournament, yet its rating is lower than that of every opponent in this tournament.
I guess that if you deleted Movei's 5/5 from the previous tournament, Movei would get a higher rating.
Uri

by **Günther Simon** » 16 Sep 2004, 07:44

Posted by: Günther Simon at 16 September 2004 08:44:15:
In reply to: Re: New Infinite Loop-6, B14, posted by: Uri Blass at 16 September 2004 08:36:33:

With EloStat there is no such thing as a 'previous' rating.
It always calculates from all the data for the whole pool, and comparing the result
with calculations based on less data (what you call 'previous') is simply wrong.
Regards,
Günther

by **Uri Blass** » 16 Sep 2004, 08:01

Posted by: Uri Blass at 16 September 2004 09:01:16:
In reply to: Re: New Infinite Loop-6, B14, posted by: Günther Simon at 16 September 2004 08:44:15:

I am not asking for a calculation with less data.
Movei scored 100% against set A of weak engines and 50% against set B of strong engines.
Movei's rating is lower than that of every engine in set B.
That is simply illogical.
Movei's rating should be greater than or equal to the average rating of the engines in set B.
Without the games against set A it could be exactly the average rating of the engines in set B.
EloStat simply does not know how to use all the data to calculate ratings.
A program that calculates ratings should not allow a situation in which an engine can gain rating by deleting wins.
I do not know how EloStat calculates ratings.
The correct way to calculate ratings is to assume that the results happen again and again and to recalculate the ratings after every tournament; a program then never loses rating because of wins, since the correct formula does not allow that.
The correct formula calculates the expected result of every game from the rating difference, and the expected result can never be more than 1 point.
After the expected results are calculated, the programs that did better than expected gain rating and the programs that did worse than expected lose rating, and this process is repeated again and again.

Before calculating ratings, a program that won all its games should be excluded from the rating list, and the same goes for a program that lost all its games.
After removing these programs, this check should be repeated until there are no programs left to remove.
This needs to be done because otherwise repeating the same results again and again may lead to an infinite rating, and we need the ratings to converge to something in order to be meaningful.
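
The procedure described above can be sketched roughly as follows. This is only an illustration, not ELOStat's method: the logistic expected-score curve, the start value of 2200, the step size, the stopping tolerance, and the function names `expected`, `prune` and `fit` are all assumptions made for the example, and the sample games are invented.

```python
from collections import defaultdict

def expected(r_a, r_b):
    """Expected score of A against B on a logistic Elo curve (always below 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def prune(games):
    """Repeatedly drop every player that scored 100% or 0%."""
    games = list(games)
    while True:
        points, played = defaultdict(float), defaultdict(int)
        for a, b, score_a in games:
            points[a] += score_a
            points[b] += 1.0 - score_a
            played[a] += 1
            played[b] += 1
        extreme = {p for p in played if points[p] in (0.0, float(played[p]))}
        if not extreme:
            return games
        games = [g for g in games if g[0] not in extreme and g[1] not in extreme]

def fit(games, start=2200.0, step=10.0, tol=1e-6, max_iter=100_000):
    """Nudge ratings until every player's actual score matches the expected one."""
    games = prune(games)
    if not games:
        return {}
    ratings = {p: start for a, b, _ in games for p in (a, b)}
    for _ in range(max_iter):
        surplus = defaultdict(float)  # actual minus expected points per player
        for a, b, score_a in games:
            e = expected(ratings[a], ratings[b])
            surplus[a] += score_a - e
            surplus[b] += e - score_a
        for p in ratings:
            ratings[p] += step * surplus[p]  # over-performers gain, others lose
        if max(abs(s) for s in surplus.values()) < tol:
            break
    return ratings

# Invented sample: (player A, player B, score of A). D loses everything.
games = [("A", "B", 1), ("A", "B", 1), ("A", "B", 0),
         ("B", "C", 0.5), ("C", "A", 0.5),
         ("D", "A", 0), ("D", "B", 0), ("D", "C", 0)]
ratings = fit(games)
```

In this sample D is removed before fitting, and the remaining engines end up ordered A, C, B around the start value. Pruning does not by itself guarantee convergence for every pool, which is why the sketch also caps the number of iterations.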
Uri

by **Sven Schüle** » 16 Sep 2004, 16:11

Posted by: Sven Schüle at 16 September 2004 17:11:17:
In reply to: Re: New Infinite Loop-6, B14, posted by: Uri Blass at 16 September 2004 09:01:16:

Uri,
now I think I have understood the problem, and probably you're right.
The main thing seems to be that ELOSTAT uses average ratings of the
opponents. The more accurate but much more costly way is to use each
single game for rating calculation.
I want to show this using the concrete example of Movei and Delfi.
I repeat the new ratings as listed in Igor's original posting:

After-event ratings
-------------------
Change in
places  Place  Program            Elo    +    -  Games   Score  Av.Op.   Draws
  -1      2    List 512 UCI     : 2703  218  275    10  80.0 %    2462  20.0 %
  -7      9    Movei 00.8.251s  : 2644  226  218    10  75.0 %    2453  30.0 %
  -3      6    Delfi 4.5        : 2658  244  279    10  65.0 %    2550  10.0 %
  +2      5    Ruffian 1.0.1    : 2690   41   61   159  71.1 %    2534  23.9 %
  +1      7    Tao 5.6          : 2654   91   87    50  58.0 %    2598  24.0 %
  +1      8    SOS.3 for Arena  : 2652   58   59   106  61.3 %    2572  28.3 %

Using the "average" approach, this means:
Movei's opponents in 10 games have an average current rating of 2453
(5x about 2671 and 5x about 2235). Movei has scored 75% against them.
Movei gets a rating of about 200 ELO points more than the opponents
(about 76% = exactly 200 points).
Delfi's opponents in 10 games have an average current rating of 2550
(5x about 2669 and 5x about 2431), and Delfi scored 65% against them.
Delfi gets a rating of about 100 ELO points more than the opponents
(about 64% = exactly 100 points).
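
The "average" approach can be checked with a few lines of Python. This is a sketch under an assumption: it uses the common logistic formula (average opponent rating plus 400 * log10(p / (1 - p))) rather than the Gauss-based table behind the 76% = 200 and 64% = 100 figures, and `performance_rating` is a name made up for the example. The rounded results happen to match the Elo column of the list, which suggests, but does not confirm, that ELOStat does something similar.

```python
import math

def performance_rating(avg_opp, score_fraction):
    """Average opponent rating plus a logistic offset for the score (assumed formula)."""
    return avg_opp + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# Av.Op. and Score taken from the rating list above.
movei = performance_rating(2453, 0.75)
delfi = performance_rating(2550, 0.65)

# Movei's five opponents in this event (set B) and its 50% score against them.
set_b = [2703, 2658, 2690, 2654, 2652]
movei_b_only = performance_rating(sum(set_b) / len(set_b), 0.50)

print(round(movei), round(delfi), round(movei_b_only, 1))
```

With the five wins against the weaker opponents included, Movei comes out below every engine in set B. With only the set B games it would sit at the set B average of about 2671, which is the effect Uri objects to.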
The more exact way would be to go through several iterations, stopping when
all intermediate ratings are stable compared to the previous iteration,
and would be based on the sum of all expected scores against each individual
opponent.
Let's assume the new ratings listed above were such intermediate ratings
from some iteration N. Then, for Movei we would get something like this
(note that I do not know Movei's first tournament opponents;
note also that the "expected score" is a bit inexact here):

Program: Movei 00.8.251s (2644)
OppNo  OppName          OppRating  NGames  ExpScore  RealScore
  1    N.N.               2235       1     0.91 (?)    1.0
  2    N.N.               2235       1     0.91 (?)    1.0
  3    N.N.               2235       1     0.91 (?)    1.0
  4    N.N.               2235       1     0.91 (?)    1.0
  5    N.N.               2235       1     0.91 (?)    1.0
  6    List 512 UCI       2703       1     0.42 (?)    0.5
  7    Delfi 4.5          2658       1     0.49 (?)    1.0
  8    Ruffian 1.0.1      2690       1     0.44 (?)    0.5
  9    Tao 5.6            2654       1     0.49 (?)    0.5
 10    SOS.3 for Arena    2652       1     0.50 (?)    0.0
=================================================================
total                               10     6.89        7.5
                                           68.9%       75.0%

RealScore - ExpScore = 75.0% - 68.9% = 6.1%, so the next intermediate
rating for Movei in iteration N+1 would be somewhat higher than 2644
(all based on a Gauss curve), perhaps about 2700 (?).
The same would of course be applied for all other programs in the whole
pool, all would get an (N+1)st intermediate rating, and this continues
until stability is reached.
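
The per-game expectation for Movei can be recomputed with a short script. This is a sketch only: it assumes a logistic expected-score curve instead of the Gauss curve mentioned above, so individual values differ slightly, and the 2235 rating for the five unknown opponents is the same rough estimate used in the post.

```python
def expected_score(own, opp):
    """Expected score on a logistic Elo curve (an assumption for this sketch)."""
    return 1.0 / (1.0 + 10 ** ((opp - own) / 400.0))

MOVEI = 2644  # intermediate rating from iteration N

# (opponent rating, Movei's actual score); the first five opponents are estimates.
games = [(2235, 1.0)] * 5 + [
    (2703, 0.5),  # List 512 UCI
    (2658, 1.0),  # Delfi 4.5
    (2690, 0.5),  # Ruffian 1.0.1
    (2654, 0.5),  # Tao 5.6
    (2652, 0.0),  # SOS.3 for Arena
]

exp_total = sum(expected_score(MOVEI, opp) for opp, _ in games)
real_total = sum(score for _, score in games)
print(f"expected {exp_total:.2f}, actual {real_total:.1f}")
```

On these assumptions the expected total is roughly 6.87 points out of 10 against an actual 7.5, so Movei's rating would be raised in the next iteration.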
This approach is used in the German national rating system for human
chess players ("DWZ - Deutsche Wertungszahl") to get a first rating for
unrated players. (Of course there are some differences in detail.)
And this is what we have here: we can treat the whole pool as one big
tournament of unrated players.
I do not really know what ELOSTAT does but I suspect it works with
such iterations, too (assigning an initial value for all engines in
the first iteration, say 2200), but then uses averaging. This approach
may cause unexpected ratings especially for programs with very few games
(like in this case!), and/or in cases with great rating differences of
opponents (also like in this case).
The effect will probably go away with more games,
and as you might see in the rating list, there is still a high variance
given for those engines with only 10 games, e.g. for Movei +226/-218.
Does someone know more about how ELOSTAT really works?
Cheers,
Sven

by **Igor Gorelikov** » 17 Sep 2004, 11:07

Posted by: Igor Gorelikov at 17 September 2004 12:07:27:
In reply to: Rating calculation problems (long) - Re: New Infinite Loop-6, B14, posted by: Sven Schüle at 16 September 2004 17:11:17:

Elostat works as you described it. An initial value is entered by the user.
Igor
