My previous post dealt with the cross-tables and winners. Now we look at the
rating lists after each round-robin event. I try to find the minimal
number of games needed for a proper rating calculation.
Conditions are the same:
"The six engines which are close by strength (of AEGT King Class) played two
round robins in a row. Hardware is Celeron 567MHz 128MB, the shortest
time control possible for decent chess: 1 min + 3 sec per game (ie each
game lasts for 4 minutes on average)."
Note the first column, which I added: it shows the change in place since the
previous event (plus means up, minus means down).
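The change-in-place column can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `place_changes` is made up for illustration, and the two orderings are copied from the 1st and 2nd event lists in this post.

```python
def place_changes(previous, current):
    """Return {engine: places gained} between two rating lists.

    Both arguments are lists of engine names ordered from first to last place.
    A positive number means the engine moved up, a negative one means down.
    """
    old_place = {name: i + 1 for i, name in enumerate(previous)}
    return {name: old_place[name] - (i + 1) for i, name in enumerate(current)}


# Orderings after the 1st and 2nd events, as listed in this post.
first = ["Delfi", "Thinker", "Ruffian", "AnMon", "Pharaon", "Pro Deo"]
second = ["Thinker", "Delfi", "Ruffian", "Pharaon", "AnMon", "Pro Deo"]

for name, change in place_changes(first, second).items():
    print(f"{change:+d}  {name}")
```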
1st event (each event is a double round robin: 10 games per engine, 30 games in total)
   Program          Elo    +    -  Games   Score  Av.Op.   Draws
1  Delfi 4.5     : 2622  235  175     10  70.0 %    2475  40.0 %
2  Thinker 4.6c  : 2589  244  142     10  65.0 %    2482  50.0 %
3  Ruffian 1.0.5 : 2528  266  141     10  55.0 %    2494  50.0 %
4  AnMon 5.50    : 2441  168  255     10  40.0 %    2511  40.0 %
5  Pharaon 3.1   : 2410  279  244     10  35.0 %    2517  10.0 %
6  Pro Deo 1.0   : 2410  204  244     10  35.0 %    2517  30.0 %
2nd event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
  +1   1  Thinker 4.6c  : 2605  141  110     20  67.5 %    2478  45.0 %
  -1   2  Delfi 4.5     : 2558  153  133     20  60.0 %    2488  30.0 %
   0   3  Ruffian 1.0.5 : 2500  141  141     20  50.0 %    2500  30.0 %
  +1   4  Pharaon 3.1   : 2470  144  162     20  45.0 %    2505  20.0 %
  -1   5  AnMon 5.50    : 2441  133  153     20  40.0 %    2511  30.0 %
   0   6  Pro Deo 1.0   : 2426  126  149     20  37.5 %    2514  35.0 %
3rd event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Thinker 4.6c  : 2579  114   96     30  63.3 %    2484  40.0 %
   0   2  Delfi 4.5     : 2549  122  105     30  58.3 %    2490  30.0 %
   0   3  Ruffian 1.0.5 : 2549  122  115     30  58.3 %    2490  23.3 %
  +1   4  AnMon 5.50    : 2461  108  124     30  43.3 %    2508  26.7 %
  -1   5  Pharaon 3.1   : 2431  131  117     30  38.3 %    2514  16.7 %
   0   6  Pro Deo 1.0   : 2431  109  117     30  38.3 %    2514  30.0 %
The first three events show the usual shifting from pillar to post.
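As a side note on how the Elo column relates to Score and Av.Op.: the figures in these lists look consistent with the standard logistic performance formula, Av.Op. + 400 * log10(score / (1 - score)). That the rating program uses exactly this formula is an assumption here, not something stated in the post; the function name `performance_rating` is likewise just for illustration.

```python
import math


def performance_rating(avg_opponent, score):
    """Logistic Elo performance estimate.

    avg_opponent: average rating of the opponents (the Av.Op. column).
    score: fraction of points scored, strictly between 0 and 1.
    """
    return avg_opponent + 400 * math.log10(score / (1 - score))


# Delfi after the 1st event: 70.0 % against an average opponent of 2475.
# The list above shows 2622 for this line.
print(round(performance_rating(2475, 0.70)))
```

The other lines agree to within a point or two, which fits with Av.Op. being shown rounded.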
4th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
  +2   1  Ruffian 1.0.5 : 2544  104   96     40  57.5 %    2491  25.0 %
  -1   2  Thinker 4.6c  : 2529  107   82     40  55.0 %    2494  35.0 %
  -1   3  Delfi 4.5     : 2507  113   87     40  51.2 %    2499  27.5 %
   0   4  AnMon 5.50    : 2500   99   99     40  50.0 %    2500  25.0 %
  +1   5  Pro Deo 1.0   : 2471   87  107     40  45.0 %    2506  30.0 %
  -1   6  Pharaon 3.1   : 2449  107  102     40  41.2 %    2510  17.5 %
The first important moment. From now on (i.e. until the final event) three
engines keep constant places: 1, 2 and 6.
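To check this kind of claim mechanically, one can look for the first event from which an engine's place never changes again. The sketch below is only an illustration: `stable_since` is a made-up helper, and the twelve orderings are transcribed from the rating lists in this post.

```python
# Place orderings after events 1..12, transcribed from the lists in this post.
ORDERS = """\
Delfi, Thinker, Ruffian, AnMon, Pharaon, Pro Deo
Thinker, Delfi, Ruffian, Pharaon, AnMon, Pro Deo
Thinker, Delfi, Ruffian, AnMon, Pharaon, Pro Deo
Ruffian, Thinker, Delfi, AnMon, Pro Deo, Pharaon
Ruffian, Thinker, Pro Deo, Delfi, AnMon, Pharaon
Ruffian, Thinker, AnMon, Delfi, Pro Deo, Pharaon
Ruffian, Thinker, Pro Deo, Delfi, AnMon, Pharaon
Ruffian, Thinker, Pro Deo, Delfi, AnMon, Pharaon
Ruffian, Thinker, AnMon, Pro Deo, Delfi, Pharaon
Ruffian, Thinker, Pro Deo, AnMon, Delfi, Pharaon
Ruffian, Thinker, AnMon, Delfi, Pro Deo, Pharaon
Ruffian, Thinker, Pro Deo, AnMon, Delfi, Pharaon
"""
standings = [line.split(", ") for line in ORDERS.splitlines()]


def stable_since(standings):
    """For each engine, return the first event (1-based) from which its
    place stays the same up to and including the final event."""
    result = {}
    for place, name in enumerate(standings[-1]):
        event = len(standings)
        # Walk backwards while the engine still sits on its final place.
        while event > 1 and standings[event - 2][place] == name:
            event -= 1
        result[name] = event
    return result


for name, event in stable_since(standings).items():
    print(f"{name}: stable from event {event}")
```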
5th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2553   90   86     50  59.0 %    2489  26.0 %
   0   2  Thinker 4.6c  : 2529   95   70     50  55.0 %    2494  38.0 %
  +2   3  Pro Deo 1.0   : 2494   75  101     50  49.0 %    2501  30.0 %
  -1   4  Delfi 4.5     : 2488   77   99     50  48.0 %    2502  28.0 %
  -1   5  AnMon 5.50    : 2477   83   96     50  46.0 %    2505  24.0 %
   0   6  Pharaon 3.1   : 2459   92   92     50  43.0 %    2508  18.0 %
6th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2564   79   78     60  60.8 %    2487  28.3 %
   0   2  Thinker 4.6c  : 2539   84   66     60  56.7 %    2492  36.7 %
  +2   3  AnMon 5.50    : 2485   73   89     60  47.5 %    2503  25.0 %
   0   4  Delfi 4.5     : 2485   68   89     60  47.5 %    2503  31.7 %
  -2   5  Pro Deo 1.0   : 2481   70   88     60  46.7 %    2504  30.0 %
   0   6  Pharaon 3.1   : 2446   83   81     60  40.8 %    2511  21.7 %
The second important moment: three engines are tightly bunched in places 3-5.
From now on they will keep swapping places!
7th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2572   72   75     70  62.1 %    2485  27.1 %
   0   2  Thinker 4.6c  : 2525   80   59     70  54.3 %    2495  37.1 %
  +2   3  Pro Deo 1.0   : 2487   63   83     70  47.9 %    2502  30.0 %
   0   4  Delfi 4.5     : 2487   61   83     70  47.9 %    2502  32.9 %
  -2   5  AnMon 5.50    : 2471   70   79     70  45.0 %    2506  24.3 %
   0   6  Pharaon 3.1   : 2458   71   77     70  42.9 %    2508  25.7 %
8th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2555   69   68     80  59.4 %    2489  26.2 %
   0   2  Thinker 4.6c  : 2536   72   58     80  56.2 %    2493  35.0 %
   0   3  Pro Deo 1.0   : 2489   60   77     80  48.1 %    2502  28.8 %
   0   4  Delfi 4.5     : 2485   58   76     80  47.5 %    2503  32.5 %
   0   5  AnMon 5.50    : 2478   64   75     80  46.2 %    2504  25.0 %
   0   6  Pharaon 3.1   : 2456   67   71     80  42.5 %    2509  25.0 %
Hurray! Now we have the absolute truth. All engines are in their right places
and don't want to change their positions. Nevertheless, let's check a bit more...
9th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2555   65   63     90  59.4 %    2489  27.8 %
   0   2  Thinker 4.6c  : 2542   67   56     90  57.2 %    2491  34.4 %
  +2   3  AnMon 5.50    : 2490   58   73     90  48.3 %    2502  25.6 %
  -1   4  Pro Deo 1.0   : 2481   56   71     90  46.7 %    2504  31.1 %
  -1   5  Delfi 4.5     : 2477   57   70     90  46.1 %    2504  30.0 %
   0   6  Pharaon 3.1   : 2455   61   66     90  42.2 %    2509  28.9 %
Maybe it's not so absolute?
The three other engines continue their stupid dances ;-(
10th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2547   63   58    100  58.0 %    2491  28.0 %
   0   2  Thinker 4.6c  : 2520   67   52    100  53.5 %    2496  33.0 %
  +1   3  Pro Deo 1.0   : 2494   52   70    100  49.0 %    2501  30.0 %
  -1   4  AnMon 5.50    : 2494   55   70    100  49.0 %    2501  26.0 %
   0   5  Delfi 4.5     : 2477   54   66    100  46.0 %    2505  30.0 %
   0   6  Pharaon 3.1   : 2468   56   65    100  44.5 %    2506  29.0 %
11th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2540   61   56    110  56.8 %    2492  26.4 %
   0   2  Thinker 4.6c  : 2524   63   49    110  54.1 %    2495  33.6 %
  +1   3  AnMon 5.50    : 2503   67   51    110  50.5 %    2499  26.4 %
  +1   4  Delfi 4.5     : 2492   51   66    110  48.6 %    2502  28.2 %
  -2   5  Pro Deo 1.0   : 2489   51   65    110  48.2 %    2502  29.1 %
   0   6  Pharaon 3.1   : 2452   56   59    110  41.8 %    2510  27.3 %
12th event
Chng      Program          Elo    +    -  Games   Score  Av.Op.   Draws
   0   1  Ruffian 1.0.5 : 2532   59   51    120  55.4 %    2494  29.2 %
   0   2  Thinker 4.6c  : 2522   61   48    120  53.8 %    2496  32.5 %
  +2   3  Pro Deo 1.0   : 2498   49   64    120  49.6 %    2500  27.5 %
  -1   4  AnMon 5.50    : 2498   49   64    120  49.6 %    2500  27.5 %
  -1   5  Delfi 4.5     : 2490   49   63    120  48.3 %    2502  28.3 %
   0   6  Pharaon 3.1   : 2461   52   58    120  43.3 %    2508  28.3 %
Note that the rating lists say practically the same thing after 40-60 games
per engine as after 120 games. That is:
- the number one is Ruffian
- the number two is Thinker
- the number six is Pharaon
- the other three engines are very close, and telling them apart needs
many more games (hundreds? thousands?)
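For a rough feel of "hundreds? thousands?", here is a back-of-the-envelope sketch. It rests on two assumptions that are not from the rating program itself: that the + and - columns are error margins which shrink like 1/sqrt(games) (the lists above fit this roughly: about 100 points at 40 games, about 55 at 120), and that two engines count as separated once the margin is half the gap between them. The name `games_needed` is made up for this example.

```python
import math


def games_needed(games_played, margin_now, margin_wanted):
    """Games per engine needed to shrink the error margin to margin_wanted,
    assuming the margin falls like 1 / sqrt(number of games)."""
    return math.ceil(games_played * (margin_now / margin_wanted) ** 2)


# After 120 games the middle engines have margins of roughly 56 points
# (read off the + and - columns) and are 8 points apart (2498 vs 2490).
# Assumption: the gap is resolved once the margin is half of it, i.e. 4.
print(games_needed(120, 56, 4))
```

Under these assumptions the answer is over twenty thousand games per engine, so "thousands" looks like the right ballpark, if not an underestimate.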
Conclusions:
1) The minimal number of games for a rough rating estimate is 40 per engine,
although this needs more tests with a greater number of engines.
2) To differentiate between some engines/versions you need your whole life
(or more?)
Igor