How much is enough? (or Probability, part 2)
Posted: 28 Oct 2004, 15:11
My previous post dealt with cross-tables and winners. Now we look at the
rating lists after each round-robin event. I try to find the minimum
number of games needed for a proper rating calculation.
Conditions are the same:
"The six engines, which are close in strength (AEGT King Class), played two
round robins in a row. Hardware is a Celeron 567 MHz with 128 MB; the time
control is the shortest possible for decent chess: 1 min + 3 sec per move
(i.e. each game lasts about 4 minutes on average)."
Note the first column, which I added: it shows the change in place since
the previous event (plus means up, minus means down).
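The first column can be reproduced mechanically: it is the engine's place in the previous list minus its place in the current one. The sketch below assumes exactly that reading; `place_changes` is a hypothetical helper, not part of any rating program. It uses the orderings of the 1st and 2nd event lists that follow and prints the same change column as the 2nd event list.

```python
def place_changes(previous, current):
    """Return {engine: change}, where change = old place - new place.

    Both arguments list engine names from place 1 downwards.
    Positive means the engine moved up, negative means it moved down.
    """
    old_place = {name: i + 1 for i, name in enumerate(previous)}
    return {name: old_place[name] - (i + 1) for i, name in enumerate(current)}


# Orderings copied from the 1st and 2nd event lists.
event1 = ["Delfi", "Thinker", "Ruffian", "AnMon", "Pharaon", "Pro Deo"]
event2 = ["Thinker", "Delfi", "Ruffian", "Pharaon", "AnMon", "Pro Deo"]

changes = place_changes(event1, event2)
for place, name in enumerate(event2, start=1):
    c = changes[name]
    # Print "+1" / "-1" for moves and a bare "0" for no change, as in the lists.
    print(f"{c:+d}" if c else "0", place, name)
```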
1st event (each event is a double round robin: 10 games per engine, 30 games in total)
Program Elo + - Games Score Av.Op. Draws
1 Delfi 4.5 : 2622 235 175 10 70.0 % 2475 40.0 %
2 Thinker 4.6c : 2589 244 142 10 65.0 % 2482 50.0 %
3 Ruffian 1.0.5 : 2528 266 141 10 55.0 % 2494 50.0 %
4 AnMon 5.50 : 2441 168 255 10 40.0 % 2511 40.0 %
5 Pharaon 3.1 : 2410 279 244 10 35.0 % 2517 10.0 %
6 Pro Deo 1.0 : 2410 204 244 10 35.0 % 2517 30.0 %
2nd event
Chng Program Elo + - Games Score Av.Op. Draws
+1 1 Thinker 4.6c : 2605 141 110 20 67.5 % 2478 45.0 %
-1 2 Delfi 4.5 : 2558 153 133 20 60.0 % 2488 30.0 %
0 3 Ruffian 1.0.5 : 2500 141 141 20 50.0 % 2500 30.0 %
+1 4 Pharaon 3.1 : 2470 144 162 20 45.0 % 2505 20.0 %
-1 5 AnMon 5.50 : 2441 133 153 20 40.0 % 2511 30.0 %
0 6 Pro Deo 1.0 : 2426 126 149 20 37.5 % 2514 35.0 %
3rd event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Thinker 4.6c : 2579 114 96 30 63.3 % 2484 40.0 %
0 2 Delfi 4.5 : 2549 122 105 30 58.3 % 2490 30.0 %
0 3 Ruffian 1.0.5 : 2549 122 115 30 58.3 % 2490 23.3 %
+1 4 AnMon 5.50 : 2461 108 124 30 43.3 % 2508 26.7 %
-1 5 Pharaon 3.1 : 2431 131 117 30 38.3 % 2514 16.7 %
0 6 Pro Deo 1.0 : 2431 109 117 30 38.3 % 2514 30.0 %
These first three events show the usual shifting from pillar to post.
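The Elo column appears to follow from the Score and Av.Op. columns through the standard logistic Elo relation, Elo = Av.Op. + 400 * log10(p / (1 - p)), where p is the score fraction. The sketch below assumes that relation (the post does not say which program produced the lists or how it computes ratings) and checks it against the 1st event; it agrees with the printed values to within one point, since Av.Op. is itself printed rounded.

```python
import math


def performance_elo(score, avg_opponent):
    """Logistic Elo estimate from a score fraction and an average opponent rating.

    score is the fraction of points scored, e.g. 0.70 for 70 %.
    The 400 * log10(p / (1 - p)) term inverts the usual Elo expectation
    E = 1 / (1 + 10 ** (-diff / 400)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


# (engine, score, Av.Op., Elo as printed) from the 1st event list
first_event = [
    ("Delfi 4.5", 0.700, 2475, 2622),
    ("Thinker 4.6c", 0.650, 2482, 2589),
    ("Ruffian 1.0.5", 0.550, 2494, 2528),
    ("AnMon 5.50", 0.400, 2511, 2441),
    ("Pharaon 3.1", 0.350, 2517, 2410),
    ("Pro Deo 1.0", 0.350, 2517, 2410),
]

for name, p, av_op, printed in first_event:
    est = performance_elo(p, av_op)
    print(f"{name:14s} printed {printed}  formula {est:.0f}")
```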
4th event
Chng Program Elo + - Games Score Av.Op. Draws
+2 1 Ruffian 1.0.5 : 2544 104 96 40 57.5 % 2491 25.0 %
-1 2 Thinker 4.6c : 2529 107 82 40 55.0 % 2494 35.0 %
-1 3 Delfi 4.5 : 2507 113 87 40 51.2 % 2499 27.5 %
0 4 AnMon 5.50 : 2500 99 99 40 50.0 % 2500 25.0 %
+1 5 Pro Deo 1.0 : 2471 87 107 40 45.0 % 2506 30.0 %
-1 6 Pharaon 3.1 : 2449 107 102 40 41.2 % 2510 17.5 %
The first important moment: from now on (i.e. until the final event), three
engines hold fixed places: 1, 2 and 6.
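The claim about fixed places can be checked directly against the orderings of the 4th to 12th event lists. The sketch below uses a hypothetical helper, `fixed_places`, which reports every place that is held by one and the same engine in all of those lists.

```python
# Orderings (place 1 first) copied from the 4th to 12th event lists.
rankings = {
    4: ["Ruffian", "Thinker", "Delfi", "AnMon", "Pro Deo", "Pharaon"],
    5: ["Ruffian", "Thinker", "Pro Deo", "Delfi", "AnMon", "Pharaon"],
    6: ["Ruffian", "Thinker", "AnMon", "Delfi", "Pro Deo", "Pharaon"],
    7: ["Ruffian", "Thinker", "Pro Deo", "Delfi", "AnMon", "Pharaon"],
    8: ["Ruffian", "Thinker", "Pro Deo", "Delfi", "AnMon", "Pharaon"],
    9: ["Ruffian", "Thinker", "AnMon", "Pro Deo", "Delfi", "Pharaon"],
    10: ["Ruffian", "Thinker", "Pro Deo", "AnMon", "Delfi", "Pharaon"],
    11: ["Ruffian", "Thinker", "AnMon", "Delfi", "Pro Deo", "Pharaon"],
    12: ["Ruffian", "Thinker", "Pro Deo", "AnMon", "Delfi", "Pharaon"],
}


def fixed_places(rankings):
    """Return {place: engine} for each place held by the same engine in every list."""
    orders = list(rankings.values())
    result = {}
    for idx in range(len(orders[0])):
        occupants = {order[idx] for order in orders}
        if len(occupants) == 1:
            result[idx + 1] = occupants.pop()
    return result


print(fixed_places(rankings))  # {1: 'Ruffian', 2: 'Thinker', 6: 'Pharaon'}
```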
5th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2553 90 86 50 59.0 % 2489 26.0 %
0 2 Thinker 4.6c : 2529 95 70 50 55.0 % 2494 38.0 %
+2 3 Pro Deo 1.0 : 2494 75 101 50 49.0 % 2501 30.0 %
-1 4 Delfi 4.5 : 2488 77 99 50 48.0 % 2502 28.0 %
-1 5 AnMon 5.50 : 2477 83 96 50 46.0 % 2505 24.0 %
0 6 Pharaon 3.1 : 2459 92 92 50 43.0 % 2508 18.0 %
6th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2564 79 78 60 60.8 % 2487 28.3 %
0 2 Thinker 4.6c : 2539 84 66 60 56.7 % 2492 36.7 %
+2 3 AnMon 5.50 : 2485 73 89 60 47.5 % 2503 25.0 %
0 4 Delfi 4.5 : 2485 68 89 60 47.5 % 2503 31.7 %
-2 5 Pro Deo 1.0 : 2481 70 88 60 46.7 % 2504 30.0 %
0 6 Pharaon 3.1 : 2446 83 81 60 40.8 % 2511 21.7 %
The second important moment: three engines are almost tied for places 3-5
(AnMon and Delfi at 2485, Pro Deo at 2481). From now on they keep swapping places!
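One way to see why these three keep swapping: their ratings differ by only 4 Elo, while the error margins in the "+" and "-" columns are 70-90 Elo wide on each side. The sketch below assumes those columns mean the upper and lower margins around the Elo value (the post does not state the confidence level) and shows that all three intervals from the 6th event overlap.

```python
from itertools import combinations

# 6th event rows for places 3-5: (engine, Elo, plus, minus)
rows = [
    ("AnMon 5.50", 2485, 73, 89),
    ("Delfi 4.5", 2485, 68, 89),
    ("Pro Deo 1.0", 2481, 70, 88),
]


def interval(elo, plus, minus):
    """Assumed reading of the '+' and '-' columns: [Elo - minus, Elo + plus]."""
    return (elo - minus, elo + plus)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


for (n1, *r1), (n2, *r2) in combinations(rows, 2):
    i1, i2 = interval(*r1), interval(*r2)
    print(f"{n1} {i1} vs {n2} {i2}: overlap={overlaps(i1, i2)}")
```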
7th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2572 72 75 70 62.1 % 2485 27.1 %
0 2 Thinker 4.6c : 2525 80 59 70 54.3 % 2495 37.1 %
+2 3 Pro Deo 1.0 : 2487 63 83 70 47.9 % 2502 30.0 %
0 4 Delfi 4.5 : 2487 61 83 70 47.9 % 2502 32.9 %
-2 5 AnMon 5.50 : 2471 70 79 70 45.0 % 2506 24.3 %
0 6 Pharaon 3.1 : 2458 71 77 70 42.9 % 2508 25.7 %
8th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2555 69 68 80 59.4 % 2489 26.2 %
0 2 Thinker 4.6c : 2536 72 58 80 56.2 % 2493 35.0 %
0 3 Pro Deo 1.0 : 2489 60 77 80 48.1 % 2502 28.8 %
0 4 Delfi 4.5 : 2485 58 76 80 47.5 % 2503 32.5 %
0 5 AnMon 5.50 : 2478 64 75 80 46.2 % 2504 25.0 %
0 6 Pharaon 3.1 : 2456 67 71 80 42.5 % 2509 25.0 %
Hurray! Now we have the absolute truth: all engines are in their right places
and don't want to change positions. Nevertheless, let's check further...
9th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2555 65 63 90 59.4 % 2489 27.8 %
0 2 Thinker 4.6c : 2542 67 56 90 57.2 % 2491 34.4 %
+2 3 AnMon 5.50 : 2490 58 73 90 48.3 % 2502 25.6 %
-1 4 Pro Deo 1.0 : 2481 56 71 90 46.7 % 2504 31.1 %
-1 5 Delfi 4.5 : 2477 57 70 90 46.1 % 2504 30.0 %
0 6 Pharaon 3.1 : 2455 61 66 90 42.2 % 2509 28.9 %
Maybe it's not so absolute after all?
The three other engines continue their stupid dance ;-(
10th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2547 63 58 100 58.0 % 2491 28.0 %
0 2 Thinker 4.6c : 2520 67 52 100 53.5 % 2496 33.0 %
+1 3 Pro Deo 1.0 : 2494 52 70 100 49.0 % 2501 30.0 %
-1 4 AnMon 5.50 : 2494 55 70 100 49.0 % 2501 26.0 %
0 5 Delfi 4.5 : 2477 54 66 100 46.0 % 2505 30.0 %
0 6 Pharaon 3.1 : 2468 56 65 100 44.5 % 2506 29.0 %
11th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2540 61 56 110 56.8 % 2492 26.4 %
0 2 Thinker 4.6c : 2524 63 49 110 54.1 % 2495 33.6 %
+1 3 AnMon 5.50 : 2503 67 51 110 50.5 % 2499 26.4 %
+1 4 Delfi 4.5 : 2492 51 66 110 48.6 % 2502 28.2 %
-2 5 Pro Deo 1.0 : 2489 51 65 110 48.2 % 2502 29.1 %
0 6 Pharaon 3.1 : 2452 56 59 110 41.8 % 2510 27.3 %
12th event
Chng Program Elo + - Games Score Av.Op. Draws
0 1 Ruffian 1.0.5 : 2532 59 51 120 55.4 % 2494 29.2 %
0 2 Thinker 4.6c : 2522 61 48 120 53.8 % 2496 32.5 %
+2 3 Pro Deo 1.0 : 2498 49 64 120 49.6 % 2500 27.5 %
-1 4 AnMon 5.50 : 2498 49 64 120 49.6 % 2500 27.5 %
-1 5 Delfi 4.5 : 2490 49 63 120 48.3 % 2502 28.3 %
0 6 Pharaon 3.1 : 2461 52 58 120 43.3 % 2508 28.3 %
Note that the rating lists say practically the same thing after 40-60 games
(per engine) as after 120 games, namely:
- the number one is Ruffian
- the number two is Thinker
- the number six is Pharaon
- the other three engines are very close, and separating them needs many
more games (hundreds? thousands?)
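The "hundreds? thousands?" guess can be put on a rough footing. After 120 games the middle three engines sit within 8 Elo of each other (2498, 2498, 2490). The sketch below estimates how many head-to-head games are needed before the expected score gap for a given Elo difference exceeds a ~95 % confidence half-width. It is a normal approximation under assumed parameters (scores near 50 %, a 30 % draw rate similar to these lists, z = 1.96), not the method of the rating program; `games_needed` is a hypothetical helper.

```python
import math


def expected_score(elo_diff):
    """Standard logistic Elo expectation for the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, draw_rate=0.3, z=1.96):
    """Rough game count to separate two engines elo_diff apart.

    Assumptions: normal approximation and scores near 50 %, so the
    per-game variance is about (1 - draw_rate) / 4 (wins and losses
    equally likely); z = 1.96 for a two-sided ~95 % level.
    """
    sigma = math.sqrt((1.0 - draw_rate) / 4.0)
    gap = expected_score(elo_diff) - 0.5
    return math.ceil((z * sigma / gap) ** 2)


for diff in (5, 10, 20, 50):
    print(f"{diff:3d} Elo apart: about {games_needed(diff)} games")
```

Under these assumptions a 10 Elo gap already calls for a few thousand games, and a 5 Elo gap for over ten thousand.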
Conclusions:
1) The minimum number of games per engine for a rough rating estimate is 40,
though this needs confirming with more tests and a larger number of engines.
2) To differentiate between some engines/versions, you need your whole life
(or more?)
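Conclusion 1 can be cross-checked with the same kind of rough approximation: the error margin shrinks roughly with 1/sqrt(n), so quadrupling the games only halves it. The sketch below assumes an engine scoring about 50 % with a 30 % draw rate and a ~95 % level (z = 1.96); `elo_margin` is a hypothetical helper, not the rating program's method. Its results (about 90 Elo at 40 games, about 50 at 120) are in the same range as the printed "+" and "-" columns.

```python
import math


def elo_margin(n_games, draw_rate=0.3, z=1.96):
    """Approximate +/- Elo margin for an engine scoring about 50 % over n_games.

    Assumes per-game variance (1 - draw_rate) / 4 and a normal approximation.
    At p = 0.5 the Elo curve is symmetric, so one value covers both sides.
    """
    sigma = math.sqrt((1.0 - draw_rate) / 4.0)
    half_width = z * sigma / math.sqrt(n_games)  # margin as a score fraction
    p = 0.5 + half_width
    return 400.0 * math.log10(p / (1.0 - p))


for n in (10, 40, 60, 120):
    print(f"{n:4d} games: about +/-{elo_margin(n):.0f} Elo")
```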
Igor