Probability and computer chess.
Posted: 27 Oct 2004, 13:49
If you take several engines of similar strength and run several tournaments in
a row, can you guess the winners?
My assumption was that you cannot.
So I performed a small experiment. Six engines of similar strength (from the
AEGT King Class) played six double round-robin tournaments in a row.
Hardware: Celeron 567MHz, 128MB. The time control was the shortest possible for
decent chess: 1 min + 3 sec per game (i.e. each game lasts about 4 minutes on
average).
Here are the results. Each of the first five events has a new winner! Only in
the sixth event does a winner repeat (Ruffian).
   Engine         Score   De Th Ru An Ph Pr  S-B
1: Delfi 4.5      7,0/10  XX 01 == 1= =1 11  30,25
2: Thinker 4.6c   6,5/10  10 XX =1 == 11 ==  29,75
3: Ruffian 1.0.5  5,5/10  == =0 XX 1= 01 1=  25,00
4: AnMon 5.50     4,0/10  0= == 0= XX 00 11  19,75
5: Pharaon 3.1    3,5/10  =0 00 10 11 XX 00  17,00
6: Pro Deo 1.0    3,5/10  00 == 0= 00 11 XX  16,25

   Engine         Score   Th Ph De Ru Pr An  S-B
1: Thinker 4.6c   7,0/10  XX == 11 01 == 11  32,00
2: Pharaon 3.1    5,5/10  == XX 00 1= 11 01  25,75
3: Delfi 4.5      5,0/10  00 11 XX 10 =0 =1  23,50
4: Ruffian 1.0.5  4,5/10  10 0= 01 XX 01 10  22,75
5: Pro Deo 1.0    4,0/10  == 00 =1 10 XX 0=  21,00
6: AnMon 5.50     4,0/10  00 10 =0 01 1= XX  18,50

   Engine         Score   Ru Th De An Pr Ph  S-B
1: Ruffian 1.0.5  7,5/10  XX 01 11 0= 11 11  32,00
2: Thinker 4.6c   5,5/10  10 XX == 11 1= 00  29,00
3: Delfi 4.5      5,5/10  00 == XX 01 11 =1  22,25
4: AnMon 5.50     5,0/10  1= 00 10 XX =0 11  23,75
5: Pro Deo 1.0    4,0/10  00 0= 00 =1 XX 11  15,25
6: Pharaon 3.1    2,5/10  00 11 =0 00 00 XX  13,75

   Engine         Score   An Pr Ru Ph De Th  S-B
1: AnMon 5.50     7,0/10  XX 11 =0 =1 01 11  32,25
2: Pro Deo 1.0    6,5/10  00 XX =1 =1 11 1=  26,25
3: Ruffian 1.0.5  5,5/10  =1 =0 XX 00 1= 11  24,25
4: Pharaon 3.1    5,0/10  =0 =0 11 XX 10 10  23,75
5: Delfi 4.5      3,0/10  10 00 0= 01 XX 0=  16,25
6: Thinker 4.6c   3,0/10  00 0= 00 01 1= XX  12,75

   Engine         Score   Pr Ru Th Ph De An  S-B
1: Pro Deo 1.0    6,5/10  XX 10 1= 01 1= 1=  29,50
2: Ruffian 1.0.5  6,5/10  01 XX =0 1= =1 11  28,00
3: Thinker 4.6c   5,5/10  0= =1 XX =1 0= 1=  26,75
4: Pharaon 3.1    5,0/10  10 0= =0 XX 11 01  22,50
5: Delfi 4.5      3,5/10  0= =0 1= 00 XX 01  17,75
6: AnMon 5.50     3,0/10  0= 00 0= 10 10 XX  14,50

   Engine         Score   Ru Th An De Pr Ph  S-B
1: Ruffian 1.0.5  7,0/10  XX 10 1= 11 1= ==  32,00
2: Thinker 4.6c   6,5/10  01 XX 0= == 11 11  27,25
3: AnMon 5.50     5,5/10  0= 1= XX 01 0= 11  25,50
4: Delfi 4.5      4,5/10  00 == 10 XX 1= ==  20,25
5: Pro Deo 1.0    3,5/10  0= 00 1= 0= XX 01  17,00
6: Pharaon 3.1    3,0/10  == 00 00 == 10 XX  15,00
The conclusions are obvious and commonplace:
1) The engines really are close in strength.
2) Any engine can win an event among similar engines if the number of
games is small.
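Conclusion 2 can be checked with a quick simulation. The following is only a minimal sketch, not part of the experiment itself. It assumes six exactly equal engines, a draw rate of 0.29 (roughly the 52 draws in 180 games from the combined results), and first-place ties split at random rather than by S-B. The function names are made up for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def play_event(n_engines=6, games_per_pair=2, draw_rate=0.29, rng=random):
    """Simulate one round-robin among equally strong engines; return scores."""
    scores = [0.0] * n_engines
    for a, b in combinations(range(n_engines), 2):
        for _ in range(games_per_pair):
            r = rng.random()
            if r < draw_rate:
                # Draw: half a point each.
                scores[a] += 0.5
                scores[b] += 0.5
            elif r < draw_rate + (1.0 - draw_rate) / 2.0:
                # Equal strength assumed: decisive games split 50/50.
                scores[a] += 1.0
            else:
                scores[b] += 1.0
    return scores


def count_winners(n_events=10000, seed=1, **kwargs):
    """Count how often each engine finishes first (ties split at random)."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_events):
        scores = play_event(rng=rng, **kwargs)
        top = max(scores)
        leaders = [i for i, s in enumerate(scores) if s == top]
        wins[rng.choice(leaders)] += 1
    return wins


if __name__ == "__main__":
    n_events = 10000
    wins = count_winners(n_events=n_events)
    for engine, n in sorted(wins.items()):
        print(f"engine {engine}: first place in {n / n_events:.1%} of events")
```

With truly equal engines each one should, by symmetry, take first place in about one event out of six. Giving one engine a small edge in the 50/50 split would show how slowly that edge becomes visible in 10-game events.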
To complete the overall picture, here are the combined cross-table and ratings.
2004.10.23 - 2004.10.25

                  Score      1            2            3            4            5            6
----------------------------------------------------------------------------------------------------------
1: Ruffian 1.0.5  36.5 / 60  XXXXXXXXXXXX =0100111=010 1=100==1111= ==01111==111 1=0111=0011= 010=11001===
2: Thinker 4.6c   34.0 / 60  =1011000=101 XXXXXXXXXXXX ==1111001=0= 1011==1=0=== ====1=0=0=11 11==0001=111
3: AnMon 5.50     28.5 / 60  0=011==0000= ==0000110=1= XXXXXXXXXXXX 0==010011001 111==0110=0= 001011=11011
4: Delfi 4.5      28.5 / 60  ==10000==000 0100==0=1=== 1==101100110 XXXXXXXXXXXX 11=011000=1= =111=10100==
5: Pro Deo 1.0    28.0 / 60  0=1000=1100= ====0=1=1=00 000==1001=1= 00=100111=0= XXXXXXXXXXXX 110011=10101
6: Pharaon 3.1    24.5 / 60  101=00110=== 00==1110=000 110100=00100 =000=01011== 001100=01010 XXXXXXXXXXXX
----------------------------------------------------------------------------------------------------------
180 games: +67 =52 -61

   Program          Elo     +    -  Games    Score  Av.Op.    Draws
1  Ruffian 1.0.5  : 2564   79   78     60   60.8 %    2487   28.3 %
2  Thinker 4.6c   : 2539   84   66     60   56.7 %    2492   36.7 %
3  AnMon 5.50     : 2485   73   89     60   47.5 %    2503   25.0 %
4  Delfi 4.5      : 2485   68   89     60   47.5 %    2503   31.7 %
5  Pro Deo 1.0    : 2481   70   88     60   46.7 %    2504   30.0 %
6  Pharaon 3.1    : 2446   83   81     60   40.8 %    2511   21.7 %
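As a rough cross-check of the rating list, a score percentage can be turned into a rating difference with the usual logistic Elo curve and added to the average opponent rating. This is a sketch under that assumption only; the rating tool that produced the list may use a different curve or extra corrections, and the function names are made up for illustration.

```python
import math


def elo_diff(score):
    """Rating difference implied by a score fraction (logistic Elo curve)."""
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(diff):
    """Inverse of elo_diff: expected score fraction for a rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# (points, games, average opponent rating), copied from the tables above.
results = {
    "Ruffian 1.0.5": (36.5, 60, 2487),
    "Thinker 4.6c": (34.0, 60, 2492),
    "AnMon 5.50": (28.5, 60, 2503),
    "Delfi 4.5": (28.5, 60, 2503),
    "Pro Deo 1.0": (28.0, 60, 2504),
    "Pharaon 3.1": (24.5, 60, 2511),
}

if __name__ == "__main__":
    for name, (points, games, avg_opp) in results.items():
        estimate = avg_opp + elo_diff(points / games)
        print(f"{name:15s} estimated rating {estimate:.0f}")
```

The estimates land within a point or two of the Elo column, which suggests the list is essentially a performance rating against the average opponent.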
But how many games are enough? That's the question.
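One rough way to put a number on it is the normal approximation for the margin of error of a score percentage. The sketch below is an assumption-laden estimate, not a full answer: it treats games as independent, takes a draw rate of 0.29 (about 52 draws in 180 games), and uses z = 1.96 for a roughly 95% interval. The function names are hypothetical.

```python
import math


def per_game_sd(score=0.5, draw_rate=0.29):
    """Standard deviation of a single game result (1, 0.5 or 0 points)."""
    win = score - draw_rate / 2.0            # win probability implied by inputs
    mean_sq = win * 1.0 + draw_rate * 0.25   # E[X^2] for results 1 and 0.5
    return math.sqrt(mean_sq - score ** 2)


def margin_after(games, score=0.5, draw_rate=0.29, z=1.96):
    """Approximate margin of error on the score fraction after `games` games."""
    return z * per_game_sd(score, draw_rate) / math.sqrt(games)


def games_needed(margin, score=0.5, draw_rate=0.29, z=1.96):
    """Games needed for the margin on the score fraction to be at most `margin`."""
    return math.ceil((z * per_game_sd(score, draw_rate) / margin) ** 2)


if __name__ == "__main__":
    print(f"margin after 60 games: +/- {margin_after(60):.3f}")
    for target in (0.10, 0.05, 0.02):
        print(f"margin {target:.2f}: about {games_needed(target)} games")
```

Under these assumptions 60 games still leave a margin of roughly ±0.1 in score fraction, and halving the margin takes about four times as many games.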
To be continued...
Igor