Comparing EloStat 1.1b, 1.3 and bayeselo ratings using YABRL
Posted: 22 Jan 2005, 19:02
The following are the ratings and error margins as calculated by the three tools (Elostat 1.1b, Elostat 1.3 and bayeselo).
Elostat 1.1b (currently used for YABRL):
Error margins are often asymmetric and grow larger at the extremes of the table.
Program Elo + - Games Score Av.Op. Draws
1 Chess Tiger 15.0 normal : 2721 14 22 1191 72.2 % 2555 28.0 %
2 Chess Tiger 2004 normal : 2714 16 26 911 73.1 % 2540 27.0 %
3 Gandalf v6.0WB : 2691 16 26 997 70.9 % 2536 21.9 %
4 Ruffian v2.1.0 : 2678 14 20 1358 68.0 % 2547 26.1 %
5 Ruffian v2.0.0 : 2675 17 25 900 70.1 % 2526 27.3 %
6 Ruffian v2.0.2dvbk : 2673 17 24 894 68.4 % 2539 27.4 %
7 List v5.12 : 2670 15 20 1196 66.4 % 2552 28.3 %
8 Pro Deo v1.0 11.Aug. : 2657 17 23 986 65.6 % 2545 22.3 %
9 Ruffian v1.0.1 : 2650 16 22 1015 67.7 % 2521 27.9 %
10 DeepSjeng v1.6ntb : 2620 16 17 1335 60.5 % 2546 26.4 %
11 Rebel v12.00.01 : 2619 20 21 908 59.2 % 2554 25.4 %
12 Aristarch v4.50 : 2617 16 17 1297 60.1 % 2545 25.9 %
13 Ktulu v5.1 : 2617 17 17 1206 60.7 % 2541 29.8 %
14 Thinker v4.6b : 2610 17 16 1230 59.7 % 2542 31.3 %
15 SmarThink v0.17a : 2607 16 16 1419 58.9 % 2545 26.4 %
16 Delfi v4.5 : 2598 18 18 1119 58.6 % 2537 27.7 %
17 Patriot v1.2.3cpbk : 2594 17 16 1297 56.9 % 2546 25.6 %
18 Ktulu v4.2 : 2587 19 18 1086 57.5 % 2535 24.7 %
19 Crafty v17.14DC : 2586 16 13 1575 56.7 % 2540 33.8 %
20 Thinker v4.5b : 2584 19 16 1055 56.4 % 2539 34.5 %
21 Ktulu v5.0 : 2583 20 21 943 56.5 % 2537 20.1 %
22 Aristarch v4.37 : 2582 20 17 959 56.7 % 2536 36.4 %
23 Crafty v19.06DCntb : 2582 18 16 1149 56.1 % 2539 30.9 %
24 Aristarch v4.21 : 2581 18 17 1177 57.2 % 2531 25.2 %
25 Yace v0.99.87 : 2581 19 18 1099 56.3 % 2537 25.9 %
26 Crafty v19.13RA : 2578 18 16 1201 55.1 % 2543 29.1 %
27 El Chinito v3.25 : 2568 20 17 1080 54.3 % 2538 27.8 %
28 Crafty-MPC v18.15DC : 2566 17 14 1516 53.6 % 2541 26.8 %
29 SlowChess Blitz v0.4 : 2565 21 20 857 55.8 % 2525 25.9 %
30 Wildcat v4.0 : 2565 19 16 1199 53.5 % 2540 27.1 %
31 Delfi v4.2 : 2562 25 25 580 58.1 % 2505 27.2 %
32 SoS 3 : 2562 16 14 1619 53.1 % 2540 24.7 %
33 Delfi v4.3 : 2561 18 16 1260 53.5 % 2537 24.9 %
34 SoS 4 : 2561 18 15 1392 52.4 % 2544 24.6 %
35 Gothmog v1.0 beta 10 : 2561 20 17 1099 53.3 % 2538 23.2 %
36 SmarThink v0.16b++ : 2560 21 21 836 58.0 % 2504 24.3 %
37 Little Goliath 2000 v3.9 : 2558 16 14 1618 52.6 % 2540 26.7 %
38 Crafty v18.15DC : 2557 22 22 741 59.0 % 2494 29.0 %
39 Pharaon v3.1 : 2555 21 18 999 52.3 % 2539 24.8 %
40 Yace Paderborn : 2554 17 14 1460 52.8 % 2535 26.9 %
41 Pepito v1.59 profile : 2553 17 14 1618 51.8 % 2540 23.8 %
42 SlowChess v2.94 : 2552 21 18 948 53.9 % 2525 25.4 %
43 Fruit v1.5 : 2551 21 16 1118 50.4 % 2549 25.2 %
44 Aristarch v4.4 : 2549 36 34 319 54.1 % 2520 20.4 %
45 Green Light Chess v3.0.3.4 : 2545 21 16 1095 51.0 % 2538 25.3 %
46 Anmon v5.51 : 2543 21 17 1058 50.9 % 2537 22.7 %
47 Yace v0.99.56 : 2543 33 29 380 54.3 % 2513 26.6 %
48 Little Goliath 2000 v3.5 : 2539 31 25 440 53.6 % 2513 30.9 %
49 Green Light Chess v3.00 : 2536 18 14 1479 50.0 % 2536 27.0 %
50 Glaurung v0.1.5 : 2532 18 22 898 48.2 % 2545 27.2 %
51 Baron v1.5.0 : 2530 18 22 971 48.7 % 2539 25.3 %
52 Anmon v5.30 : 2530 15 19 1259 48.6 % 2539 26.1 %
53 Tao v5.6 : 2523 16 18 1331 46.5 % 2547 24.3 %
54 Jonny v2.70 : 2522 19 21 977 46.8 % 2544 19.4 %
55 Pharaon v2.62 : 2510 15 18 1379 47.4 % 2529 24.2 %
56 Amyan v1.59 : 2510 15 18 1348 46.8 % 2532 26.7 %
57 Quark v2.35 : 2500 18 19 1092 44.5 % 2539 25.3 %
58 Crafty v19.01DC : 2500 24 19 815 50.4 % 2497 25.5 %
59 Gromit v3.8.2 : 2500 15 16 1589 44.1 % 2541 23.9 %
60 LambChop v10.99 : 2497 15 15 1617 43.7 % 2541 22.1 %
61 Ktulu v3.9 : 2492 19 24 779 48.6 % 2502 26.1 %
62 SlowChess v2.89b : 2488 17 18 1253 43.9 % 2530 25.0 %
63 Anmon v5.22 : 2485 19 22 899 46.4 % 2510 26.7 %
64 Comet B44-2 : 2479 16 15 1538 41.0 % 2542 25.0 %
65 SoS v11-99 : 2479 33 34 359 46.0 % 2507 17.3 %
66 Tao v5.4 : 2477 19 19 1039 44.1 % 2518 21.0 %
67 Amy v0.8.3 : 2476 17 15 1560 41.0 % 2539 18.2 %
68 KnightDreamer v3.2 : 2475 16 15 1520 40.5 % 2542 24.2 %
69 Dragon v4.4.3 : 2465 17 15 1466 39.2 % 2541 25.0 %
70 Comet B62-3 : 2464 17 15 1519 38.9 % 2542 25.0 %
71 Spike v0.7 : 2461 19 17 1097 38.9 % 2540 25.0 %
72 Francesca M.0.0.9 : 2451 17 14 1617 37.3 % 2542 23.6 %
73 PostModernist v1.007 : 2441 18 14 1535 35.7 % 2543 24.4 %
74 Comet B60 : 2435 22 21 780 41.2 % 2497 25.6 %
75 Leila v0.53h : 2424 20 14 1493 34.0 % 2539 19.8 %
76 Tcb v0045 : 2414 20 13 1535 32.3 % 2543 22.2 %
77 Resp v0.19 : 2404 20 13 1520 31.1 % 2543 22.9 %
78 Arasan v7.4 : 2396 23 14 1220 29.8 % 2545 22.5 %
79 Nejmet v3.07 : 2390 25 18 876 33.2 % 2512 22.3 %
80 SlowChess v2.78 : 2379 27 19 790 33.5 % 2499 19.6 %
81 Exchess v4.03 : 2336 25 12 1519 23.2 % 2544 20.4 %
82 Beowulf v2.2 : 2308 29 12 1440 21.3 % 2535 16.9 %
Elostat 1.3: Error margins are now essentially symmetric (within a point) and sometimes a bit smaller, particularly at the extremes of the list. The ratings themselves are identical to those calculated with 1.1b.
Program Elo + - Games Score Av.Op. Draws
1 Chess Tiger 15.0 normal : 2721 18 18 1191 72.2 % 2555 28.0 %
2 Chess Tiger 2004 normal : 2714 21 21 911 73.1 % 2540 27.0 %
3 Gandalf v6.0WB : 2691 20 20 997 70.9 % 2536 21.9 %
4 Ruffian v2.1.0 : 2678 17 17 1358 68.0 % 2547 26.1 %
5 Ruffian v2.0.0 : 2675 20 20 900 70.1 % 2526 27.3 %
6 Ruffian v2.0.2dvbk : 2673 20 20 894 68.4 % 2539 27.4 %
7 List v5.12 : 2670 17 17 1196 66.4 % 2552 28.3 %
8 Pro Deo v1.0 11.Aug. : 2657 20 20 986 65.6 % 2545 22.3 %
9 Ruffian v1.0.1 : 2650 19 19 1015 67.7 % 2521 27.9 %
10 DeepSjeng v1.6ntb : 2620 16 16 1335 60.5 % 2546 26.4 %
11 Rebel v12.00.01 : 2619 20 20 908 59.2 % 2554 25.4 %
12 Aristarch v4.50 : 2617 17 16 1297 60.1 % 2545 25.9 %
13 Ktulu v5.1 : 2617 17 17 1206 60.7 % 2541 29.8 %
14 Thinker v4.6b : 2610 16 16 1230 59.7 % 2542 31.3 %
15 SmarThink v0.17a : 2607 16 16 1419 58.9 % 2545 26.4 %
16 Delfi v4.5 : 2598 18 17 1119 58.6 % 2537 27.7 %
17 Patriot v1.2.3cpbk : 2594 16 16 1297 56.9 % 2546 25.6 %
18 Ktulu v4.2 : 2587 18 18 1086 57.5 % 2535 24.7 %
19 Crafty v17.14DC : 2586 14 14 1575 56.7 % 2540 33.8 %
20 Thinker v4.5b : 2584 17 17 1055 56.4 % 2539 34.5 %
21 Ktulu v5.0 : 2583 20 20 943 56.5 % 2537 20.1 %
22 Aristarch v4.37 : 2582 18 18 959 56.7 % 2536 36.4 %
23 Crafty v19.06DCntb : 2582 17 17 1149 56.1 % 2539 30.9 %
24 Aristarch v4.21 : 2581 17 17 1177 57.2 % 2531 25.2 %
25 Yace v0.99.87 : 2581 18 18 1099 56.3 % 2537 25.9 %
26 Crafty v19.13RA : 2578 17 17 1201 55.1 % 2543 29.1 %
27 El Chinito v3.25 : 2568 18 18 1080 54.3 % 2538 27.8 %
28 Crafty-MPC v18.15DC : 2566 15 15 1516 53.6 % 2541 26.8 %
29 SlowChess Blitz v0.4 : 2565 20 20 857 55.8 % 2525 25.9 %
30 Wildcat v4.0 : 2565 17 17 1199 53.5 % 2540 27.1 %
31 Delfi v4.2 : 2562 24 24 580 58.1 % 2505 27.2 %
32 SoS 3 : 2562 15 15 1619 53.1 % 2540 24.7 %
33 Delfi v4.3 : 2561 17 17 1260 53.5 % 2537 24.9 %
34 SoS 4 : 2561 16 16 1392 52.4 % 2544 24.6 %
35 Gothmog v1.0 beta 10 : 2561 18 18 1099 53.3 % 2538 23.2 %
36 SmarThink v0.16b++ : 2560 21 21 836 58.0 % 2504 24.3 %
37 Little Goliath 2000 v3.9 : 2558 15 15 1618 52.6 % 2540 26.7 %
38 Crafty v18.15DC : 2557 21 21 741 59.0 % 2494 29.0 %
39 Pharaon v3.1 : 2555 19 19 999 52.3 % 2539 24.8 %
40 Yace Paderborn : 2554 15 15 1460 52.8 % 2535 26.9 %
41 Pepito v1.59 profile : 2553 15 15 1618 51.8 % 2540 23.8 %
42 SlowChess v2.94 : 2552 19 19 948 53.9 % 2525 25.4 %
43 Fruit v1.5 : 2551 18 18 1118 50.4 % 2549 25.2 %
44 Aristarch v4.4 : 2549 34 34 319 54.1 % 2520 20.4 %
45 Green Light Chess v3.0.3.4 : 2545 18 18 1095 51.0 % 2538 25.3 %
46 Anmon v5.51 : 2543 18 18 1058 50.9 % 2537 22.7 %
47 Yace v0.99.56 : 2543 30 30 380 54.3 % 2513 26.6 %
48 Little Goliath 2000 v3.5 : 2539 27 27 440 53.6 % 2513 30.9 %
49 Green Light Chess v3.00 : 2536 15 15 1479 50.0 % 2536 27.0 %
50 Glaurung v0.1.5 : 2532 19 19 898 48.2 % 2545 27.2 %
51 Baron v1.5.0 : 2530 19 19 971 48.7 % 2539 25.3 %
52 Anmon v5.30 : 2530 17 17 1259 48.6 % 2539 26.1 %
53 Tao v5.6 : 2523 16 16 1331 46.5 % 2547 24.3 %
54 Jonny v2.70 : 2522 20 20 977 46.8 % 2544 19.4 %
55 Pharaon v2.62 : 2510 16 16 1379 47.4 % 2529 24.2 %
56 Amyan v1.59 : 2510 16 16 1348 46.8 % 2532 26.7 %
57 Quark v2.35 : 2500 18 18 1092 44.5 % 2539 25.3 %
58 Crafty v19.01DC : 2500 21 21 815 50.4 % 2497 25.5 %
59 Gromit v3.8.2 : 2500 15 15 1589 44.1 % 2541 23.9 %
60 LambChop v10.99 : 2497 15 15 1617 43.7 % 2541 22.1 %
61 Ktulu v3.9 : 2492 21 21 779 48.6 % 2502 26.1 %
62 SlowChess v2.89b : 2488 17 17 1253 43.9 % 2530 25.0 %
63 Anmon v5.22 : 2485 19 20 899 46.4 % 2510 26.7 %
64 Comet B44-2 : 2479 15 15 1538 41.0 % 2542 25.0 %
65 SoS v11-99 : 2479 33 33 359 46.0 % 2507 17.3 %
66 Tao v5.4 : 2477 19 19 1039 44.1 % 2518 21.0 %
67 Amy v0.8.3 : 2476 16 16 1560 41.0 % 2539 18.2 %
68 KnightDreamer v3.2 : 2475 15 15 1520 40.5 % 2542 24.2 %
69 Dragon v4.4.3 : 2465 16 16 1466 39.2 % 2541 25.0 %
70 Comet B62-3 : 2464 15 15 1519 38.9 % 2542 25.0 %
71 Spike v0.7 : 2461 18 18 1097 38.9 % 2540 25.0 %
72 Francesca M.0.0.9 : 2451 15 15 1617 37.3 % 2542 23.6 %
73 PostModernist v1.007 : 2441 16 16 1535 35.7 % 2543 24.4 %
74 Comet B60 : 2435 21 21 780 41.2 % 2497 25.6 %
75 Leila v0.53h : 2424 16 16 1493 34.0 % 2539 19.8 %
76 Tcb v0045 : 2414 16 16 1535 32.3 % 2543 22.2 %
77 Resp v0.19 : 2404 16 16 1520 31.1 % 2543 22.9 %
78 Arasan v7.4 : 2396 18 18 1220 29.8 % 2545 22.5 %
79 Nejmet v3.07 : 2390 21 21 876 33.2 % 2512 22.3 %
80 SlowChess v2.78 : 2379 23 23 790 33.5 % 2499 19.6 %
81 Exchess v4.03 : 2336 17 18 1519 23.2 % 2544 20.4 %
82 Beowulf v2.2 : 2308 19 19 1440 21.3 % 2535 16.9 %
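A quick plausibility check on the Elo column of the two Elostat tables above: the printed values look consistent with the usual logistic relation between score and rating difference, Elo ≈ Av.Op. + 400 · log10(p / (1 − p)), where p is the score fraction. This is only an observation from the printed numbers, not a statement about how Elostat is implemented, and the function name below is made up for illustration. A minimal Python sketch that compares three rows against that relation:

```python
import math

def implied_elo(avg_opponent, score_percent):
    """Rating implied by a score percentage against an average opponent rating.

    Assumes the standard logistic Elo curve; this is a guess that happens
    to fit the table, not Elostat's documented method.
    """
    p = score_percent / 100.0
    return avg_opponent + 400.0 * math.log10(p / (1.0 - p))

# (name, Av.Op., Score %, Elo as printed in the table)
rows = [
    ("Chess Tiger 15.0 normal", 2555, 72.2, 2721),
    ("LambChop v10.99", 2541, 43.7, 2497),
    ("Beowulf v2.2", 2535, 21.3, 2308),
]

for name, avg_op, score, table_elo in rows:
    print(f"{name:25s} implied {implied_elo(avg_op, score):7.1f}  table {table_elo}")
```

Because the Score and Av.Op. columns are rounded, deviations of about a point between the implied and the printed rating are to be expected.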
Bayeselo: The ratings themselves differ from Elostat: the difference between the ratings of the first and last engine is bigger (455 points versus 413), and in several places even the order is different (e.g. with bayeselo SmarThink v0.17a is ahead of Thinker v4.6b, with Elostat Thinker is ahead). Error margins are broadly comparable to the ones calculated by Elostat 1.3, but a little larger.
Btw: I have calibrated the bayeselo ratings to LambChop = 2497 using Excel. Does anybody know how one could tell bayeselo to use a start Elo value?
Rank Name Elo + - games score ratio
1 Chess Tiger 15.0 normal 2743 20 19 1191 859.5 72%
2 Chess Tiger 2004 normal 2734 23 22 911 666 73%
3 Gandalf v6.0WB 2719 22 22 997 707 71%
4 Ruffian v2.1.0 2700 18 18 1358 924 68%
5 Ruffian v2.0.0 2695 23 22 900 631 70%
6 Ruffian v2.0.2dvbk 2694 22 22 894 611.5 68%
7 List v5.12 2691 19 19 1196 794 66%
8 Pro Deo v1.0 11.Aug. 2680 21 21 986 647 66%
9 Ruffian v1.0.1 2668 21 21 1015 687.5 68%
10 DeepSjeng v1.6ntb 2638 18 18 1335 808 61%
11 Rebel v12.00.01 2636 22 21 908 537.5 59%
12 Aristarch v4.50 2634 18 18 1297 780 60%
13 Ktulu v5.1 2633 19 18 1206 731.5 61%
14 SmarThink v0.17a 2626 17 17 1419 835.5 59%
15 Thinker v4.6b 2624 18 18 1230 734.5 60%
16 Patriot v1.2.3cpbk 2612 18 18 1297 738 57%
17 Delfi v4.5 2611 19 19 1119 656 59%
18 Ktulu v4.2 2604 20 20 1086 624 57%
19 Crafty v17.14DC 2600 16 16 1575 892.5 57%
20 Ktulu v5.0 2599 21 21 943 533 57%
21 Thinker v4.5b 2596 19 19 1055 595 56%
22 Aristarch v4.21 2596 19 19 1177 673.5 57%
23 Crafty v19.06DCntb 2594 19 19 1149 644.5 56%
24 Yace v0.99.87 2593 20 19 1099 618.5 56%
25 Aristarch v4.37 2593 20 20 959 543.5 57%
26 Crafty v19.13RA 2592 18 18 1201 661.5 55%
27 El Chinito v3.25 2581 20 20 1080 586 54%
28 Crafty-MPC v18.15DC 2579 17 17 1516 812.5 54%
29 SlowChess Blitz v0.4 2576 22 22 857 478 56%
30 Wildcat v4.0 2576 19 18 1199 641.5 54%
31 Gothmog v1.0 beta 10 2574 20 20 1099 585.5 53%
32 SoS 4 2573 17 17 1392 730 52%
33 SoS 3 2573 16 16 1619 859 53%
34 Delfi v4.3 2573 18 18 1260 674 53%
35 Delfi v4.2 2573 27 26 580 337 58%
36 SmarThink v0.16b++ 2572 23 22 836 484.5 58%
37 Little Goliath 2000 v3.9 2569 16 16 1618 851 53%
38 Pharaon v3.1 2567 20 21 999 522 52%
39 Crafty v18.15DC 2566 23 23 741 437.5 59%
40 Yace Paderborn 2565 17 17 1460 771.5 53%
41 SlowChess v2.94 2564 21 21 948 510.5 54%
42 Pepito v1.59 profile 2563 16 16 1618 838.5 52%
43 Fruit v1.5 2561 19 19 1118 563 50%
44 Aristarch v4.4 2558 36 36 319 172.5 54%
45 Yace v0.99.56 2556 33 33 380 206.5 54%
46 Green Light Chess v3.0.3.4 2554 20 20 1095 558.5 51%
47 Anmon v5.51 2552 20 20 1058 538 51%
48 Little Goliath 2000 v3.5 2549 30 30 440 236 54%
49 Green Light Chess v3.00 2544 17 17 1479 740 50%
50 Glaurung v0.1.5 2539 21 21 898 433 48%
51 Baron v1.5.0 2538 21 21 971 473 49%
52 Anmon v5.30 2537 18 18 1259 612.5 49%
53 Tao v5.6 2531 18 18 1331 619 47%
54 Jonny v2.70 2528 21 21 977 457 47%
55 Amyan v1.59 2515 17 18 1348 631 47%
56 Pharaon v2.62 2515 17 17 1379 653 47%
57 Crafty v19.01DC 2506 22 22 815 411 50%
58 Quark v2.35 2504 20 20 1092 486 45%
59 Gromit v3.8.2 2502 16 16 1589 700 44%
60 LambChop v10.99 2497 16 17 1617 706 44%
61 Ktulu v3.9 2494 23 23 779 378.5 49%
62 SlowChess v2.89b 2488 18 18 1253 550.5 44%
63 Anmon v5.22 2487 21 21 899 417 46%
64 Comet B44-2 2479 17 17 1538 631 41%
65 SoS v11-99 2478 35 35 359 165 46%
66 Tao v5.4 2478 20 20 1039 458 44%
67 KnightDreamer v3.2 2473 17 17 1520 616 41%
68 Amy v0.8.3 2472 17 17 1560 639 41%
69 Comet B62-3 2463 17 17 1519 591.5 39%
70 Dragon v4.4.3 2462 17 17 1466 574.5 39%
71 Spike v0.7 2459 20 20 1097 427 39%
72 Francesca M.0.0.9 2446 17 17 1617 602.5 37%
73 PostModernist v1.007 2436 17 17 1535 548.5 36%
74 Comet B60 2433 23 23 780 321 41%
75 Leila v0.53h 2415 18 18 1493 508 34%
76 Tcb v0045 2405 17 18 1535 495.5 32%
77 Resp v0.19 2396 17 18 1520 472 31%
78 Arasan v7.4 2387 19 20 1220 363 30%
79 Nejmet v3.07 2379 23 23 876 290.5 33%
80 SlowChess v2.78 2364 24 24 790 264.5 33%
81 Exchess v4.03 2323 18 19 1519 352 23%
82 Beowulf v2.2 2288 20 20 1440 307 21%
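On the calibration step mentioned above: Elo-type ratings are only defined up to an additive constant, so anchoring one engine to a fixed value amounts to adding the same offset to every rating. Below is a minimal Python sketch of that spreadsheet step. The helper name and the raw input values are hypothetical (the uncalibrated bayeselo output is not shown in this post); the raw numbers were picked only so that the shifted result matches three rows of the table.

```python
def anchor_ratings(ratings, anchor_name, anchor_elo):
    """Shift all ratings by one constant so that anchor_name ends up at anchor_elo."""
    offset = anchor_elo - ratings[anchor_name]
    return {name: elo + offset for name, elo in ratings.items()}

# Hypothetical uncalibrated values, NOT real bayeselo output.
raw = {
    "Chess Tiger 15.0 normal": 246,
    "LambChop v10.99": 0,
    "Beowulf v2.2": -209,
}

calibrated = anchor_ratings(raw, "LambChop v10.99", 2497)

# The first-to-last gap is the same before and after the shift.
spread_raw = raw["Chess Tiger 15.0 normal"] - raw["Beowulf v2.2"]
spread_cal = calibrated["Chess Tiger 15.0 normal"] - calibrated["Beowulf v2.2"]

for name, elo in calibrated.items():
    print(f"{name:25s} {elo}")
print("spread before/after:", spread_raw, spread_cal)
```

Because every rating moves by the same amount, the first-to-last gap does not depend on the choice of anchor, so the larger spread compared with Elostat is not a side effect of the calibration.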
Robert