YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 17 Apr 2004, 23:12

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 00:12:44:

After 760 games Tao 5.6 scored 50 points higher than its predecessor, which is a statistically significant improvement.
However, it still does not support underpromotions, and this turns out to be really annoying, because in such cases Tao
- may claim an incorrect stalemate
- may even claim an incorrect win
- may make incorrect moves
The last of these is easy to find, but the other two are really annoying: no other engine needs as much checking of game output and adjudicating as Tao.
It is not negligible: in the 760 games of Tao 5.6 there were 6 underpromotions to a knight and 3 underpromotions to a rook, 9 altogether, which means they occurred in more than 1% of the games.
It is a shame that an engine at that level does not implement all the rules of chess.
For tools, conditions, time control etc. please refer to the link below. The next engine will be the last remaining candidate, still untested by me, for being the strongest free engine: List 5.12.
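
As a quick check of the "more than 1%" figure, the share of games with an underpromotion can be computed from the counts given in the post (6 knight and 3 rook promotions in 760 games). This is only a small sketch; the function name is illustrative, and it assumes at most one underpromotion per game.

```python
def underpromotion_rate(knight: int, rook: int, bishop: int, games: int) -> float:
    """Return the share of games containing an underpromotion, in percent.

    Assumes each counted underpromotion comes from a different game.
    """
    return 100.0 * (knight + rook + bishop) / games


# Counts taken from the post: 6 knight and 3 rook underpromotions in 760 games.
rate = underpromotion_rate(knight=6, rook=3, bishop=0, games=760)
print(f"{rate:.2f} % of games")  # 9 / 760, i.e. a little over 1 %
```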



    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
 01 Ruffian v2.1.0            : 2679   18  28   778    71.6 %   2518   24.9 %
 02 Ruffian v2.0.0            : 2675   17  27   840    71.6 %   2515   25.8 %
 03 Ruffian v1.0.1            : 2652   17  24   936    69.7 %   2508   26.7 %
 04 DeepSjeng v1.6ntb         : 2626   20  25   759    65.3 %   2516   24.5 %
 05 SmarThink v0.17a          : 2590   20  22   839    60.3 %   2518   25.5 %
 06 Thinker v4.5b             : 2590   21  20   796    60.3 %   2518   33.2 %
 07 Crafty v17.14DC           : 2589   18  18  1018    60.6 %   2515   32.7 %
 08 Ktulu v4.2                : 2588   20  22   835    60.5 %   2514   24.2 %
 09 Crafty v19.06DCntb        : 2582   20  19   881    59.1 %   2518   30.5 %
 10 Aristarch v4.21           : 2579   18  19  1037    59.2 %   2515   24.6 %
 11 Aristarch v4.37           : 2574   21  18   819    58.1 %   2517   36.6 %
 12 Delfi v4.3                : 2566   20  19   960    57.4 %   2514   25.3 %
 13 Crafty-MPC v18.15DC       : 2564   20  19   944    57.1 %   2514   27.3 %
 14 Delfi v4.2                : 2561   25  25   580    58.1 %   2505   27.2 %
 15 El Chinito v3.25          : 2559   22  20   800    56.5 %   2514   27.5 %
 16 SmarThink v0.16b++        : 2559   21  21   836    58.0 %   2503   24.3 %
 17 Crafty v18.15DC           : 2557   22  22   741    59.0 %   2494   29.0 %
 18 Little Goliath 2000 v3.9  : 2556   19  18  1040    55.9 %   2515   26.2 %
 19 SoS 3                     : 2555   20  19  1039    55.7 %   2516   22.6 %
 20 Yace Paderborn            : 2553   20  18  1020    55.7 %   2513   26.0 %
 21 Pepito v1.59 profile      : 2553   20  18  1040    55.4 %   2516   25.4 %
 22 Aristarch v4.4            : 2548   36  34   319    54.1 %   2519   20.4 %
 23 Yace v0.99.56             : 2542   33  29   380    54.3 %   2512   26.6 %
 24 SoS 4                     : 2541   23  20   819    53.7 %   2515   24.3 %
 25 Green Light Chess v3.00   : 2541   20  17  1040    53.6 %   2516   25.8 %
 26 Little Goliath 2000 v3.5  : 2537   31  25   440    53.6 %   2512   30.9 %
 27 Tao v5.6                  : 2526   25  20   757    51.1 %   2518   24.3 %
 28 Anmon v5.30               : 2520   24  19   800    50.7 %   2515   26.9 %
 29 Amyan v1.59               : 2513   18  23   931    49.8 %   2515   26.0 %
 30 Pharaon v2.62             : 2509   17  21  1039    48.9 %   2516   24.4 %
 31 Crafty v19.01DC           : 2500   24  19   815    50.4 %   2497   25.5 %
 32 LambChop v10.99           : 2496   18  21  1037    47.1 %   2516   22.7 %
 33 Ktulu v3.9                : 2492   19  24   779    48.6 %   2501   26.1 %
 34 Gromit v3.8.2             : 2491   18  20  1017    46.4 %   2516   23.2 %
 35 SlowChess v2.89b          : 2489   19  21   917    46.0 %   2517   23.9 %
 36 KnightDreamer v3.2        : 2486   19  21   940    45.9 %   2515   25.0 %
 37 Anmon v5.22               : 2483   19  22   899    46.4 %   2508   26.7 %
 38 Comet B44-2               : 2480   18  20   960    44.8 %   2516   26.9 %
 39 Amy v0.8.3                : 2477   20  20  1029    44.4 %   2516   18.9 %
 40 SoS v11-99                : 2477   33  34   359    46.0 %   2505   17.3 %
 41 Tao v5.4                  : 2476   19  19  1039    44.1 %   2517   21.0 %
 42 Dragon v4.4.3             : 2471   19  21   905    43.9 %   2514   25.6 %
 43 Comet B62-3               : 2458   20  20   940    41.8 %   2516   25.6 %
 44 PostModernist v1.007      : 2442   21  19   960    39.3 %   2517   24.9 %
 45 Francesca M.0.0.9         : 2441   20  18  1039    39.2 %   2518   25.0 %
 46 Comet B60                 : 2435   22  21   780    41.2 %   2497   25.6 %
 47 Leila v0.53h              : 2427   22  18   958    37.3 %   2517   21.4 %
 48 Tcb v0045                 : 2422   22  18   959    36.6 %   2517   24.4 %
 49 Resp v0.19                : 2403   23  18   940    34.1 %   2517   23.6 %
 50 Nejmet v3.07              : 2389   25  18   876    33.2 %   2511   22.3 %
 51 SlowChess v2.78           : 2379   27  19   790    33.5 %   2498   19.6 %
 52 Exchess v4.03             : 2329   30  15   939    25.1 %   2518   21.5 %
 53 Beowulf v2.2              : 2304   33  14  1020    22.4 %   2520   17.5 %

Games        :  22980 (finished)
White Wins   :   9329 (40.6 %)
Black Wins   :   7875 (34.3 %)
Draws        :   5776 (25.1 %)
Unfinished   :      0
White Perf.  : 53.2 %
Black Perf.  : 46.8 %
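
The summary figures above can be reproduced by counting a win as one point and a draw as half a point. The following is a minimal sketch under that assumption; `score_percent` is an illustrative name, not part of any rating tool mentioned in the thread.

```python
def score_percent(wins: int, draws: int, losses: int) -> float:
    """Score in percent, counting a win as 1 point and a draw as half a point."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


# Totals from the summary block of the list.
white_wins, black_wins, draws = 9329, 7875, 5776
assert white_wins + black_wins + draws == 22980  # matches "Games"

white_perf = score_percent(white_wins, draws, black_wins)
black_perf = score_percent(black_wins, draws, white_wins)
print(f"White {white_perf:.1f} %, Black {black_perf:.1f} %")  # White 53.2 %, Black 46.8 %
```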

(27) Tao v5.6                  : 757 (+295,=184,-278), 51.1 %
Beowulf v2.2                  :  20 (+ 18,=  1,-  1), 92.5 %
Tcb v0045                     :  20 (+ 12,=  2,-  6), 65.0 %
Resp v0.19                    :  20 (+ 17,=  1,-  2), 87.5 %
Exchess v4.03                 :  20 (+ 16,=  0,-  4), 80.0 %
Ruffian v2.1.0                :  20 (+  2,=  6,- 12), 25.0 %
DeepSjeng v1.6ntb             :  20 (+  7,=  5,-  8), 47.5 %
SmarThink v0.17a              :  20 (+  6,=  5,-  9), 42.5 %
Thinker v4.5b                 :  20 (+  3,= 10,-  7), 40.0 %
Crafty v17.14DC               :  20 (+  5,=  7,-  8), 42.5 %
Ktulu v4.2                    :  20 (+  4,=  6,- 10), 35.0 %
Crafty v19.06DCntb            :  20 (+  4,=  6,- 10), 35.0 %
Aristarch v4.21               :  20 (+  7,=  6,-  7), 50.0 %
Aristarch v4.37               :  20 (+  6,=  6,-  8), 45.0 %
Delfi v4.3                    :  20 (+  4,=  7,-  9), 37.5 %
El Chinito v3.25              :  20 (+  9,=  3,-  8), 52.5 %
Crafty-MPC v18.15DC           :  20 (+  8,=  6,-  6), 55.0 %
Little Goliath 2000 v3.9      :  20 (+  6,=  2,- 12), 35.0 %
SoS 3                         :  20 (+  6,=  4,- 10), 40.0 %
Pepito v1.59 profile          :  20 (+  7,=  5,-  8), 47.5 %
Yace Paderborn                :  20 (+  6,=  3,- 11), 37.5 %
SoS 4                         :  20 (+ 10,=  5,-  5), 62.5 %
Green Light Chess v3.00       :  20 (+  2,= 11,-  7), 37.5 %
Anmon v5.30                   :  20 (+  8,=  5,-  7), 52.5 %
Amyan v1.59                   :  20 (+  6,=  7,-  7), 47.5 %
Pharaon v2.62                 :  20 (+  4,=  9,-  7), 42.5 %
LambChop v10.99               :  19 (+  8,=  3,-  8), 50.0 %
Gromit v3.8.2                 :  20 (+  8,=  5,-  7), 52.5 %
SlowChess v2.89b              :  19 (+  5,=  6,-  8), 42.1 %
KnightDreamer v3.2            :  20 (+  4,=  6,- 10), 35.0 %
Comet B44-2                   :  20 (+ 10,=  3,-  7), 57.5 %
Amy v0.8.3                    :  20 (+ 11,=  2,-  7), 60.0 %
Dragon v4.4.3                 :  19 (+  6,=  6,-  7), 47.4 %
Comet B62-3                   :  20 (+ 13,=  2,-  5), 70.0 %
Francesca M.0.0.9             :  20 (+ 14,=  2,-  4), 75.0 %
PostModernist v1.007          :  20 (+  9,=  5,-  6), 57.5 %
Leila v0.53h                  :  20 (+ 10,=  4,-  6), 60.0 %
Tao v5.4                      :  20 (+ 12,=  6,-  2), 75.0 %
Ruffian v1.0.1                :  20 (+  2,=  6,- 12), 25.0 %
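
The Score and Av.Op. columns of the list can be related to the Elo column with the standard logistic Elo formula. The sketch below assumes that model (the rating tool actually used for the list is not named in this post) and treats the + and - columns as the bounds of an interval around each rating, which is also an assumption. Non-overlapping intervals are only a rough indication of a significant difference, not a formal test.

```python
import math


def performance_rating(score: float, avg_opponent: float) -> float:
    """Rating implied by a score (0 < score < 1) under the logistic Elo model."""
    return avg_opponent - 400.0 * math.log10(1.0 / score - 1.0)


def intervals_overlap(elo_a, plus_a, minus_a, elo_b, plus_b, minus_b) -> bool:
    """True if [elo - minus, elo + plus] of the two entries overlap."""
    return (elo_a - minus_a) <= (elo_b + plus_b) and (elo_b - minus_b) <= (elo_a + plus_a)


# Score and Av.Op. values taken from the Tao v5.6 and Tao v5.4 rows of the list.
tao56 = performance_rating(0.511, 2518)
tao54 = performance_rating(0.441, 2517)
print(round(tao56), round(tao54))  # 2526 2476, matching the Elo column

# Elo, + and - taken from the same two rows.
print(intervals_overlap(2526, 25, 20, 2476, 19, 19))  # False: 2506..2551 vs 2457..2495
```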





YABRL (Yet Another Blitz Rating List)
Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Norm Pollock » 18 Apr 2004, 04:50

Geschrieben von:/Posted by: Norm Pollock at 18 April 2004 05:50:51:
Als Antwort auf:/In reply to: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Robert Allgeuer at 18 April 2004 00:12:44:

How do you check for the underpromotion errors? Do you use a utility?
Norm Pollock
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 18 Apr 2004, 08:29

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 09:29:51:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Norm Pollock at 18 April 2004 05:50:51:
If there were a utility it would be a bit less annoying.
What I do:
1) I run pgn-extract over the PGN collection to remove duplicates. As a by-product of this process I also get all games containing illegal moves, because pgn-extract does not match such games. This catches the third possibility.
2) For the first two possibilities I simply search for the strings =N, =R and =B in the PGN file and look at / analyse those games. Particularly suspect are games where, immediately or only a few moves after the underpromotion, a stalemate is claimed or a win is claimed for the side that was not underpromoting.
So sorry, it is more the legwork approach than a clever tool.
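
The string search in step 2 could be scripted roughly as follows. This is a hedged sketch, not the procedure actually used for the list: it assumes every game in the file starts with an `[Event ` tag and that promotions are written in SAN with `=`, and the function names and the sample text are made up for illustration. It only lists candidate games; the manual inspection described above is still needed.

```python
import re

# Matches a SAN promotion to knight, rook or bishop, e.g. "e8=N" or "bxa1=R".
UNDERPROMOTION = re.compile(r"\b[a-h](?:x[a-h])?[18]=[NRB]")
RESULT_TAG = re.compile(r'\[Result "([^"]*)"\]')


def split_games(pgn_text: str) -> list[str]:
    """Split a PGN collection into games, assuming each starts with an [Event tag."""
    parts = re.split(r"(?m)^(?=\[Event )", pgn_text)
    return [part for part in parts if part.strip()]


def games_with_underpromotion(pgn_text: str) -> list[tuple[int, str, list[str]]]:
    """Return (game number, result, underpromotion moves) for each candidate game."""
    hits = []
    for number, game in enumerate(split_games(pgn_text), start=1):
        # Drop the tag-pair lines so that only the movetext is searched.
        movetext = "\n".join(
            line for line in game.splitlines() if not line.startswith("[")
        )
        moves = UNDERPROMOTION.findall(movetext)
        if moves:
            result = RESULT_TAG.search(game)
            hits.append((number, result.group(1) if result else "?", moves))
    return hits


# Hypothetical, heavily abbreviated sample; not real games from the list.
SAMPLE = """[Event "Hypothetical"]
[Result "1/2-1/2"]

1. e4 e5 57. g8=N Kf7 1/2-1/2

[Event "Hypothetical"]
[Result "1-0"]

1. d4 d5 40. h8=Q 1-0
"""

print(games_with_underpromotion(SAMPLE))  # [(1, '1/2-1/2', ['g8=N'])]
```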
Robert



Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 18 Apr 2004, 08:55

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 09:55:25:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Robert Allgeuer at 18 April 2004 09:29:51:

I forgot: lgpgnver from George Lyapko is another tool that I always use. It identifies incorrect draw claims, including those incorrect stalemates after an underpromotion problem.
But as I said, I found that even the combination of pgn-extract and lgpgnver is not sufficient, because they do not catch incorrect win claims. A manual search and inspection is still necessary.
Robert



Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Bryan Hofmann » 18 Apr 2004, 13:21

Geschrieben von:/Posted by: Bryan Hofmann at 18 April 2004 14:21:19:
Als Antwort auf:/In reply to: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Robert Allgeuer at 18 April 2004 00:12:44:
It is negligible in that it occurs in only .01% of your games.

The author is aware of this issue but has given it a low priority due to its insignificance, the main priority being book learning.

Just as an FYI: you state in your conditions that Ruffian 1.0.1 is fixed to 2650 as a reference point, and the list shows this not to be the case.
Bryan Hofmann
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Bryan Hofmann » 18 Apr 2004, 14:59

Geschrieben von:/Posted by: Bryan Hofmann at 18 April 2004 15:59:13:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Bryan Hofmann at 18 April 2004 14:21:19:
Too early in the morning and I had not finished my coffee: it is 1%.
Bryan Hofmann
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 18 Apr 2004, 17:06

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 18:06:20:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Bryan Hofmann at 18 April 2004 15:59:13:
I am not sure whether it is negligible when the problem has a 1% probability, causes undetermined behaviour and essentially means that Tao does not support all the rules of chess. Losing a wheel 3 times a year is also not negligible for a car.

True, I posted some time later that I had changed the reference point: it is now Pharaon 2.62 with 2509 (2509 was chosen to maintain continuity; since then Pharaon has fallen by 2 in comparison to Ruffian 1.0.1). The reason was that Ruffian 1.0.1 has now been superseded by several newer versions, while Pharaon has stayed and I guess will stay with us for a while without a version change.
I should probably repost the full conditions and reference this new post from then on. But anyway, the absolute reference point is generally arbitrary.
Robert



Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Bryan Hofmann » 18 Apr 2004, 18:00

Geschrieben von:/Posted by: Bryan Hofmann at 18 April 2004 19:00:39:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Robert Allgeuer at 18 April 2004 18:06:20:
I hardly think that comparing an amateur chess engine's fault to a wheel falling off a car is any type of comparison. Would you say that a chess engine losing 1% of the time is not negligible when looking at its win/loss ratio?
Bryan Hofmann
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Uri Blass » 18 Apr 2004, 19:45

Geschrieben von:/Posted by: Uri Blass at 18 April 2004 20:45:41:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Bryan Hofmann at 18 April 2004 19:00:39:
Movei is losing less than 1% of its games on time with ponder on.
I think that the problem is not negligible and I am considering releasing a new version.
I already have a new version without that problem, but hopefully I will make some improvements.
There is a difference between mistakes.
Mistakes that cause the program to lose the game through chess errors are different from mistakes that cause the program to lose on time or mistakes that cause the program not to understand the rules.
A bug that causes the program to play a stupid tactical mistake in one out of 100 games is also more serious than a lack of endgame knowledge that causes the program to miss a draw in one out of 100 games.
Uri
Uri Blass
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 18 Apr 2004, 19:46

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 20:46:40:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Bryan Hofmann at 18 April 2004 19:00:39:

Such an engine just does not work according to the specs of the game.
Robert


Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 18 Apr 2004, 19:52

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 20:52:20:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Uri Blass at 18 April 2004 20:45:41:
In the Tao case it is not only a matter of losing or making mistakes: it incorrectly claims draws when it is in fact in a lost position, and I also saw it claim wins when the game was pretty much even. If it were just losing, OK, but this is simply incorrect behaviour.
Robert


Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Bryan Hofmann » 18 Apr 2004, 20:16

Geschrieben von:/Posted by: Bryan Hofmann at 18 April 2004 21:16:42:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Uri Blass at 18 April 2004 20:45:41:
It is neither a mistake nor a bug; what we are talking about is Tao's lack of knowledge of how to handle a pawn promotion by the opponent to anything other than a queen. My point was to put things into perspective: this is NOT significant.
Bryan Hofmann
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Bryan Hofmann » 18 Apr 2004, 20:22

Geschrieben von:/Posted by: Bryan Hofmann at 18 April 2004 21:22:16:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Robert Allgeuer at 18 April 2004 20:46:40:
I grant you that it does not understand this one aspect of the game. Let me ask you this: you state that you are testing for the strongest freeware blitz engine. In chess, strength is knowledge. How do you attain knowledge? Through learning. Yet you have disabled the learning features of the engines you are matching. Is this not negligible?
Bryan Hofmann
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Robert Allgeuer » 18 Apr 2004, 21:54

Geschrieben von:/Posted by: Robert Allgeuer at 18 April 2004 22:54:08:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Bryan Hofmann at 18 April 2004 21:22:16:
Such a rating list or tournament is nothing other than a measurement. It depends on what you want to measure; the conditions must then be set accordingly. I do not want to measure how an engine can adapt to the style of another engine through learning, but how an engine would perform against another opponent if it played without any history against this opponent. To achieve this, learning must be off, so that each game of a given engine against an opponent is run under identical conditions, regardless of whether it is the first or the twentieth game in a series.
A side effect of leaving learning on is that it distorts ratings, because not all engines support learning. Testing with learning also means that, strictly speaking, each game against a given opponent is run under different conditions, and results depend in principle on the number of games you play.
If I wanted to test learning functions I would run series of games with the same engine(s), once with and once without learning, and compare the results.
Robert



Robert Allgeuer
 

Re: YABRL: 50 points improvement by Tao, but ...

Postby Bryan Hofmann » 20 Apr 2004, 11:29

Geschrieben von:/Posted by: Bryan Hofmann at 20 April 2004 12:29:01:
Als Antwort auf:/In reply to: Re: YABRL: 50 points improvement by Tao, but ... geschrieben von:/posted by: Robert Allgeuer at 18 April 2004 22:54:08:
The learning function does not lead to an engine adapting to a style of play. It learns which openings in its book are not good lines to play and which positions are not favorable. This has nothing to do with style of play: an engine that plays an opening it scores as bad and loses will continue to play the same game the same way, with the same results, provided the opponent is of equal strength.

Wrong, there is no distortion of any scores; it simply shows the true strength of an engine. By disabling learning in the engines you are merely handicapping them to compensate for the engines that do not have this feature.
Bryan Hofmann
 

