WB_F result bug

Post by Guenther Simon » 16 May 2009, 21:31

Hello HG,

IMO the problem with Numpty described in the thread linked below also shows unwanted behaviour
in WB_F, or rather a bug in the reported result.

http://www.open-aurec.com/wbforum/viewtopic.php?f=2&t=50132

Guenther

Re: WB_F result bug

Post by H.G.Muller » 17 May 2009, 07:29

Well, we should discuss this. From the fact that this message exists it should be obvious that WB knows exactly what is going on, and that ruling a draw in this situation was the intended behavior.

My reasoning was this: There are several weak engines that fail to print RESULT commands, and just exit when they want to end the game because of checkmate, repetition or a 50-move draw. Forfeiting engines merely for non-compliance seems a bit harsh. When I am testing an engine I am developing against such a non-compliant engine, the validity of the testing is badly disrupted by such forfeits.

So my policy is to forfeit engines only in case of absolute necessity. If they send a false RESULT or illegal-move claim, I can do little else than forfeit them, as most engines stop after such a claim, and it would be a waste of time to wait them out. (Not to mention the disaster when autoCallFlag is off...) But if the problem is merely that they omit the RESULT claim, and WinBoard can deduce the result, it uses that result rather than forfeiting them. This is in general much less disruptive for testing when the score of the opponent matters; handing free points to an engine that should have lost or drawn on its own merits has a very distorting effect on rating lists. Forfeiting would basically make the non-compliant engine useless for testing. Stable engines in the Elo range where this is common are already very hard to find; they are usually completely deterministic and often unable to handle an external book, so you need many of them.

Usually it is the quality of the chess the engine (and its opponent!) produces, not its compliance with the protocol, that one is interested in. The situation is flagged by a unique REASON message, and those mainly interested in compliance can always scan the PGN for this message and take the appropriate action. The default result now at least does not contaminate the result of the opponent.
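
For readers who want to do such a scan, a minimal sketch in Python might look like the one below. The function name `games_with_marker` is hypothetical, and the marker string is deliberately left as a parameter: the exact wording of the REASON message is not given here, so take it from your own WinBoard PGN output.

```python
import re

def games_with_marker(pgn_text, marker):
    """Return (white, black, result) for every game whose text contains `marker`.

    `marker` is an assumption supplied by the caller: the text of the
    result comment that flags an engine exiting without a RESULT claim.
    """
    hits = []
    # Each PGN game starts with an [Event "..."] tag; drop any text before the first one.
    for chunk in pgn_text.split('[Event "')[1:]:
        game = '[Event "' + chunk
        if marker not in game:
            continue
        # Collect the tag pairs such as [White "..."] and [Result "..."].
        tags = dict(re.findall(r'\[(\w+) "([^"]*)"\]', game))
        hits.append((tags.get("White"), tags.get("Black"), tags.get("Result")))
    return hits
```

The returned list can then be reviewed by hand, or used to overrule the stored results if you prefer to forfeit the exiting engine.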

Note that instructing WB to adjudicate after 6 repetitions in no way voids the right of the engine to claim a draw after 3 repetitions.
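
The relation between the two thresholds can be pictured with a small Python sketch. This is only an illustration, not WinBoard's code: the function name, the dictionary keys and the idea of passing positions as hashable keys are all assumptions, and it ignores the case where a claim is made together with the move that produces the repetition.

```python
from collections import Counter

CLAIM_THRESHOLD = 3  # a threefold repetition may be claimed by the engine

def repetition_status(position_keys, adjudicate_after=6):
    """Report what is allowed in the current position.

    position_keys: hashable keys of all positions reached so far, in order;
                   the last entry is the current position.
    adjudicate_after: repetition count at which the GUI ends the game itself.
    """
    count = Counter(position_keys)[position_keys[-1]]
    return {
        "count": count,
        "engine_may_claim": count >= CLAIM_THRESHOLD,
        "gui_adjudicates": count >= adjudicate_after,
    }
```

With a position that has occurred three times, `engine_may_claim` is true while `gui_adjudicates` is still false: the GUI setting only decides when the GUI steps in without a claim.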

Re: WB_F result bug

Post by Guenther Simon » 17 May 2009, 09:29

Sorry, but I do not know of a single engine that simply exits instead of claiming a draw. I am sure Chris will fix that bug, and it is not really
a WB task to 'help' it. (It is also not a GUI task, e.g., to give a program more time than it has on the clock, as Arena sometimes does by allowing
even negative time until the cut is reached... You also cannot know whether a program just omits the result message for whatever reason and exits,
or really crashes.) This is also not about an old non-compliant engine, but about a new version of a program still
under development, whose predecessor version does _not_ exit in the same situation.

The last paragraph, BTW, was completely in vain, because you did not read my message correctly. I know well what it does and what it does not; that is why I had added a sentence in brackets, which you seem to have parsed away: "(in case none of both programs is able to claim a rep draw)".
(It would be absurd anyway to try, if possible at all, to void the right to claim a threefold repetition...)

Guenther

Re: WB_F result bug

Post by H.G.Muller » 17 May 2009, 11:29

Eden 0.0.11 is one of the engines that, as I recall, failed to send RESULT claims. I am pretty sure this is not an imagined problem; why else would I have taken the trouble to add the code to intercept this case?

I agree that in general it is not the task of a GUI to supplement the engine, but this is a situation that has a sufficiently large impact on the score of the opponent, and a simple enough solution, to handle it with care. Penalizing non-compliant engines is one thing, but handing out free points to an opponent is quite another.

This is clearly a situation that can occur only through a protocol violation, and the effects of a protocol violation are in principle undefined. That leaves the GUI free to do whatever it wants. The currently implemented action still seems the least of all evils to me, and the most useful to the average user. I could of course make the behavior dependent on a new command-line option, say /forfeitExitingEngines. This could then even forfeit engines that exit without a RESULT claim after checkmating or stalemating.
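
To make the two behaviors concrete, here is a minimal Python sketch of the decision. It is not the WinBoard source: the names `Outcome` and `result_when_engine_exits` are hypothetical, the `forfeit_exiting_engines` parameter stands in for the option that is only proposed above, and forfeiting when no result can be deduced is an assumption rather than something stated in this thread.

```python
from enum import Enum

class Outcome(Enum):
    WHITE_WINS = "1-0"
    BLACK_WINS = "0-1"
    DRAW = "1/2-1/2"

def result_when_engine_exits(exiting_side, deducible_result,
                             forfeit_exiting_engines=False):
    """Pick the game result when an engine exits without a RESULT claim.

    exiting_side: "white" or "black", the side whose engine exited.
    deducible_result: an Outcome if the GUI can work out the result from the
        board (mate, stalemate, repetition, 50-move rule), otherwise None.
    forfeit_exiting_engines: models the proposed, not yet existing, option.
    """
    forfeit = Outcome.BLACK_WINS if exiting_side == "white" else Outcome.WHITE_WINS
    if forfeit_exiting_engines:
        # Strict mode: exiting without a claim always loses the game.
        return forfeit
    if deducible_result is not None:
        # Lenient mode: use the result the GUI can deduce by itself.
        return deducible_result
    # Assumption: with nothing to deduce, the exit is treated as a forfeit.
    return forfeit
```

In the lenient mode an engine that exits in a repetition position gets the draw; with the option set, the same exit costs it the game, even after it delivered checkmate.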

Note that this is not a private e-mail exchange, and that whatever I post is not meant to only address you, but a general audience who might be less expert than you...

Re: WB_F result bug

Post by Guenther Simon » 17 May 2009, 12:40

I have tested Eden_0011 too, but that was already the JA version and it didn't show that behaviour.
There is no need to change the default behaviour in your WB_F for such cases. I can understand your
point, but I don't agree with it, and I will always change the results against exiting programs, because
they don't follow a simple rule. There is no sense anyway in using programs that are too buggy,
e.g. that cannot finish at least a certain percentage of their games in a normal way.

Something else now: IIRC you fiddled around with ToledoNanoChess, and somehow I got the impression
it would be around the strength of MicroMax, or maybe my memory is bad (or it was a different, old and much weaker
MicroMax version)?
Either it is because I only use longer time controls such as 15min/40 and the slowest available TNN version
a while ago was fixed to depth 6, or it is because it simply plays much stronger at very fast time controls?
Anyhow, here it seems at least 500-700 points below MicroMax at my tc. Any idea?

Guenther

Re: WB_F result bug

Post by H.G.Muller » 17 May 2009, 14:19

Would you also like a /forfeitExitingEngines flag to forfeit an engine that exits after checkmating the opponent? Or just in a draw position?

About Toledo nanochess:

There are two development branches of micro-Max, whose most recent versions are currently micro-Max 4.8 (optimized for best Elo/char ratio, 1968 characters) and micro-Max 1.6 (optimized for smallest size irrespective of playing strength, 1433 characters). On the Chess War rating scale (which is a bit compressed compared to the true ratings) 4.8 is rated 1882, 1.6 only 1451.

Even micro-Max 1.6 is significantly stronger than Toledo nanochess (in direct confrontation) at equal thinking time. But the version of micro-Max 1.6 that is on my website is set for blitz TC, while the published version of Toledo nanochess (5 ply) is set for standard TC, and thinks 20 times longer. Oscar Toledo likes to advertise that the published version of Toledo nanochess (lightly) beats the published version of micro-Max 1.6, without mentioning that uMax is facing time odds of a factor 20 in those tests. In the few equal-time tests I conducted, uMax 1.6 beat Toledo nanochess by about 75%.
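
As a rough guide to what such a percentage means in rating terms, the sketch below converts a score fraction into a rating difference under the standard logistic Elo model. Reading "about 75%" as the score uMax achieved is an assumption, a few games give a large error margin, and the function name is hypothetical.

```python
import math

def elo_difference(score):
    """Rating difference implied by a score fraction (0 < score < 1),
    using the logistic Elo model: score = 1 / (1 + 10 ** (-diff / 400))."""
    return 400 * math.log10(score / (1 - score))
```

Under these assumptions `elo_difference(0.75)` comes out at roughly 191 points.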

Re: WB_F result bug

Post by Guenther Simon » 17 May 2009, 19:14

H.G.Muller wrote: Would you also like a /forfeitExitingEngines flag to forfeit an engine that exits after checkmating the opponent? Or just in a draw position?


I don't think I need any such flag, because I neither know of such a program, nor should it be necessary at all, as a mate always ends
a game, regardless of whether the mating side leaves the board, crashes, or whatever.


H.G.Muller wrote: About Toledo nanochess:

There are two development branches of micro-Max, whose most recent versions are currently micro-Max 4.8 (optimized for best Elo/char ratio, 1968 characters) and micro-Max 1.6 (optimized for smallest size irrespective of playing strength, 1433 characters). On the Chess War rating scale (which is a bit compressed compared to the true ratings) 4.8 is rated 1882, 1.6 only 1451.

Even micro-Max 1.6 is significantly stronger than Toledo nanochess (in direct confrontation) at equal thinking time. But the version of micro-Max 1.6 that is on my website is set for blitz TC, while the published version of Toledo nanochess (5 ply) is set for standard TC, and thinks 20 times longer. Oscar Toledo likes to advertise that the published version of Toledo nanochess (lightly) beats the published version of micro-Max 1.6, without mentioning that uMax is facing time odds of a factor 20 in those tests. In the few equal-time tests I conducted, uMax 1.6 beat Toledo nanochess by about 75%.


OK, this explains why the 6-ply ToledoNC I have is still much weaker than expected.

Guenther