PGN Parsing - what classic Chess rules need to be checked?

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Peter Fendrich » 17 Dec 2004, 13:21

My method:
Arrange an array representing all possible moves on a board:

uint array[64][64]; // indexed by array[from][to]

Each parsed position gets a number (kind of position-id) by starting from 0 and adding 1 for each new position.
Place the position-id f?r all generated moves there. Now compare the PGN-moves current position-id with the one in the array. When position_id reaches UINT_MAX you have to start over from 0 and zero all the table entries. Of course uint64 can be used instead....
One have to be careful with some special moves. For instance you don't need to know the promotion piece - just that it is a promotion.
/Peter
User avatar
Peter Fendrich
 
Posts: 193
Joined: 26 Sep 2004, 20:28
Location: Sweden

Re: thanks

Postby Alessandro Scotti » 17 Dec 2004, 14:12

Dieter B?r?ner wrote:Dan did you check it? I think, I do have a pretty fast move generator. I use it for testing legal moves. It is not optimal (in sense of efficiency). Parsing PGNs is more or less CPU bound here. The program will parse about 3.5 Mbyte/s. The disk is much faster. While I type this message, I see that the parsing uses 94% of the CPU time.


Hi Dieter,
I have checked it with Kiwi and found that my figures are almost identical to yours. In my case, I spend most time in the function that interprets a SAN move, because that function requires generating the list of valid moves for the current position (for verification and to resolve ambiguous moves).
Because generating valid moves is so slow in my engine, I've had an idea: to parse a SAN move, I generate the list of pseudo-legal moves, and then only validate those that hit the destination square specified in the SAN move. On a test machine, reading gm2600.pgn passed from 20.5 seconds with the old approach to 7.0 second with the new one... considering that both contain a 3-4 seconds fixed overhead I would say this idea has made parsing SAN moves about 4x faster. Maybe a "targeted" move generator will be even (a bit) faster, very similar to that I'm using for SEE...
Adding to book is not so expensive in Kiwi, on the same machine generating a 30 plies book from gm2600.pgn takes now 9.7 seconds.
User avatar
Alessandro Scotti
 
Posts: 306
Joined: 20 Nov 2004, 00:10
Location: Rome, Italy

Re: PGN Parsing - what classic Chess rules need to be checke

Postby tnesham » 18 Dec 2004, 17:44

Peter Fendrich wrote: Don't take anything for granted!


All this discussion on perfomance is interesting, but I have to challenge the "don't take anything for granted" rule. :shock: If the PGN is from a ChessBase program, for example, why wouldn't I take it for granted that their PGN is clean? I'd be willing to wager that their PGN is cleaner than my parser! I mean, they make their living doing that stuff, and I, well, don't. :(
If the user wants to modify their PGN, that's their risk. GIGO! :)

In databases there is the concept of optimistic versus pessimistic locking. Consider optimistic versus pessimistic PGN parsing, I will be happy with an optimistic parser. If I may define that as being one that takes a move as PGN text and assumes it is legal, then what is remaining is fairly trivial.
tnesham
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Reinhard Scharnagl » 18 Dec 2004, 17:55

Hi all,

some user want to enter PGN which is not thought to be PGN. So Smirf is able to read even German language piece letters and a lot of strange notation forms. Thus the user can use Smirf to standardize a wide range of game notations, when writing out the input (file or clipboard) game again as a PGN.

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Anonymous » 18 Dec 2004, 18:27

tnesham wrote:If the PGN is from a ChessBase program, for example, why wouldn't I take it for granted that their PGN is clean?


I wouldn't bet on it. I remember some issues with quoting and the time controls in the headers (2" may mean 2 seconds, but the quote must be escaped). I have found other things as well (which I cannot find anymore easily, because I have workarounds in my code now).

Also, the issue with overspecified moves can happen with CB created PGNs.

Scid also had some subtle issues with escaping.

Regards,
Dieter
Anonymous
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Anonymous » 18 Dec 2004, 18:30

Peter Fendrich wrote:My method:
...
/Peter


Peter, I read this several times, but I don't get it :-( Is it only me? Can you say it in other words?

Cheers,
Dieter
Anonymous
 

Re: thanks

Postby Anonymous » 18 Dec 2004, 18:35

Alessandro Scotti wrote:
Because generating valid moves is so slow in my engine, I've had an idea: to parse a SAN move, I generate the list of pseudo-legal moves, and then only validate those that hit the destination square specified in the SAN move.


Yes, this is a good idea. I actually also thought of it, but nor really for PGN parsing but more important for validating moves from the hash table. This really shows up in profiles of Yace (which most often will have a move in he HT to validate, in case of upper bound and lower bound.) There one could expect a speedup of a factor like you saw at least. Unfortunately it is quite some work. It will duplicate quite some code (which is prone to errors) or will need quite some thinking to avoid the duplication (and still be efficient).

Cheers,
Dieter
Anonymous
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Sune Fischer » 18 Dec 2004, 18:50

tnesham wrote:All this discussion on perfomance is interesting, but I have to challenge the "don't take anything for granted" rule. :shock: If the PGN is from a ChessBase program, for example, why wouldn't I take it for granted that their PGN is clean? I'd be willing to wager that their PGN is cleaner than my parser! I mean, they make their living doing that stuff, and I, well, don't. :(
If the user wants to modify their PGN, that's their risk. GIGO! :)

In databases there is the concept of optimistic versus pessimistic locking. Consider optimistic versus pessimistic PGN parsing, I will be happy with an optimistic parser. If I may define that as being one that takes a move as PGN text and assumes it is legal, then what is remaining is fairly trivial.


I share your view, although I can see where the others are comming from, in programming you often have to consider the worst case senario. In this case however worst case just means you drop a game and proceed to the next.

After having tried a number of pgns I've become even more convinced that an optimistic parser is generally quite sufficient. Typicly it errs on 3-400 games in a million, that's completely acceptable to me.

My advice would be, that you shouldn't try and fix a lot of things until you are sure these things actually need to be fixed. It's better to keep things simple and not to assume the worst in this case, IMO.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: thanks

Postby Sune Fischer » 18 Dec 2004, 19:02

Dieter B?r?ner wrote:Yes, this is a good idea. I actually also thought of it, but nor really for PGN parsing but more important for validating moves from the hash table. This really shows up in profiles of Yace (which most often will have a move in he HT to validate, in case of upper bound and lower bound.) There one could expect a speedup of a factor like you saw at least. Unfortunately it is quite some work. It will duplicate quite some code (which is prone to errors) or will need quite some thinking to avoid the duplication (and still be efficient).


Do you generate moves in order to validate the hashmove?

This would seem a bit odd to me, since one of the ideas is that you avoid generating moves in case of a hashmove.

In case of pgn-parsing, I think it should be possible to do without any move generation at all, in theory.

It's not practical in my implementation, but for those that care about pgn reading speeds it might be worth thinking about? :)

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: thanks

Postby Anonymous » 18 Dec 2004, 19:38

> Sune
> Do you generate moves in order to validate the hashmove?

Yes, in the case of cutoffs.

>This would seem a bit odd to me, since one of the ideas is that you
> avoid generating moves in case of a hashmove.

I don't use this idea. I of course thought about it, but decided against it. I estimated (after consulting profile outputs)it will at most speed up things by low digit percent. And it would make an already complicated function quite a bit more complicated. For the validation process things are a bit different. The complicated search function would stay untouched. Of course writing this special purpose move generator is not nice.

Perhaps I should try a more pragmatic approach, for validating hash moves (that is not correct in 100%, but only in say 98% of the cases - real collisions are very rare anyway, and they even don't matter much). I am not sure anymore, if Yace can crash, when an illegal move is in the PV (Yace adds the cutoff move from the HT to the PV). Sure, it can be handled at display time, too. I suspect, I am doing it twice already.

Another minor nuisance with my design here: in HTs and PVs I store abbreviated moves only, that use 15 bits. From this, I can always generate the full move (that has some more flags, for castling, capturing, etc.). For this purpose, I am using the move generator. Makemove really needs the full move ... As you can see, this makes the optimization "don't generate moves soon, just try hash move first" not too trivial.

>In case of pgn-parsing, I think it should be possible to do without any
>move generation at all, in theory.

Perhaps yes, but not easy. Think of the case of to pieces of the same kind, that both can move to the same target. But one is pinned to the king. It sure can be done, but adding all this logic (and getting it perfectly correct) does not look like a nice job.

BTW. I have to agree with your "optimistic view" for the PGN cleanness, after rethinking it. It is like I started, too. Try to get the specs right, and then fix things, I found with various PGNs, or that got reported. Using an externaly tool for this fixing step would or course be appropriate, too.

Cheers,
Dieter
Anonymous
 

Re: thanks

Postby Sune Fischer » 18 Dec 2004, 20:19

Dieter B?r?ner wrote:I don't use this idea. I of course thought about it, but decided against it. I estimated (after consulting profile outputs)it will at most speed up things by low digit percent. And it would make an already complicated function quite a bit more complicated. For the validation process things are a bit different. The complicated search function would stay untouched. Of course writing this special purpose move generator is not nice.


In my case I decided to do it because otherwise there wouldn't be much point in having an incremental move generator. If the hashmove is not a capture then the whole bunch needs generating. Perhaps only a selected part (e.g. only moves from the e3 square) are required but then it gets complicated to avoid generating these moves again.

It does gets complicated, but I figured it would be nice to have full validity checker for other cases too, like the PV and especially the killers.

The other alternative is to generate the whole bunch at the beginning, granted this is simpler. I think it is not faster though, for a bitboarder.

Perhaps I should try a more pragmatic approach, for validating hash moves (that is not correct in 100%, but only in say 98% of the cases - real collisions are very rare anyway, and they even don't matter much). I am not sure anymore, if Yace can crash, when an illegal move is in the PV (Yace adds the cutoff move from the HT to the PV). Sure, it can be handled at display time, too. I suspect, I am doing it twice already.


Real hash collisions are quite rare, but slightly more frequent for me because I use only 56 bits or something like that. It should produce a real collsion every few hours. It doesn't matter since the move will be cought invalid, but an invalid score can be sent back down the tree.
I did tests, as did others incl. Hyatt, and it doesn't seem like a problem. The 25% smaller hash entries more than makes up for it.

As you can see, this makes the optimization "don't generate moves soon, just try hash move first" not too trivial.


I'm compressing to 16 bits, but the last four bits are special. Instead of four 1-bits type of flag they contain a value from 0-15 that contains information about castling or promotion.

>In case of pgn-parsing, I think it should be possible to do without any
>move generation at all, in theory.

Perhaps yes, but not easy. Think of the case of to pieces of the same kind, that both can move to the same target. But one is pinned to the king. It sure can be done, but adding all this logic (and getting it perfectly correct) does not look like a nice job.



Yes you have to generate moves "backwards" using the to-square.
In Nf5 go to f5 and look at the knights attacks from f5. How many knights of the right color can be captured by a knight from f5 (this is often how attacks are done with bitboards). If we attack two knights then see if one of them is pinned etc...

I think it is doable actually.

BTW. I have to agree with your "optimistic view" for the PGN cleanness, after rethinking it. It is like I started, too. Try to get the specs right, and then fix things, I found with various PGNs, or that got reported. Using an externaly tool for this fixing step would or course be appropriate, too.


One could (should?) listen to more experience pgn parsers, but perhaps some of the bugs you found in scid and others haven't been corrected.

I'll probably start worrying when I get complaints in the mail, so far I've not run into anything major myself.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: thanks

Postby Alessandro Scotti » 18 Dec 2004, 23:40

Sune Fischer wrote:Yes you have to generate moves "backwards" using the to-square.
In Nf5 go to f5 and look at the knights attacks from f5. How many knights of the right color can be captured by a knight from f5 (this is often how attacks are done with bitboards). If we attack two knights then see if one of them is pinned etc...


With bitboards, one usually has a enemyOrEmpty bitboard for masking off trivially illegal moves that target friendly pieces. I've got most of the benefits with just one line of code by and-ing that with the target square (if set). However, that wasn't much faster than generating the list of all pseudo legal moves and iterating to remove unwanted entries, I think the overall gain was less than 4% (edited, I originally wrote 1% which isn't correct) or so...

BTW in Kiwi I try to keep PGN stuff to a minimum because I don't think the PGN parser is so important in a chess engine. For a chess GUI or similar application of course having a robust PGN parser is much more important.
Last edited by Alessandro Scotti on 19 Dec 2004, 17:11, edited 1 time in total.
User avatar
Alessandro Scotti
 
Posts: 306
Joined: 20 Nov 2004, 00:10
Location: Rome, Italy

Re: thanks

Postby Sune Fischer » 19 Dec 2004, 00:42

Alessandro Scotti wrote:With bitboards, one usually has a enemyOrEmpty bitboard for masking off trivially illegal moves that target friendly pieces. I've got most of the benefits with just one line of code by and-ing that with the target square (if set). However, that wasn't much faster than generating the list of all pseudo legal moves and iterating to remove unwanted entries, I think the overall gain was less than 1% or so...


It should be "much" faster than generating the whole list of moves and then filtering out those that don't hit the to-square. By doing it backwards you generate directly the relevant moves that involve the to-square and nothing else. I'd be surprised if it wasn't ~5 times faster..?

Further optimizations are even possible, particularly if you can assume valid SAN notation. I.e. Qd5 is obviously Qd2d5 if there is only one queen and it's on d2 and the move is known to be legal.

Or at least you only have to scan in the direction d5-d2 and check for pinned.

BTW in Kiwi I try to keep PGN stuff to a minimum because I don't think the PGN parser is so important in a chess engine. For a chess GUI or similar application of course having a robust PGN parser is much more important.


Yes good point, I agree. Re-use code whenever possible, and most especially when speed is not essential.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Peter Fendrich » 19 Dec 2004, 20:17

Dieter B?r?ner wrote:Peter, I read this several times, but I don't get it :-( Is it only me? Can you say it in other words?

Cheers,
Dieter
Ok, I suppose this is a simple PGN parsing:
Code: Select all
while (grab_next_PGN_move(*currmove)){
    if (IsMoveLegal(*currmove)) {
         Update_position(*currmove);  // Make the move and whatever
         Position_id++;  //I will explain this below     
    }
    else
         errorHandling();
}

The IsMoveLegal() is often done by generating all moves and search for currmove among the generated. It's here my method comes in place.
Instead of looping through the list of moves you check in the Array[from][to] if currmove is among the generated. The movegenerator sets Array[from][to]=Position_id for all generated moves.

Does all this make sense ?

/Peter
User avatar
Peter Fendrich
 
Posts: 193
Joined: 26 Sep 2004, 20:28
Location: Sweden

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Anonymous » 20 Dec 2004, 13:03

Peter Fendrich wrote:Does all this make sense ?

/Peter


Yes, thanks! I was confused, because of the other part of the thread, that was discussing efficiency of adding moves to a book, and thought you were referring to it.

Cheers,
Dieter
Anonymous
 

Previous

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 11 guests