PGN Parsing - what classic Chess rules need to be checked?

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

PGN Parsing - what classic Chess rules need to be checked?

Postby tnesham » 15 Dec 2004, 16:40

I have written a PGN parser some years ago in C, converted to Java, and now I am converting the GUI to SWT(sort of Java hybrid - see Eclipse.org). (I am interested in anyone looking at SWT for some opinions on it). I have already written the board and made the program able to allow multiple players on a network play each other. To do that I also needed to check for illegal moves and report an error message.
With PGN parsing, rule checking has slightly different needs: for example, a pawn move to e4 - I do not need to check if the path to e4 (from e2) is obstructed because I take the pgn's text as fact it is legal. (any disagreement on this?). However, given the pgn text "5. c4", does that mean pawn on c2 to c4? or pawn on c3 to c4? What if there is a pawn on *both* c2 *and* c3? In the latter case, c4 means the pawn on c3 because the pawn on c2 cannot legally move.
That was an instance of my general question: what rules must a pgn parser observe? Have those rules been defined? Is there documentation similar to a flow chart?

tia
tnesham
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Uri Blass » 15 Dec 2004, 17:39

I think that you should not assume that the moves in the pgn are legal because they may be illegal.

I already downloaded games with illegal moves from WBEC because one of the engines played an illegal move.

Leo gave comment that the move is illegal inside the pgn but in some cases the comment came only after the illegal move.

I do not know what is the purpose of your pgn parser but if you plan to use it on pgn's that you can download then you should not assume legal moves.

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Peter Fendrich » 15 Dec 2004, 18:00

tnesham wrote: for example, a pawn move to e4 - I do not need to check if the path to e4 (from e2) is obstructed because I take the pgn's text as fact it is legal.
Don't take anything for granted! Uri explained why.
tnesham wrote:However, given the pgn text "5. c4", does that mean pawn on c2 to c4? or pawn on c3 to c4? What if there is a pawn on *both* c2 *and* c3? In the latter case, c4 means the pawn on c3 because the pawn on c2 cannot legally move.
That was an instance of my general question: what rules must a pgn parser observe? Have those rules been defined? Is there documentation similar to a flow chart?

Take a look at the PGN spec: http://www.very-best.de/pgn-spec.htm
It's all there, almost :)

/Peter[/quote]
User avatar
Peter Fendrich
 
Posts: 193
Joined: 26 Sep 2004, 20:28
Location: Sweden

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Dan Honeycutt » 15 Dec 2004, 18:27

I third that. Your parser needs a board and a move generator so it can follow the game and verify the info in the file.

Dan H.
Dan Honeycutt
 
Posts: 167
Joined: 28 Sep 2004, 15:49
Location: Atlanta Georgia, USA

thanks

Postby tnesham » 16 Dec 2004, 03:42

It should be no problem for me to verify the move since I already have methods to check the legality of the move.

I was wondering if performance was a consideration, particularly when determining if the King is in check or determining if a square, between King and Rook, is under attack during a castling move. But I suppose the integrity of the game is priority and, hey, that's what a computer is good for. :wink: [/quote]
tnesham
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby tnesham » 16 Dec 2004, 03:47

Peter Fendrich wrote:Take a look at the PGN spec: http://www.very-best.de/pgn-spec.htm
It's all there, almost :)

/Peter



Excellent!

thanks
tnesham
 

Re: thanks

Postby Dan Honeycutt » 16 Dec 2004, 05:13

tnesham wrote:I was wondering if performance was a consideration, particularly when determining if the King is in check or determining if a square, between King and Rook, is under attack during a castling move.


No. The slowest routine I can concieve of to make this kind of test would consume negligible time compared to what you spend on file I/O.

Dan H.
Dan Honeycutt
 
Posts: 167
Joined: 28 Sep 2004, 15:49
Location: Atlanta Georgia, USA

Re: thanks

Postby Anonymous » 16 Dec 2004, 14:54

Dan Honeycutt wrote:No. The slowest routine I can concieve of to make this kind of test would consume negligible time compared to what you spend on file I/O.


Dan did you check it? I think, I do have a pretty fast move generator. I use it for testing legal moves. It is not optimal (in sense of efficiency). Parsing PGNs is more or less CPU bound here. The program will parse about 3.5 Mbyte/s. The disk is much faster. While I type this message, I see that the parsing uses 94% of the CPU time. To make it disk bound, it would need to be almost 20 times faster - you'll need pretty efficient algorithms for it. Or in other words, a modern disk reads 60 MB/s. Say ome half move needs 12 bytes (move number and the overhead by headers, which I assume will need no CPU time), this is 5 mio moves/s. How do you check legality? When generating legal moves and comparing: say on average 40 legal moves: you must generate 200 mio moves/s (and must not have any more overhead ...).

Regards,
Dieter
Anonymous
 

Re: thanks

Postby Dan Honeycutt » 16 Dec 2004, 19:29

Dieter B?r?ner wrote:Dan did you check it?


Hi Dieter:

No, and I probably don't appreciate the speed of modern hard drives. Relic that I am I still think in terms of floppies or, even before, cassette tape where I/O is everything.

I'm working on a book builder/PGN reader. To keep it clean and simple I wrote what I consider to be a very slow move generator. It still generates better than 10 million moves/sec. Using 25 moves/position and 50 moves/game I figured in 1 second I could generate enough moves to parse some 8,000 games. So I assumed if my parser is sluggish it wouldn't be the fault of the move generator. When I get it all together I'll have to check - maybe I do need to speed up the generator.

Best
Dan H.
Dan Honeycutt
 
Posts: 167
Joined: 28 Sep 2004, 15:49
Location: Atlanta Georgia, USA

Re: thanks

Postby Sune Fischer » 16 Dec 2004, 22:34

Dieter B?r?ner wrote:
Parsing PGNs is more or less CPU bound here. The program will parse about 3.5 Mbyte/s. The disk is much faster. While I type this message, I see that the parsing uses 94% of the CPU time. To make it disk bound, it would need to be almost 20 times faster - you'll need pretty efficient algorithms for it. Or in other words, a modern disk reads 60 MB/s. Say ome half move needs 12 bytes (move number and the overhead by headers, which I assume will need no CPU time), this is 5 mio moves/s. How do you check legality? When generating legal moves and comparing: say on average 40 legal moves: you must generate 200 mio moves/s (and must not have any more overhead ...).


I don't believe the speed of parsing moves is any sort of issue, usually you are going to use the parsed moves for something else which also takes time, e.g. creating a book.

If you recall I have a quite slow pgn parser (using the string compare method), but even this slow method can parse a 1 GB pgn in about 20 min.

Generating the book will on the other hand take days (if you add everything), so in that context it seems a bit overkill to care about pgn parsing speeds.
I should think this is the same for others, if not days then at least hours?

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Uri Blass » 16 Dec 2004, 22:57

I do not think that generating a book takes a long time but the question is how do you generate the book.

I basically start by copying information about the pgn into RAM(Movei may have a problem with pgn with more than 9,000,000 relevant moves but I see no need for that big pgn that has probably low quality).

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Sune Fischer » 16 Dec 2004, 23:45

Uri Blass wrote:I do not think that generating a book takes a long time but the question is how do you generate the book.


Perhaps there is a way around it, but there is no built in mechanism for it that I know of.
When you add a new move you have to see if it is in the list already, so that's an O(N) search for adding every move. Needless to say this is going to be slowing down a lot as the list grows large.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Uri Blass » 16 Dec 2004, 23:49

I do not see why O(N)

I think that adding a move or to be more exact adding a book entry is
O(logN) because of binary search.

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Anonymous » 16 Dec 2004, 23:55

Sune, you cited all of my talking. Note, that I only commented to Dan, and I doubted that PGN parsing is IO bounded (at least when not using very efficient methods). I did not argue at all, that it is important, to have a fast PGN parser.

To the book creating. It took 12 minutes here, to create an opening book out of a 3 GB pgn (Dann Corbit's "crapbase"). Would have been much faster, if I had 1 GB of RAM. You can calculate yourself - parsing PGN can be a significant amount of such a time (and it is in case of Yace). No, I don't argue, that it is important. However, it is nice, to be able to process PGNs rather fast. I have one chess database application like routine in Yace. Typically to find games with some more complicated constraints, that are not available under the wellknown applications (CB, CA, Scid, Jose). I have to change the source for it (often it takes only 2 minutes). Instead of finding games, the purpuse might also be, to collect some specific statistics. For the PGN I created from Dann Corbit's files (3.4 million games), it typically takes somewhere between 12 and 20 minutes here.

Regards,
Dieter
Anonymous
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Sune Fischer » 16 Dec 2004, 23:59

I don't see how.

A linked list can't be indexed efficiently and an array can't be kept ordered.
There is actualy a way to have both in C++ but according to the documentation the container supporting insertion (forgot its name at the moment) can require a re-allocation at any point in time when there is no room for insertion, this will be very expensive too.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Anonymous » 17 Dec 2004, 00:03

Uri Blass wrote:I do not see why O(N)

I think that adding a move or to be more exact adding a book entry is
O(logN) because of binary search.

Uri


Depending on the method used, it might be O(N), O(log N) or O(1). You all will know, that there is a finite number of chess positions. If you had enough table space, to store all positions, it is clear, that only constant time is needed. Even with much less space, practically constant time for each move can be enough (a hash table approach, binary search approach of course will need O(log N) time).

Dieter
Anonymous
 

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Uri Blass » 17 Dec 2004, 00:21

Sune Fischer wrote:I don't see how.

A linked list can't be indexed efficiently and an array can't be kept ordered.
There is actualy a way to have both in C++ but according to the documentation the container supporting insertion (forgot its name at the moment) can require a re-allocation at any point in time when there is no room for insertion, this will be very expensive too.

-S.


I do not add a single move every time but first get all the book moves into an array and later order them in time O(n*logn)

average time per move is O(logn).

Even with adding a single move every time it is possible to have an array of pointers when pointer can in theory point to a small list of positions but the length of the list will usually be small(only 1 or 2 positions).

Uri
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Sune Fischer » 17 Dec 2004, 00:23

Dieter B?r?ner wrote:Sune, you cited all of my talking. Note, that I only commented to Dan, and I doubted that PGN parsing is IO bounded (at least when not using very efficient methods). I did not argue at all, that it is important, to have a fast PGN parser.


Then I misunderstood, it sounded to me like you were saying optimize all you can since the disk is not the bottleneck. :)

For the book, I guess you build a binary tree on the keys which should be O(log2(N)), it just seems complicated to me. I generally favor doing things short and simple when it's not of major importance.

I guess I can live with an O(N) since I don't build 3 GB books very often, even the 1 GB I have is utter crap and not worth using. I recall one of these pgns had 1.f4 to be the highest winning percentage, LOL.

The best pgn I've found is the gm2600 from Hyatt's site. When I look at the book afterwards it seems like it has learned some good statistics.

Looking for patterns is not O(N) though, so not much problem there.

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Sune Fischer » 17 Dec 2004, 00:31

Uri Blass wrote:I do not add a single move every time but first get all the book moves into an array and later order them in time O(n*logn)

average time per move is O(logn).


The problem is that this is memory inefficient, it needs ~5-10 times as much memory and the post-processing also requires a large structure.

I don't understand what you mean by this:
"Even with adding a single move every time it is possible to have an array of pointers when pointer can in theory point to a small list of positions but the length of the list will usually be small(only 1 or 2 positions)."

-S.
User avatar
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN Parsing - what classic Chess rules need to be checke

Postby Uri Blass » 17 Dec 2004, 00:43

Sune Fischer wrote:
Uri Blass wrote:I do not add a single move every time but first get all the book moves into an array and later order them in time O(n*logn)

average time per move is O(logn).


The problem is that this is memory inefficient, it needs ~5-10 times as much memory and the post-processing also requires a large structure.

I don't understand what you mean by this:
"Even with adding a single move every time it is possible to have an array of pointers when pointer can in theory point to a small list of positions but the length of the list will usually be small(only 1 or 2 positions)."

-S.


I do not need a very big pgn for practical book.

pgn of 50,000 games is enough for book.

Anout the second option I mean that if you plan to have 1,000,000 positions in book you can have an array of 2^20 pointers when every pointer points to positions that the last 20 bit of the hash signature are the same.

When you have a position you first look at a place in your array based on
the last 20 bits of the hash signature and there you can find the relevant information for book move(you may need to look at some positions but probably only small number of positions).

Uri
Last edited by Uri Blass on 19 Dec 2004, 21:06, edited 1 time in total.
User avatar
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Next

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 14 guests