Page 1 of 2

PGN Parsing - what classic Chess rules need to be checked?

PostPosted: 15 Dec 2004, 16:40
by tnesham
I have written a PGN parser some years ago in C, converted to Java, and now I am converting the GUI to SWT(sort of Java hybrid - see Eclipse.org). (I am interested in anyone looking at SWT for some opinions on it). I have already written the board and made the program able to allow multiple players on a network play each other. To do that I also needed to check for illegal moves and report an error message.
With PGN parsing, rule checking has slightly different needs: for example, a pawn move to e4 - I do not need to check if the path to e4 (from e2) is obstructed because I take the pgn's text as fact it is legal. (any disagreement on this?). However, given the pgn text "5. c4", does that mean pawn on c2 to c4? or pawn on c3 to c4? What if there is a pawn on *both* c2 *and* c3? In the latter case, c4 means the pawn on c3 because the pawn on c2 cannot legally move.
That was an instance of my general question: what rules must a pgn parser observe? Have those rules been defined? Is there documentation similar to a flow chart?

tia

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 15 Dec 2004, 17:39
by Uri Blass
I think that you should not assume that the moves in the pgn are legal because they may be illegal.

I already downloaded games with illegal moves from WBEC because one of the engines played an illegal move.

Leo gave comment that the move is illegal inside the pgn but in some cases the comment came only after the illegal move.

I do not know what is the purpose of your pgn parser but if you plan to use it on pgn's that you can download then you should not assume legal moves.

Uri

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 15 Dec 2004, 18:00
by Peter Fendrich
tnesham wrote: for example, a pawn move to e4 - I do not need to check if the path to e4 (from e2) is obstructed because I take the pgn's text as fact it is legal.
Don't take anything for granted! Uri explained why.
tnesham wrote:However, given the pgn text "5. c4", does that mean pawn on c2 to c4? or pawn on c3 to c4? What if there is a pawn on *both* c2 *and* c3? In the latter case, c4 means the pawn on c3 because the pawn on c2 cannot legally move.
That was an instance of my general question: what rules must a pgn parser observe? Have those rules been defined? Is there documentation similar to a flow chart?

Take a look at the PGN spec: http://www.very-best.de/pgn-spec.htm
It's all there, almost :)

/Peter[/quote]

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 15 Dec 2004, 18:27
by Dan Honeycutt
I third that. Your parser needs a board and a move generator so it can follow the game and verify the info in the file.

Dan H.

thanks

PostPosted: 16 Dec 2004, 03:42
by tnesham
It should be no problem for me to verify the move since I already have methods to check the legality of the move.

I was wondering if performance was a consideration, particularly when determining if the King is in check or determining if a square, between King and Rook, is under attack during a castling move. But I suppose the integrity of the game is priority and, hey, that's what a computer is good for. :wink: [/quote]

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 16 Dec 2004, 03:47
by tnesham
Peter Fendrich wrote:Take a look at the PGN spec: http://www.very-best.de/pgn-spec.htm
It's all there, almost :)

/Peter



Excellent!

thanks

Re: thanks

PostPosted: 16 Dec 2004, 05:13
by Dan Honeycutt
tnesham wrote:I was wondering if performance was a consideration, particularly when determining if the King is in check or determining if a square, between King and Rook, is under attack during a castling move.


No. The slowest routine I can concieve of to make this kind of test would consume negligible time compared to what you spend on file I/O.

Dan H.

Re: thanks

PostPosted: 16 Dec 2004, 14:54
by Anonymous
Dan Honeycutt wrote:No. The slowest routine I can concieve of to make this kind of test would consume negligible time compared to what you spend on file I/O.


Dan did you check it? I think, I do have a pretty fast move generator. I use it for testing legal moves. It is not optimal (in sense of efficiency). Parsing PGNs is more or less CPU bound here. The program will parse about 3.5 Mbyte/s. The disk is much faster. While I type this message, I see that the parsing uses 94% of the CPU time. To make it disk bound, it would need to be almost 20 times faster - you'll need pretty efficient algorithms for it. Or in other words, a modern disk reads 60 MB/s. Say ome half move needs 12 bytes (move number and the overhead by headers, which I assume will need no CPU time), this is 5 mio moves/s. How do you check legality? When generating legal moves and comparing: say on average 40 legal moves: you must generate 200 mio moves/s (and must not have any more overhead ...).

Regards,
Dieter

Re: thanks

PostPosted: 16 Dec 2004, 19:29
by Dan Honeycutt
Dieter B?r?ner wrote:Dan did you check it?


Hi Dieter:

No, and I probably don't appreciate the speed of modern hard drives. Relic that I am I still think in terms of floppies or, even before, cassette tape where I/O is everything.

I'm working on a book builder/PGN reader. To keep it clean and simple I wrote what I consider to be a very slow move generator. It still generates better than 10 million moves/sec. Using 25 moves/position and 50 moves/game I figured in 1 second I could generate enough moves to parse some 8,000 games. So I assumed if my parser is sluggish it wouldn't be the fault of the move generator. When I get it all together I'll have to check - maybe I do need to speed up the generator.

Best
Dan H.

Re: thanks

PostPosted: 16 Dec 2004, 22:34
by Sune Fischer
Dieter B?r?ner wrote:
Parsing PGNs is more or less CPU bound here. The program will parse about 3.5 Mbyte/s. The disk is much faster. While I type this message, I see that the parsing uses 94% of the CPU time. To make it disk bound, it would need to be almost 20 times faster - you'll need pretty efficient algorithms for it. Or in other words, a modern disk reads 60 MB/s. Say ome half move needs 12 bytes (move number and the overhead by headers, which I assume will need no CPU time), this is 5 mio moves/s. How do you check legality? When generating legal moves and comparing: say on average 40 legal moves: you must generate 200 mio moves/s (and must not have any more overhead ...).


I don't believe the speed of parsing moves is any sort of issue, usually you are going to use the parsed moves for something else which also takes time, e.g. creating a book.

If you recall I have a quite slow pgn parser (using the string compare method), but even this slow method can parse a 1 GB pgn in about 20 min.

Generating the book will on the other hand take days (if you add everything), so in that context it seems a bit overkill to care about pgn parsing speeds.
I should think this is the same for others, if not days then at least hours?

-S.

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 16 Dec 2004, 22:57
by Uri Blass
I do not think that generating a book takes a long time but the question is how do you generate the book.

I basically start by copying information about the pgn into RAM(Movei may have a problem with pgn with more than 9,000,000 relevant moves but I see no need for that big pgn that has probably low quality).

Uri

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 16 Dec 2004, 23:45
by Sune Fischer
Uri Blass wrote:I do not think that generating a book takes a long time but the question is how do you generate the book.


Perhaps there is a way around it, but there is no built in mechanism for it that I know of.
When you add a new move you have to see if it is in the list already, so that's an O(N) search for adding every move. Needless to say this is going to be slowing down a lot as the list grows large.

-S.

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 16 Dec 2004, 23:49
by Uri Blass
I do not see why O(N)

I think that adding a move or to be more exact adding a book entry is
O(logN) because of binary search.

Uri

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 16 Dec 2004, 23:55
by Anonymous
Sune, you cited all of my talking. Note, that I only commented to Dan, and I doubted that PGN parsing is IO bounded (at least when not using very efficient methods). I did not argue at all, that it is important, to have a fast PGN parser.

To the book creating. It took 12 minutes here, to create an opening book out of a 3 GB pgn (Dann Corbit's "crapbase"). Would have been much faster, if I had 1 GB of RAM. You can calculate yourself - parsing PGN can be a significant amount of such a time (and it is in case of Yace). No, I don't argue, that it is important. However, it is nice, to be able to process PGNs rather fast. I have one chess database application like routine in Yace. Typically to find games with some more complicated constraints, that are not available under the wellknown applications (CB, CA, Scid, Jose). I have to change the source for it (often it takes only 2 minutes). Instead of finding games, the purpuse might also be, to collect some specific statistics. For the PGN I created from Dann Corbit's files (3.4 million games), it typically takes somewhere between 12 and 20 minutes here.

Regards,
Dieter

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 16 Dec 2004, 23:59
by Sune Fischer
I don't see how.

A linked list can't be indexed efficiently and an array can't be kept ordered.
There is actualy a way to have both in C++ but according to the documentation the container supporting insertion (forgot its name at the moment) can require a re-allocation at any point in time when there is no room for insertion, this will be very expensive too.

-S.

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 17 Dec 2004, 00:03
by Anonymous
Uri Blass wrote:I do not see why O(N)

I think that adding a move or to be more exact adding a book entry is
O(logN) because of binary search.

Uri


Depending on the method used, it might be O(N), O(log N) or O(1). You all will know, that there is a finite number of chess positions. If you had enough table space, to store all positions, it is clear, that only constant time is needed. Even with much less space, practically constant time for each move can be enough (a hash table approach, binary search approach of course will need O(log N) time).

Dieter

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 17 Dec 2004, 00:21
by Uri Blass
Sune Fischer wrote:I don't see how.

A linked list can't be indexed efficiently and an array can't be kept ordered.
There is actualy a way to have both in C++ but according to the documentation the container supporting insertion (forgot its name at the moment) can require a re-allocation at any point in time when there is no room for insertion, this will be very expensive too.

-S.


I do not add a single move every time but first get all the book moves into an array and later order them in time O(n*logn)

average time per move is O(logn).

Even with adding a single move every time it is possible to have an array of pointers when pointer can in theory point to a small list of positions but the length of the list will usually be small(only 1 or 2 positions).

Uri

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 17 Dec 2004, 00:23
by Sune Fischer
Dieter B?r?ner wrote:Sune, you cited all of my talking. Note, that I only commented to Dan, and I doubted that PGN parsing is IO bounded (at least when not using very efficient methods). I did not argue at all, that it is important, to have a fast PGN parser.


Then I misunderstood, it sounded to me like you were saying optimize all you can since the disk is not the bottleneck. :)

For the book, I guess you build a binary tree on the keys which should be O(log2(N)), it just seems complicated to me. I generally favor doing things short and simple when it's not of major importance.

I guess I can live with an O(N) since I don't build 3 GB books very often, even the 1 GB I have is utter crap and not worth using. I recall one of these pgns had 1.f4 to be the highest winning percentage, LOL.

The best pgn I've found is the gm2600 from Hyatt's site. When I look at the book afterwards it seems like it has learned some good statistics.

Looking for patterns is not O(N) though, so not much problem there.

-S.

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 17 Dec 2004, 00:31
by Sune Fischer
Uri Blass wrote:I do not add a single move every time but first get all the book moves into an array and later order them in time O(n*logn)

average time per move is O(logn).


The problem is that this is memory inefficient, it needs ~5-10 times as much memory and the post-processing also requires a large structure.

I don't understand what you mean by this:
"Even with adding a single move every time it is possible to have an array of pointers when pointer can in theory point to a small list of positions but the length of the list will usually be small(only 1 or 2 positions)."

-S.

Re: PGN Parsing - what classic Chess rules need to be checke

PostPosted: 17 Dec 2004, 00:43
by Uri Blass
Sune Fischer wrote:
Uri Blass wrote:I do not add a single move every time but first get all the book moves into an array and later order them in time O(n*logn)

average time per move is O(logn).


The problem is that this is memory inefficient, it needs ~5-10 times as much memory and the post-processing also requires a large structure.

I don't understand what you mean by this:
"Even with adding a single move every time it is possible to have an array of pointers when pointer can in theory point to a small list of positions but the length of the list will usually be small(only 1 or 2 positions)."

-S.


I do not need a very big pgn for practical book.

pgn of 50,000 games is enough for book.

Anout the second option I mean that if you plan to have 1,000,000 positions in book you can have an array of 2^20 pointers when every pointer points to positions that the last 20 bit of the hash signature are the same.

When you have a position you first look at a place in your array based on
the last 20 bits of the hash signature and there you can find the relevant information for book move(you may need to look at some positions but probably only small number of positions).

Uri