Tokens and Symbols in PGN files

Post by crystalclear » 03 Apr 2012, 12:55

My software could read only a limited subset of PGN.
To improve its handling of game files, I took a look at the PGN specification and adapted my software so that it can parse "Recursive Annotations" without falling over, cope with "Numeric Annotation Glyphs", etc.

I ran into a few little problems, as I may have taken the specification too literally.

In one place it mentions that files are parsed as a sequence of tokens, and that there are various types of token, e.g.
[ left bracket, introducing a tag pair, is a token
( left parenthesis, introducing a recursive annotation, is a token
symbols: an alphanumeric sequence of characters can be a token
* asterisk, used as one of the game termination markers, is also defined as a token.

The characters that can occur in a token of type "Symbol" are strictly defined. I think the first character has to be alphanumeric, and the symbol continuation characters can then include underscores, hyphens and a few other marks such as #, =, + and the colon. The slash is not one of them.

I then ran into a few hiccups with my changed software, e.g. with 1/2-1/2.
My software wanted to treat it as a symbol 1, a slash, a symbol 2-1, a slash, and a symbol 2.
That was because I had implemented the PGN specification's definition of "Symbol" perhaps too literally.
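
The split described above can be reproduced with a small sketch. It assumes the symbol definition as read in this post (a letter or digit, then letters, digits, "_", "+", "#", "=", ":" or "-"); the names SYMBOL and naive_tokens are made up for illustration and are not from the specification.

```python
import re

# Symbol token as read from the specification in this post: a letter or
# digit, then continuation characters. The slash is NOT a continuation char.
SYMBOL = re.compile(r"[A-Za-z0-9][A-Za-z0-9_+#=:\-]*")

def naive_tokens(text):
    """Split text into symbol tokens and single-character tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = SYMBOL.match(text, pos)
        if match:
            tokens.append(match.group())
            pos = match.end()
        else:
            tokens.append(text[pos])  # anything else becomes its own token
            pos += 1
    return tokens

print(naive_tokens("1/2-1/2"))  # ['1', '/', '2-1', '/', '2']
print(naive_tokens("1-0"))      # ['1-0']
```

So 1-0 and 0-1 survive as single symbols, while the draw marker is cut at each slash.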

Later in the specification, it says that a game termination marker is one of the four symbols: 1-0, 0-1, 1/2-1/2, or *.
However, that contradicts the earlier text, which says that the tokens 1-0 and 0-1 are tokens of type "Symbol" but that the token * is a separate type of token (asterisk).

-----

Is there a newer version of the PGN specification that clears things up?
Or do people just implement a work-around, e.g.
1. accepting slashes in a Symbol if it is 1/2-1/2, or
2. handling the symbol sequence 1, 2-1, and 2 specially when separated by slashes to form 1/2-1/2, or
3. ignoring the PGN specification and parsing the string "1/2-1/2" outside of software that conforms to the specification?
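
A minimal sketch of one way to do the first work-around: try the four game termination markers before falling back to the general symbol rule. This is only an illustration under the same assumed symbol definition as in this post; the token kind names and the function tokenize are hypothetical.

```python
import re

SYMBOL_CONT = r"A-Za-z0-9_+#=:\-"

# Termination markers are tried first. The lookahead rejects a marker that
# is only the prefix of a longer run (assumption: such text is not a marker).
TERMINATION = re.compile(r"(?:1/2-1/2|1-0|0-1|\*)(?![" + SYMBOL_CONT + r"/])")
SYMBOL = re.compile(r"[A-Za-z0-9][" + SYMBOL_CONT + r"]*")

def tokenize(text):
    """Return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in (("termination", TERMINATION), ("symbol", SYMBOL)):
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # Neither pattern matched: emit the character as its own token.
            tokens.append(("char", text[pos]))
            pos += 1
    return tokens

print(tokenize("1. e4 e5 1/2-1/2"))
```

With this ordering the draw marker comes out as one token, and all four markers get the same token kind regardless of how the symbol rule would have classified them.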

I don't think it matters much. I think it is not difficult to parse a correctly formatted PGN file [ha ha: the fact I am writing this proves that I am having some problems!] and if there are errors in a PGN file there is no great need for error recovery. If there had to be a standard form of error recovery, consistent across applications, then maybe it would matter.

E.g. if someone mistakenly writes 1Nf3 in a PGN file, there don't seem to be any rules about whether it should be treated as
a missing dot after a move number (error reported, but accepted if the move number is right),
an invalid move number (possibly reported and ignored), or
an invalid move (with possibly the entire game skipped, or even the file skipped or the application aborted).
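
To make the choice concrete, here is a rough sketch of one possible policy for a token like 1Nf3. Nothing in it comes from the specification: the pattern, the function name split_fused_move and the result labels are all assumptions, and another application could reasonably decide differently.

```python
import re

# Leading digits followed directly by something that looks like the start
# of a move: a piece letter, a file letter, or O for castling.
FUSED = re.compile(r"(\d+)([KQRBNa-hO].*)")

def split_fused_move(token, expected_number):
    """Classify a token such as '1Nf3'; return None if it is not fused."""
    match = FUSED.fullmatch(token)
    if match is None:
        return None
    number, move = int(match.group(1)), match.group(2)
    if number == expected_number:
        # Treat as a missing dot: report it, but accept the move.
        return ("missing-dot", number, move)
    # The number does not fit the game so far: report and let the caller decide.
    return ("bad-move-number", number, move)

print(split_fused_move("1Nf3", 1))
print(split_fused_move("7Nf3", 1))
```

The third option from the list, rejecting the move outright, is what a parser gets by default if it does nothing like this.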

==

Another observation about PGN is that it requires a notation like e8=Q (with an equals sign) instead of Standard Algebraic Notation (as defined by FIDE).

http://en.wikipedia.org/wiki/Algebraic_ ... _promotion
Pawn promotion
When a pawn moves to the last rank and promotes, the piece promoted to is indicated at the end of the move notation, for example: e8Q (promoting to queen). Sometimes an equals sign (=) or parentheses are used: e8=Q or e8(Q), but neither format is a FIDE standard. (An equals sign is also sometimes used to indicate the offer of a draw when written on the scoresheet next to a move, but this is not part of algebraic notation.)[4] In Portable Game Notation (PGN), pawn promotion is always indicated using the equals sign format (e8=Q).
In older books, pawn promotions can be found using a forward slash: e8/Q.


http://www.fide.com/FIDE/handbook/LawsOfChess.pdf
C.12 In the case of the promotion of a pawn, the actual pawn move is indicated, followed immediately by the first letter of the new piece. Examples: d8Q, f8N, b1B, g1R.
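
A reader that wants to accept the other promotion spellings quoted above and always write the PGN form could normalize them along these lines. This is only a sketch: the pattern and the name to_pgn_promotion are my own, and it looks at the move text alone without checking that the move is legal.

```python
import re

# Pawn move (optionally a capture) to rank 1 or 8, then the promotion piece
# written as Q, =Q, /Q or (Q), with an optional check or mate suffix.
PROMOTION = re.compile(
    r"((?:[a-h]x)?[a-h][18])(?:[=/]?([QRBN])|\(([QRBN])\))([+#]?)"
)

def to_pgn_promotion(move):
    """Rewrite e8Q, e8/Q or e8(Q) as e8=Q; return other moves unchanged."""
    match = PROMOTION.fullmatch(move)
    if match is None:
        return move
    piece = match.group(2) or match.group(3)
    return f"{match.group(1)}={piece}{match.group(4)}"

for move in ("e8Q", "e8(Q)", "e8/Q", "e8=Q", "dxe8Q+", "Nf3"):
    print(move, "->", to_pgn_promotion(move))
```

All four spellings of the queen promotion come out as e8=Q, and non-promotion moves such as Nf3 pass through untouched.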

Re: Tokens and Symbols in PGN files

Post by jkrabbenbos » 12 Jun 2012, 08:41

Hi crystalclear,

There is no newer version of the PGN standard than the one you mention. There is an extension that describes how to add time information and more, especially additions for computer chess. As the PGN standard is from the late 80s or early 90s, it also predates the formal FIDE algebraic notation standard. As far as I know, the basis of the current FIDE rules was created sometime during the 90s. It would be a good idea to update the current PGN standard, much as was done with the PDN (draughts) standard early this year.

On parsing PGN: there are formal specifications of the PGN syntax that can be used with parser/compiler generators like ANTLR or JavaCC. These solve part of your issues. For example, I have a simple ANTLR grammar for PGN that I found somewhere on the net (I no longer know where it came from and would have to search again), which does the basic things. I want to extend this grammar to the full PGN with extensions when I find some time.

Regards,
Jan

