PGN-reading code

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde


Postby Laurens Winkelhagen » 01 Dec 2004, 15:21

Hi all,

I like programming my chess engine and trying to make it better, but I dislike some aspects of it, probably because I'm not well versed in the routines involved: (file) parsing.

More specifically, I would like to parse a PGN file. I already wrote a function to parse SAN notation, but I'm stuck on the PGN file itself. Problems seem to arise when I think about moves running from one line onto the next, cut in half by a newline. (Does that occur?)

So my question is: does anyone have some easy-to-understand code for reading PGN files? I tried looking at Crafty, but I'm really looking for something more 'pedagogical'.

Thanx, Laurens.
Laurens Winkelhagen
 

Re: PGN-reading code

Postby Uri Blass » 02 Dec 2004, 13:52

Laurens Winkelhagen wrote:So my question is: does anyone have some easy to understand code to read PGN-files? I tried looking at crafty, but I'm really looking for something more 'pedagogical'.


Movei's PGN-reading code is based on Crafty's code, and I understood from Bob Hyatt that there is no problem with that.

I could not copy and paste everything in the relevant code even if I wanted to, because Crafty has a different structure, but I copied and pasted some parts just to read PGN files. My tests showed that it works at least for clean PGN files, and I fixed it so that it also has no problem with some common cases of PGN that is not clean.

My code is still ugly, and I plan to post it after I change some things to make it easier to read.

Uri
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: PGN-reading code

Postby Jim Ablett » 02 Dec 2004, 14:58

'Chessterfield' uses some nice pgn handling routines.
Download source here >

http://home.datacomm.ch/m.luescher/
___________________________
http://jimablett.net63.net/
Jim Ablett
 
Posts: 721
Joined: 27 Sep 2004, 10:39
Location: Essex, England

Re: PGN-reading code

Postby Sune Fischer » 02 Dec 2004, 17:52

Laurens Winkelhagen wrote:So my question is: does anyone have some easy to understand code to read PGN-files? I tried looking at crafty, but I'm really looking for something more 'pedagogical'.


Hi Laurens,

I'm working on PGN parsing too; it's not easy.

I process one character at a time; if it's a non-move token, I just continue to the next.

While reading these characters I use a small buffer to collect them into strings; the string is reset every time one of these non-move tokens appears.
Just before the string is reset, it is tested to see whether it has collected a move.
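A minimal sketch of that character-at-a-time idea in C, working on a string rather than a file for brevity. The names next_word and is_word_char are hypothetical, and the set of characters treated as part of a word is an assumption. Because PGN treats newlines as ordinary whitespace between tokens, a reader that works on words like this does not need to care where the lines break. Comment and variation contents are not skipped here; they come out as words like everything else and need a separate step.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: characters that may appear inside a move-text word
   (moves, move numbers, results, NAGs such as $9, annotations). */
static int is_word_char(int c)
{
    return isalnum(c) || (c != '\0' && strchr("-=+#/*.!?$", c) != NULL);
}

/* Copies the next word at *p into out (truncated to outsz - 1 chars, so
   outsz must be at least 1) and advances *p past it. Every other character
   (spaces, newlines, braces, brackets, parentheses) just ends the current
   word. Returns 0 when the text is exhausted. */
int next_word(const char **p, char *out, size_t outsz)
{
    const char *s = *p;
    size_t len = 0;

    while (*s != '\0' && !is_word_char((unsigned char)*s))
        s++;                                   /* skip separators */
    if (*s == '\0') {
        *p = s;
        return 0;
    }
    while (*s != '\0' && is_word_char((unsigned char)*s)) {
        if (len + 1 < outsz)
            out[len++] = *s;                   /* collect into the buffer */
        s++;
    }
    out[len] = '\0';
    *p = s;
    return 1;
}
```

Each word returned this way is what would then be tested for being a move.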

It took me about half a day to get that working; after that it could dump the raw SAN moves to a temp file, with headers and comments removed.

To parse a SAN move I compare the string with the SAN version of every legal move. This was the simplest approach I could think of, since I already have the SAN output utilities working. It isn't as slow as I initially thought it would be: it reads about 1500 games a second on a 2 gig XP.
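A sketch of the string-compare approach, assuming the engine can already produce the SAN of every legal move in the current position (legal_san below stands in for that output; match_san and strip_annotations are hypothetical names). Trailing !, ?, + and # are removed from both sides before comparing, so check marks and annotations do not cause mismatches.

```c
#include <stddef.h>
#include <string.h>

/* Copies src to dst (at most dstsz - 1 chars) without trailing !, ?, + or #. */
static void strip_annotations(const char *src, char *dst, size_t dstsz)
{
    size_t len = strlen(src);

    while (len > 0 && strchr("!?+#", src[len - 1]) != NULL)
        len--;
    if (len >= dstsz)
        len = dstsz - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns the index of the legal move whose SAN equals tok once both have
   been stripped of annotations, or -1 if nothing matches. legal_san[] is
   assumed to hold the engine's own SAN for every legal move. */
int match_san(const char *tok, const char *const legal_san[], int n)
{
    char want[16], have[16];
    int i;

    strip_annotations(tok, want, sizeof want);
    for (i = 0; i < n; i++) {
        strip_annotations(legal_san[i], have, sizeof have);
        if (strcmp(want, have) == 0)
            return i;
    }
    return -1;
}
```

Any spelling that the engine's own SAN writer would never produce, such as an unnecessary disambiguation, will not match with this method.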

To recognize a new game I look for the result strings "1-0", "0-1" and "1/2-1/2".
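For reference, the PGN standard defines a fourth game-termination marker besides these three: "*", for a game that is unfinished or whose result is unknown. A small hypothetical helper covering all four:

```c
#include <string.h>

/* Returns 1 if tok is one of the four PGN game-termination markers. */
int is_game_termination(const char *tok)
{
    return strcmp(tok, "1-0") == 0 || strcmp(tok, "0-1") == 0 ||
           strcmp(tok, "1/2-1/2") == 0 || strcmp(tok, "*") == 0;
}
```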

I think the real challenge is recovering from parse errors. Currently it cannot recover from every type; for instance, a missing closing brace will cause it to treat the rest of the file as one big comment.

There is still some robustness work to do to make it handle these things better, but right now it seems to eat about 99.9% of all games if they are well behaved.

-S.
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN-reading code

Postby Pallav Nawani » 02 Dec 2004, 20:12

Sune Fischer wrote:I think the real challenge is how to recover from parse errors, currently it cannot recover from all types, for instance a missing closing brace will cause it to see the remaining file as one big comment.

There is still some robustness to fiddle with to make it handle these things better, but right now it seems to eat about 99.9% of all games if they are well behaved.

-S.


That one is going to be tough to recover from. Probably the best you can do is to check, while eating comments, whether a result token or the PGN tags (the '[]' things) turn up.
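A minimal C sketch of this recovery idea, under a few assumptions: the caller has just read the opening '{'; a '[' in the first column of a line is taken as the start of the next game's header; and max_len is a hypothetical safety limit (the thread does not settle on a value). Following the PGN standard, brace comments do not nest, so the first '}' ends the comment and an inner '{' is just text. The function and constant names are hypothetical.

```c
#include <stdio.h>

/* Possible outcomes of skip_brace_comment(). */
enum { COMMENT_CLOSED = 0, COMMENT_EOF = 1, COMMENT_HIT_HEADER = 2, COMMENT_TOO_LONG = 3 };

/* Called just after a '{' has been read. Consumes characters up to and
   including the first '}'. Gives up if a line starts with '[' (probably the
   next game's header, so the brace was never closed) or after max_len
   characters. On COMMENT_HIT_HEADER the '[' is pushed back so the caller
   can go on to read the tag. */
int skip_brace_comment(FILE *f, long max_len)
{
    int c, at_line_start = 0;
    long n = 0;

    while ((c = fgetc(f)) != EOF) {
        if (c == '}')
            return COMMENT_CLOSED;
        if (at_line_start && c == '[') {
            ungetc(c, f);
            return COMMENT_HIT_HEADER;
        }
        at_line_start = (c == '\n' || c == '\r');
        if (++n > max_len)
            return COMMENT_TOO_LONG;
    }
    return COMMENT_EOF;
}
```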

If you can do 99.9% that's very good, just use pgn-extract to fix the pgn for the rest!

Pallav
Pallav Nawani
 
Posts: 147
Joined: 26 Sep 2004, 20:00
Location: Dehradun, India

Re: PGN-reading code

Postby Sune Fischer » 02 Dec 2004, 20:42

Yes, I think that's the way to eat comments: loop until the number of closing braces equals the number of opening braces, which also removes nested comments.

The question is when one should become suspicious of a missing token: when the comment block is 1000 chars, or 2000 chars, or...?
Some games contain rather lengthy analysis, so it's not quite obvious to me.
Of course you're right that encountering a new header is a good sign.

About the 99.9% success rate: I ran it on WBEC's PGN with 24855 games.
Quite clean PGN, I must say; it seems some of the errors had already been found but were left in there.

Also, a few of the games are annotated without a space between the number and White's move, i.e. "25.Bg5" instead of "25. Bg5".
I believe that isn't correct according to the standard, but it's not hard to parse either.

Code: Select all
error parsing game 81 0000 h4xf6 2r5/2P1n1kp/3R1pp1/4p3/R6P/6P1/5PK1/1r6 w - -
{Illegal move Black wins} 0-1
error parsing game 254 0000 c4c2 8/p6P/1p3k2/8/2P1p3/PP5Q/1K4r1/6r1 w - -
{Black wins because a illegal move of white} 0-1
error parsing game 2372 0000 Ngf3 r1bqk2r/ppp2ppp/2n1pn2/3p4/1bPP4/4P3/PPQN1PPP/R1B1KBNR w KQkq -
error parsing game 13294 0000 O-O 2r1k2r/p2qn1pp/1pnBp3/4P3/P3N3/3Q4/5PPP/R1R2K2 b - -
{Black wins on time, illegal move (O-O) by Francesca!} 1-0
error parsing game 16773 0000 Kd7 8/8/4k3/3p4/K5Q1/2P5/1p1n4/8 b - -
{Black wins on time, illegal move by black} 1-0
error parsing game 17410 0000 Kg8xf7 6k1/5pp1/7p/1R1p4/2pPR1P1/8/1P3PPK/r7 b - -
{Black wins on time, illegal move by Chispa} 1-0
error parsing game 18145 0000 Rh8 3R4/8/8/8/8/2r4k/1p3K2/7n w - -
{White wins on time, illegal move by Tao (dont know about minor promotion!)} 0-1
error parsing game 23817 0000 Ka5a6 5b2/8/8/kpP1p1p1/4Pp1p/K2N1P1P/6P1/8 w - -
{Black wins on time, illegal move by white!} 0-1
error parsing game 24153 0000 a8a8 8/8/2k2r2/4Q3/P1p4p/K1P4P/5r2/8 b - -
{Draw by ...(3), a8a8?? set to 1-0} 1-0


-S.
Sune Fischer
 

Re: PGN-reading code

Postby Anonymous » 02 Dec 2004, 22:10

Mainly to Sune's idea: I think comparing move strings is not a good idea. It is too fragile and will fail with hand-written PGNs as well as with machine-written ones (in rare cases). My suggestion would be to write a more general move parser that will read almost any thinkable move format (including long algebraic notation and coordinate notation), for example with the following idea. Say you have already separated out the move string. Start at the end. Skip any !s and ?s at the end. Look for "+" and "#" at the end (and remember them). Now, if the last char is a letter, it must be a promotion piece; support capital and small letters for this. If it is a promotion piece, skip a possible preceding "=".

Now assume for a moment long algebraic notation.

piece-letter; from-col; from-row; - or x; to-col; to-row.

Parse it from the end. Assume everything is optional besides the to-square. Use (for example) variables:

move_is_check, move_is_mate, promotion_piece, to_row, to_col, move_is_capture, from_row, from_col, moving_piece. Initialize all of them to "not initialized" (some unused value like -1; moving_piece starts as pawn). Assign the parts that you identify from the end.
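A sketch of this parse-from-the-end step in C. MoveText, parse_move_text and UNSET are hypothetical names, not code from any engine in this thread. Castling and the 'b'-as-bishop retry are left out, and anything left over after the piece letter is treated as "not a move", which is what lets words like +- or $9 be skipped.

```c
#include <ctype.h>
#include <string.h>

#define UNSET (-1)                  /* "not initialized" */

typedef struct {
    int piece;                      /* 'P','N','B','R','Q','K'; pawn if no letter */
    int from_col, from_row;         /* 0..7 or UNSET */
    int capture;                    /* 1 if 'x' was written, else UNSET */
    int to_col, to_row;             /* 0..7, always set on success */
    int promo;                      /* 'Q','R','B','N' or UNSET */
    int is_check, is_mate;
} MoveText;

/* Parses SAN, long algebraic ("Ng1-f3") or coordinate ("g1f3") move text
   from the end. Returns 1 on success, 0 if the text is not a move. */
int parse_move_text(const char *s, MoveText *m)
{
    int n = (int)strlen(s);

    m->piece = 'P';
    m->from_col = m->from_row = m->capture = UNSET;
    m->to_col = m->to_row = m->promo = UNSET;
    m->is_check = m->is_mate = 0;

    while (n > 0 && (s[n - 1] == '!' || s[n - 1] == '?'))
        n--;                                    /* annotations */
    if (n > 0 && s[n - 1] == '#') { m->is_mate = 1; n--; }
    else if (n > 0 && s[n - 1] == '+') { m->is_check = 1; n--; }

    /* A trailing letter can only be a promotion piece (either case). */
    if (n > 0 && strchr("QRBNqrbn", s[n - 1]) != NULL) {
        m->promo = toupper((unsigned char)s[n - 1]);
        n--;
        if (n > 0 && s[n - 1] == '=')
            n--;
    }

    /* The destination square is the only mandatory part. */
    if (n < 2 || s[n - 1] < '1' || s[n - 1] > '8' || s[n - 2] < 'a' || s[n - 2] > 'h')
        return 0;
    m->to_row = s[n - 1] - '1';
    m->to_col = s[n - 2] - 'a';
    n -= 2;

    if (n > 0 && (s[n - 1] == 'x' || s[n - 1] == '-')) {
        if (s[n - 1] == 'x')
            m->capture = 1;
        n--;
    }
    if (n > 0 && s[n - 1] >= '1' && s[n - 1] <= '8') { m->from_row = s[n - 1] - '1'; n--; }
    if (n > 0 && s[n - 1] >= 'a' && s[n - 1] <= 'h') { m->from_col = s[n - 1] - 'a'; n--; }
    if (n > 0 && strchr("PNBRQK", s[n - 1]) != NULL) { m->piece = s[n - 1]; n--; }

    return n == 0;                  /* leftovers mean this was not a move */
}
```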

Now generate all legal moves and calculate the same vars for each legal move. If every initialized var from above is the same as in a legal move, remember that move and increment a counter "move_found". If, after you have tried every legal move like this, the counter is one, you have a proper move. If it is zero, nothing fits. If it is > 1, the move is ambiguous. If from_col is "b", try the b as a (wrong) piece letter for bishop and see whether the move is unambiguous now. Castling needs specific treatment, of course. Support small and capital letter O and digit 0 (perhaps even mixed; that might be easiest to code). I have seen "PGN" where the "-" in castling was some other character code (perhaps a special hyphen character). One could not distinguish it visually.
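The matching step can be sketched like this, assuming the engine can describe each legal move with the same fields. MoveFields, ANY and find_legal_move are hypothetical names; in a real engine legal[] would be filled by the move generator. A pattern field set to ANY accepts anything, and a promo value of 0 in a legal move means "no promotion". Castling is not covered.

```c
#define ANY (-1)   /* pattern field not given in the move text */

typedef struct {
    int piece;               /* 'P', 'N', 'B', 'R', 'Q' or 'K' */
    int from_col, from_row;  /* 0..7 (ANY allowed in a pattern) */
    int capture;             /* 0 or 1 (1 or ANY in a pattern) */
    int to_col, to_row;      /* 0..7, always given */
    int promo;               /* 'Q', 'R', 'B', 'N' or 0 (ANY allowed in a pattern) */
} MoveFields;

static int field_ok(int want, int have)
{
    return want == ANY || want == have;
}

static int fields_match(const MoveFields *pat, const MoveFields *mv)
{
    return pat->piece == mv->piece &&
           field_ok(pat->from_col, mv->from_col) &&
           field_ok(pat->from_row, mv->from_row) &&
           field_ok(pat->capture, mv->capture) &&
           pat->to_col == mv->to_col && pat->to_row == mv->to_row &&
           field_ok(pat->promo, mv->promo);
}

/* Counts the legal moves that fit pat ("move_found") and remembers the last one. */
static int count_matches(const MoveFields *pat, const MoveFields legal[], int n, int *index)
{
    int i, found = 0;

    for (i = 0; i < n; i++)
        if (fields_match(pat, &legal[i])) {
            found++;
            *index = i;
        }
    return found;
}

/* Returns the index of the single legal move that fits pat, or -1 if none
   or more than one fits. A pawn move with a lone 'b' file is retried with
   the 'b' read as a (wrongly lower-case) bishop letter. */
int find_legal_move(const MoveFields *pat, const MoveFields legal[], int n)
{
    int index = -1;

    if (count_matches(pat, legal, n, &index) == 1)
        return index;
    if (pat->piece == 'P' && pat->from_col == 1 && pat->from_row == ANY) {
        MoveFields alt = *pat;

        alt.piece = 'B';
        alt.from_col = ANY;
        if (count_matches(&alt, legal, n, &index) == 1)
            return index;
    }
    return -1;
}
```

With this kind of matching, an over-specified move such as Rfxf3 still resolves to the one legal rook move, because the extra file letter is simply one more field that has to agree.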

To identify move strings, consider them as whitespace-separated words (C isspace()). At the start there might be a number, as in 1.e4 or 1...e5 (there is not always a space between the dot and the move), so you have to skip a possible move number at the front.
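A small sketch for stripping the move number, with skip_move_number as a hypothetical name. Digits are skipped only when a '.' follows them; simply skipping every character in "0123456789." would also eat the leading zero of 0-0, which is why code written that way needs a castling special case.

```c
#include <ctype.h>

/* Returns a pointer just past a leading move number such as "12." or
   "1...". Digits are skipped only when a '.' follows them, so "0-0",
   "1-0" and ordinary moves are returned unchanged. */
const char *skip_move_number(const char *tok)
{
    const char *p = tok;

    while (isdigit((unsigned char)*p))
        p++;
    if (p == tok || *p != '.')
        return tok;                 /* no move number here */
    while (*p == '.')
        p++;
    return p;
}
```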

When a word is not a legal move, just skip the word, and take the next. This will skip things like +-, $9 and included comments without {}.

This is obviously just for the move part of the PGN, with comments already handled (and, in a first iteration, variations perhaps treated as comments).
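Variations can be skipped with a depth counter, because parenthesized variations may nest in PGN while brace comments may not. A sketch working on a string, with skip_variation as a hypothetical name; ';' rest-of-line comments are not handled here.

```c
#include <stddef.h>
#include <string.h>

/* Given text positioned just after a '(' that opens a variation, returns a
   pointer just past the matching ')'. Parentheses inside {...} comments
   are comment text and are ignored. Returns NULL if the text ends first. */
const char *skip_variation(const char *p)
{
    int depth = 1;

    while (*p != '\0') {
        if (*p == '{') {                        /* brace comments do not nest */
            const char *close = strchr(p, '}');

            if (close == NULL)
                return NULL;
            p = close + 1;
            continue;
        }
        if (*p == '(')
            depth++;
        else if (*p == ')' && --depth == 0)
            return p + 1;
        p++;
    }
    return NULL;
}
```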

Open PGN files in binary mode. Don't use fgets to read, but rather fgetc (to be portable and still support common wrong line ends for the platform).
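A sketch of such a reader: pgn_getc (a hypothetical name) wraps fgetc on a file opened in binary mode and folds "\r\n", a lone "\r" and a lone "\n" into a single '\n', so the rest of the parser only ever sees one kind of line end.

```c
#include <stdio.h>

/* Reads one character from a PGN file opened with fopen(name, "rb"),
   returning every style of line end as a single '\n'. */
int pgn_getc(FILE *f)
{
    int c = fgetc(f);

    if (c == '\r') {
        int next = fgetc(f);

        if (next != '\n' && next != EOF)
            ungetc(next, f);        /* lone '\r': keep the next char */
        return '\n';
    }
    return c;
}
```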

Regards,
Dieter
Anonymous
 

Re: PGN-reading code

Postby Uri Blass » 02 Dec 2004, 22:34

More general code already exists in Crafty, and I use it.
Bob told me that I can use Crafty's code for translating PGN.

I will post some source code later, but I agree with Dieter that it is better to have general code for recognizing a move, since translating a move to SAN can be done in more than one way.

Uri
Uri Blass
 

Re: PGN-reading code

Postby Sune Fischer » 02 Dec 2004, 22:41

Dieter Bürner wrote:Mainly to Sune's idea: I think comparing move strings is not a good idea. It is too fragile and will fail with hand-written PGNs as well as with machine-written ones (in rare cases). My suggestion would be to write a more general move parser that will read almost any thinkable move format (including long algebraic notation and coordinate notation).

When a word is not a legal move, just skip the word, and take the next. This will skip things like +-, $9 and included comments without {}.

This is obviously just for the move part of the PGN, with comments already considered (and in a first iteration variation perhaps treated as comments).


Hi Dieter,

I also filter out all the NAGs and !?+ symbols, since I don't consider those essential,
and castling moves written with zeros instead of capital O's are understood too.
Other innocent little things like that will be added as I run into them.

My primary concern, however, is not to get corrupt lines into the book.
Suppose that in a line you get a few moves you can't read; you skip them and read the next moves. Those moves might be legal, but it's going to be a different game.
Therefore I think one should not try to parse everything at all costs; it's better to disregard the 1% of games that may be unreadable.

I even drop the whole game if there is an error at move 20. Maybe the first 19 moves are fine, but perhaps the error is at move 15 and is not caught until move 20. I conclude the game cannot be trusted at all.

I have yet to try some of the large public databases; perhaps a more flexible SAN reader is needed if these are full of bugs, but for now it seems safer (and easier) to do things backwards. ;)

-S.
Sune Fischer
 

Re: PGN-reading code

Postby Uri Blass » 02 Dec 2004, 23:30

Sune Fischer wrote:I even drop the whole game if there is an error at move 20. Maybe the first 19 moves are fine, but perhaps the error is at move 15 and it is not caught until move 20. I conclude the game cannot be trusted at all.


Hi Sune,
It seems that you did not read Crafty's code.
In case of an illegal move, Crafty continues reading moves until it reaches a new game, and simply ignores them.

The point is that for every game you can try to parse everything before the first illegal move, and having more general code to read moves is better.

Uri
Uri Blass
 

Re: PGN-reading code

Postby Sune Fischer » 02 Dec 2004, 23:50

Hi Uri,

Uri Blass wrote:Hi Sune,
It seems that you did not read Crafty's code
In case of illegal move Crafty continue to read the moves until getting a new game and simply ignore them.


I looked, but decided I could make a simpler version :)

Uri Blass wrote:The point is that for every game you can try to parse everything inside a game before the first illegal move and having more general code to read moves is better.


Ok, I just tried Bob's big gm2600.pgn file at
http://custom.lab.unb.br/pub/chess/crafty/common/

27202 games parsed without an error using the string compare method!

If there is a problem with this method I don't see it.

-S.
Sune Fischer
 

Re: PGN-reading code

Postby Alessandro Scotti » 03 Dec 2004, 00:09

Sune Fischer wrote:Yes that's the way to eat comments I think, to loop until the number of end-braces equals the number of start-braces, also to remove nested comments.


Hi Sune,
I think nested comments are illegal in PGN, and a nested opening brace should simply be ignored (at least, that's what my copy of the standard says).
Alessandro Scotti
 
Posts: 306
Joined: 20 Nov 2004, 00:10
Location: Rome, Italy

Re: PGN-reading code

Postby Anonymous » 03 Dec 2004, 21:40

Sune, I agree that for book-creation code it is better to fail (for the current game) after the first wrong move. My engine does that too. My suggestion was meant to get out as much as possible in other cases.

For the string compare thing: See for example http://chessprogramming.org/cccsearch/c ... _id=273910

Shredder Classic formatted 6. Rfxf3 wrongly (IMO not a serious error). The source column should not be given, because the other rook is pinned. IIRC, I have seen the same or a similar error under the CB GUI. It is totally clear what is meant by the move, but the string compare will fail.

When you write a more general move parser (it is not too much work), you have one additional advantage - operation in console mode is nicer.

Regards,
Dieter
Anonymous
 

Re: PGN-reading code

Postby Sune Fischer » 03 Dec 2004, 22:28

Hi Dieter,

You're right that those over-specified moves do fail.
I found some PGNs that use this faulty notation; it appears to fail in about 0.5-1.5% of the games when this notation is used.
Quite frankly, that seems rare enough to ignore; I guess it just doesn't happen that often on the board.

The user still has the option of running the file through a more advanced PGN-fixing tool if a 1.5% loss is unacceptable.
In any case, that might be a good thing to do before you feed it to the "amateur" engines ;)

I think a much more serious problem is the frequent incorrect use of braces; I'm having some problems with those.

-S.
Sune Fischer
 

PGN parsing

Postby Dann Corbit » 03 Dec 2004, 22:38

Look at WinBoard's source code by Mann, and also at PGN-Extract by Barnes, if you want to see the right way to do it.

Anything else is inferior.

The stuff in WinBoard is the most clever. Unfortunately, Tim Mann does not publish the grammar he used to create the parser, and Mr. Barnes does not publish his either (which is too bad).

Tim Mann's code is very clever about things like 0-0-0 instead of O-O-O and other common, solvable gaffes. It is ultra-intelligent about recognizing all sorts of move formats.
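As an illustration of handling one such gaffe, here is a hypothetical helper (not WinBoard's code) that recognizes castling written with 'O', 'o' or '0' in any mix, ignoring trailing !?+#. A castling token typed with some other dash character would still be rejected.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 for kingside castling, 2 for queenside, 0 if tok is not
   castling. Letters may be 'O', 'o' or '0', separated by '-'. */
int castling_kind(const char *tok)
{
    int letters = 0;
    size_t len = strlen(tok);
    size_t i;

    while (len > 0 && strchr("!?+#", tok[len - 1]) != NULL)
        len--;                                  /* drop annotations */
    for (i = 0; i < len; i++) {
        if (i % 2 == 0) {                       /* even positions: a letter */
            if (tok[i] != 'O' && tok[i] != 'o' && tok[i] != '0')
                return 0;
            letters++;
        } else if (tok[i] != '-') {             /* odd positions: a hyphen */
            return 0;
        }
    }
    if (len % 2 == 0)
        return 0;                               /* must end on a letter */
    if (letters == 2)
        return 1;
    if (letters == 3)
        return 2;
    return 0;
}
```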
Dann Corbit
 

Re: PGN-reading code

Postby Sune Fischer » 03 Dec 2004, 22:45

I disagree.

I think it is overkill for everyone to try to implement the ultimate PGN parser.
There are numerous tools out there to fix anything that is broken.

You can feed the file through, and it will print the number of dropped games; if this number is too high, you are free to clean up the PGN.

I think that is good enough; I really don't want to struggle with all the possible ways things can be done incorrectly.

That's a general principle of mine: protocols should be followed strictly, and anyone who can't or won't follow them probably isn't producing valuable data anyway.

-S.
Sune Fischer
 

Re: PGN-reading code

Postby Uri Blass » 04 Dec 2004, 01:16

Here is the code I have now for reading PGN files (you can see that a lot of it is copied from Crafty, but Bob said that is OK).

Some parts that Crafty has are irrelevant here, so I deleted them.

I am still thinking about how to make things simpler, and if you can help me it could be productive. (Note that the code I post is based on code I use to read moves in SAN notation where the move number can also be attached to the move, so the code is able to read 12.e4 or 12.e2e4.)

I can calculate statistics about a PGN file in the same way that I currently produce an EPD file when I call readpgn with flag=-2, but I do not like a solution where readpgn gets a different flag value for every statistic, and I also do not like writing a big function for every statistic.

I want simple functions to calculate statistics about the positions in a PGN file, but the problem is that for this I need a function that gets the next position, and today Movei does not get the next position through a function.
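One common way to avoid a separate flag and a big function per statistic is a callback: the reader walks the positions once and calls a small function for each. A minimal sketch, assuming the positions are available as FEN strings; PositionVisitor, for_each_position and count_white_to_move are hypothetical names, and the FEN array stands in for the real PGN reader, which would make each move and call the visitor for the position the move was played from.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once for every position. */
typedef void (*PositionVisitor)(const char *fen, void *ctx);

/* Stand-in for the real PGN reader: here the positions are simply given
   as FEN strings. */
void for_each_position(const char *const fens[], size_t n,
                       PositionVisitor visit, void *ctx)
{
    size_t i;

    for (i = 0; i < n; i++)
        visit(fens[i], ctx);
}

/* One statistic = one small function: count positions with White to move.
   ctx points to an int counter. */
static void count_white_to_move(const char *fen, void *ctx)
{
    const char *side = strchr(fen, ' ');   /* side to move follows the first space */

    if (side != NULL && side[1] == 'w')
        ++*(int *)ctx;
}
```

Adding a new statistic then means writing one small visitor, without touching the reader or adding another flag value.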

Here is the code that I use at the moment.
Note that you do not need to fully understand read_next_command in order to help me; you only need to know that it returns -1 at the end of the file, 1 if a header is read and 0 if a move is read.

Note also that the function ReadNextMove makes the move and returns 1 if it read a legal move and 0 if it did not.

The main problem I have still not solved is finishing the design of
the function void calc_statistics_pgn()

I explain inside the code how it is supposed to work, but a basic function to calculate the next move, which also needs to handle the case where there is no legal move (this happens at the end of every game), is still not designed.

Code: Select all
#include <ctype.h>   /* isalpha */
#include <stdio.h>
#include <string.h>
#include "defs.h"
#include "data.h"
#include "protos.h"


FILE *games;

int openpgn(char * gamesname)
{
   games=fopen(gamesname ,"rb"); /* read-only binary: "r+b" also asks for write access and fails on read-only files */
   if (games)
   {
      fseek(games,0,SEEK_SET);
      return 1;
   }
   return 0;
}
char      pgn_event[32] = {"?"};
char      pgn_site[32] = {"?"};
char      pgn_round[32] = {"?"};
char      pgn_date[32] = {"????.??.??"};
char      pgn_white_elo[32] = {""};
char      pgn_white[64] = {"unknown"};
char      pgn_black_elo[32] = {""};
char      pgn_black[64] = {"Movei " VERSION};
char      pgn_result[32] = {"*"};
int read_next_command(FILE* games,int option)
{
   /* read_next_command returns -1 at end of file, 0 if a move is read and 1 if
a header is read.
   It also fills bookbuf with the information from the file as a string;
   ReadNextMove can later be called on bookbuf when it holds a move. */
   static int data=0,lines_read=0;
   static char input_buffer[512];
   char temp[512],*eof,analysis_move[64];
   int braces=0,parens=0,brackets=0,analysis=0,last_good_line,converted;
   if (!games)
   {
      lines_read=0;
      data=0;
      return 0;
   }
   if (option==-1) data=0;
   if (option==-2) return(lines_read);
   while (1)
   {
      if   (!data)
      {
         eof=fgets (input_buffer,512,games);
         if (!eof) return(-1);
         if (strchr(input_buffer,'\n')) *strchr(input_buffer,'\n')=0;
         if (strchr(input_buffer,'\r')) *strchr(input_buffer,'\r')=' ';
         lines_read++;
         bookbuf[0]=0;
         converted = sscanf(input_buffer,"%s",bookbuf);
         if (bookbuf[0] == '[') do
         {
             char *bracket1, *bracket2, value[128];
             strncpy(bookbuf,input_buffer, sizeof bookbuf);
             bracket1=strchr(input_buffer,'\"');
             if (bracket1 == 0) return(1);
             bracket2=strchr(bracket1+1,'\"');
             if (bracket2 == 0) return(1);
             *bracket1=0;
             *bracket2=0;
             strncpy(value,bracket1+1, sizeof value);
             if (strstr(input_buffer,"Event")) strncpy(pgn_event,value, sizeof pgn_event);
             else if (strstr(input_buffer,"Site")) strncpy(pgn_site,value, sizeof pgn_site);
             else if (strstr(input_buffer,"Round")) strncpy(pgn_round,value, sizeof pgn_round);
             else if (strstr(input_buffer,"Date")) strncpy(pgn_date,value, sizeof pgn_date);
             else if (strstr(input_buffer,"WhiteElo")) strncpy(pgn_white_elo,value, sizeof pgn_white_elo);
             else if (strstr(input_buffer,"White")) strncpy(pgn_white,value, sizeof pgn_white);
             else if (strstr(input_buffer,"BlackElo")) strncpy(pgn_black_elo,value, sizeof pgn_black_elo);
             else if (strstr(input_buffer,"Black")) strncpy(pgn_black,value, sizeof pgn_black);
             else if (strstr(input_buffer,"Result")) strncpy(pgn_result,value, sizeof pgn_result);
             else if (strstr(input_buffer,"FEN"))
             {
                sprintf(bookbuf,"setboard %s",value);
                setup(value);
             }
             return 1;
          }
          while(0);
          data=1;
      }
      /*
----------------------------------------------------------
|                                                          |
|  if we already have data in the bookbuf, it is just a    |
|  matter of extracting the next move and returning it to  |
|  the caller.  if bookbuf is empty, another line has      |
|  to be read in.                                          |
|                                                          |
----------------------------------------------------------
      */
      else
      {
         bookbuf[0]=0;
         sscanf(input_buffer,"%256s",bookbuf);
         if (strlen(bookbuf) == 0)
         {
            data=0;
            continue;
         }
         else
         {
            char *skip;
            strcpy(temp,input_buffer);
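            skip=strstr(input_buffer,bookbuf);
            /* source and destination overlap, so memmove is needed (strncpy on overlapping buffers is undefined) */
            if (skip) memmove(input_buffer,skip+strlen(bookbuf),strlen(skip+strlen(bookbuf))+1);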
         }

         /*
   ----------------------------------------------------------
  |                                                          |
  |  this skips over nested { or ( characters and finds the  |
  |  'mate', before returning any more moves.  it also stops |
  |  if a PGN header is encountered, probably due to an      |
  |  incorrectly bracketed analysis variation.               |
  |                                                          |
   ----------------------------------------------------------
         */
         last_good_line=lines_read;
         analysis_move[0]=0;
         if (strchr(bookbuf,'{') || strchr(bookbuf,'('))
            while (1)
            {
               char *skip, *ch;
               analysis=1;
               while ((ch=strpbrk(bookbuf,"(){}[]")))
               {
                  if (*ch == '(')
                  {
                     *strchr(bookbuf,'(')=' ';
                     if (!braces) parens++;
                  }
                  if (*ch == ')')
                  {
                     *strchr(bookbuf,')')=' ';
                     if (!braces) parens--;
                  }
                  if (*ch == '{')
                  {
                     *strchr(bookbuf,'{')=' ';
                     braces++;
                  }
                  if (*ch == '}')
                  {
                     *strchr(bookbuf,'}')=' ';
                     braces--;
                  }
                  if (*ch == '[')
                  {
                     *strchr(bookbuf,'[')=' ';
                     if (!braces) brackets++;
                  }
                  if (*ch == ']')
                  {
                     *strchr(bookbuf,']')=' ';
                     if (!braces) brackets--;
                  }
               }
               if (analysis && analysis_move[0]==0)
               {
                  if (strspn(bookbuf," ") != strlen(bookbuf))
                  {
                     char *tmove=analysis_move;
                     sscanf(bookbuf,"%63s",analysis_move); /* 63 chars + terminator fit in analysis_move[64] */
                     strcpy(bookbuf,analysis_move);
                     if (strcmp(bookbuf,"0-0") && strcmp(bookbuf,"0-0-0"))
                        tmove=bookbuf+strspn(bookbuf,"0123456789.");
                     else
                        tmove=bookbuf;
                     if ((tmove[0]>='a' && tmove[0]<='z') ||(tmove[0]>='A' && tmove[0]<='Z') ||
                        !strcmp(tmove,"0-0") || !strcmp(tmove,"0-0-0"))
                        strcpy(analysis_move,bookbuf);
                     else
                        analysis_move[0]=0;
                  }
               }
               if (parens==0 && braces==0 && brackets==0) break;
               bookbuf[0]=0;
               sscanf(input_buffer,"%s",bookbuf);
               if (strlen(bookbuf) == 0)
               {
                  eof=fgets(input_buffer,512,games);
                  if (!eof)
                  {
                     parens=0;
                     braces=0;
                     brackets=0;
                     return(-1);
                  }
                  if (strchr(input_buffer,'\n')) *strchr(input_buffer,'\n')=0;
                  if (strchr(input_buffer,'\r')) *strchr(input_buffer,'\r')=' ';
                  lines_read++;
                  if (lines_read-last_good_line >= 100)
                  {
                     parens=0;
                     braces=0;
                     brackets=0;
                     Print("ERROR.  comment spans over 100 lines, starting at line %d\n",last_good_line);
                     break;
                  }
               }
               strcpy(temp,input_buffer);
               skip=strstr(input_buffer,bookbuf)+strlen(bookbuf);
               /* overlapping copy: memmove instead of strcpy, which is undefined here */
               memmove(input_buffer,skip,strlen(skip)+1);
            }
            else
            {
               int skip;
               if ((skip=strspn(bookbuf,"0123456789.")))
               {
                  char temp[512];
                  strcpy(temp,bookbuf+skip);
                  strcpy(bookbuf,temp);
               }
               if (isalpha(bookbuf[0]) || strchr(bookbuf,'-'))
                  return(0);
            }
         }
   }
}
int result;
int games_parsed;
BitBoard sumperftpgn;
void parseheader()
{
if (strstr(bookbuf,"Site"))
            {
               games_parsed++;
               result=3;
            }
            else if (strstr(bookbuf,"esult"))
            {
               if (strstr(bookbuf,"1-0")) result=2;
               else if (strstr(bookbuf,"0-1")) result=1;
               else if (strstr(bookbuf,"1/2-1/2")) result=0;
               else if (strstr(bookbuf,"*")) result=3;
            }
}

int get_next_position_in_pgn()
{
   /* Reads commands from the file games until it sees a move, which means a new
   interesting position (a move was played from it).
   Returns 1 if it finds a new move and 0 if it does not.
   It does not yet check whether the move is legal and assumes it only gets legal moves,
   but the "move" can be a result such as 1-0 or 1/2-1/2, which is treated as a move
   before checking, so this still needs to change.
   */
   int data_read=read_next_command(games,0);
   while (data_read==1)
   {
      parseheader();
      data_read=read_next_command(games,0);

   }
   if (data_read==-1)
      return 0;
   /*now  data_read=0
   next command is a move so the position you have is relevant and you
return 1 and next steps is to read the fen and make the move*/
      return 1;
}
int handle_possible_error(void)
{
   int data_read=0;
   if (strspn(bookbuf,"0123456789/-.*")!=strlen(bookbuf)&&(hply<max_plies_of_game))
   {
      printf("%s-",pgn_white);
      printf(" %s ",pgn_black);
      printf("ply err%d ",hply);
      printf(" move %s is illegal line %d ",bookbuf,read_next_command(games,-2));
      printf("\n");
      read_next_command(games,-1);
      do
         data_read=read_next_command(games,0);
      while (data_read==0);
   }
   return data_read;
}
int calc_move()
{
   char *ch;
   int move;
   /*not finished*/
   do
   {
      if ((ch = strpbrk(bookbuf, "?!")))
              *ch = 0;
      if (!strchr(bookbuf,'$')&&!strchr(bookbuf,'*'))
      {
         if (hply<max_plies_of_game)
            move=ReadNextMove(bookbuf);
         else
            move=0;
         return move;
      }
      else
         read_next_command(games,0);
   }
   while (1);
      return 0;


}
void calc_statistics_pgn()
{
   while (get_next_position_in_pgn())
   {
      /*do the following steps
      1)calcinfo(it can be calc perft(n))
      2)calculatemove(need to clear ? ! and to decide when there is no legal move
      3)if you have illegal move start new game else make the move
      */

   }
}
int readnextgame(int flag)
{
   int data_read=0;
   int move;
   char *ch;
   char *candidatefen;

   while (data_read==0)
   {
       if (flag>=1)
         sumperftpgn+=perft(flag-1);
      if (flag==-2)
         candidatefen=translate_pos_to_fen();

      /*need to add reading ? and !  but it is not important now*/

      if ((ch = strpbrk(bookbuf, "?!")))
              *ch = 0;

      if (!strchr(bookbuf,'$')&&!strchr(bookbuf,'*'))
      {
         if (hply<max_plies_of_game)
            move=ReadNextMove(bookbuf);
         else
            move=0;
         if (move==0)
         {   
            data_read=handle_possible_error();
            if (data_read!=0)
               break;
         }
         else if (flag==-2)
            fprintf(pgntoepd,"%s\n",candidatefen);   /* legal move: write the position before it */

      }
      data_read=read_next_command(games,0);
   }
   if (flag>=1)
         sumperftpgn-=perft(flag-1);
   setup("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
   return data_read;
}

void readpgn(char* gamesname,int flag)
{
   int data_read;
   games_parsed=0;
   if (flag==-2)
   {
      pgntoepd = fopen("pgntoepd.epd","w+");
      if (pgntoepd==NULL) return;
   }

   setup("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
   read_next_command(0,0);
   /* this first call only initializes the reader, because 0 is not a file pointer */
   if  (openpgn(gamesname))
   {
      do
      {
         data_read=read_next_command(games,0);
      }
      while (data_read==0);
      /* skip tokens before the first header: moves with no header before them are not relevant */

      do
      {
         if (data_read<0)
            break;
         if (data_read==1)
         {
            parseheader();
            data_read=read_next_command(games,0);
         }
         else
            data_read=readnextgame(flag);
      }
      while (strcmp(bookbuf,"end")&&data_read!=-1);
      fclose(games);
   }
   printf("games= %d\n",games_parsed);
   if (flag>=1)
      printf("perft=%I64u \n",sumperftpgn);
   if (flag==-2)
      fclose(pgntoepd);
}

void translatepgn_to_epd(char * gamesname)
{
    readpgn(gamesname,-2);
}
Uri Blass
 
Posts: 727
Joined: 09 Oct 2004, 05:59
Location: Tel-Aviv

Re: PGN-reading code

Postby Sune Fischer » 04 Dec 2004, 14:19

Uri, one of the reasons I rolled my own was so I wouldn't have to deal with such a long smear from someone else. ;)

What kind of statistics do you want?
There must be a makemove() somewhere; why not collect the relevant statistics right after it?

Perhaps just give the PGN parser a single flag, e.g. bool do_stats, and then call your huge statistical analyzer function, calc_statistics_pgn(), after makemove() if this flag is set.

I imagine you will also need init_stats() and end_stats() functions.

I guess it can be done in many ways, but it might depend on what kind of statistics you are interested in.

-S.
Sune Fischer
 
Posts: 126
Joined: 07 Oct 2004, 11:12
Location: Denmark

Re: PGN-reading code

Postby Uri Blass » 04 Dec 2004, 16:21

Yes, there is a makemove:

I wrote:

"Note also that the function ReadNextMove makes the move and returns 1 if it reads a legal move and 0 if it does not."

ReadNextMove is in another file that I did not post.

There was a reason for it, but I now see that the reason was wrong.

My move generator does not detect whether a move gives check, so I need to make the move to find out.

I did not want to waste time undoing the move, so I decided that ReadNextMove would leave the move made when it is legal.

The problem is that this does not save the undo; it only pushes it into ReadNextMove.

ReadNextMove loops over all the moves and makes and unmakes every move that fits the description.

I make and unmake the moves because even after I know that a move is legal (I have a legal move generator), it is not clear whether it gives check, so the notation may still not match.

It is possible that one rook move to f2 is written Rf2 and the other Rf2+ (a discovered check).

When I look at my code of ReadNextMove (a lot of it is also copied from Crafty), I see that one of the conditions for making the move is
if (givecheck==0||kingincheck>0)

In other words, a move written without '+' is accepted whether or not it gives check, while a move written with '+' is accepted only if it really gives check. So Rf2 is accepted even when the correct notation is Rf2+, but Rf2+ is rejected when the correct notation is Rf2.

For example, in the following diagram

[diag]8/8/k7/8/8/8/4R1R1/5BK1 w - - 0 1[/diag]

Rf2 is considered ambiguous (an error), but Rf2+ is clear, because only Re2-f2 uncovers the check from the bishop on f1 against the king on a6.

I am not sure this is right; my common sense says that Rf2 is clear, because there is no need to write Rgf2 when only one rook can go there without giving check.

Uri
Uri Blass
