Moderator: Andres Valverde
Laurens Winkelhagen wrote:Hi all,
I like programming my chess engine, trying to make it better and such, but I dislike some aspects of it. Probably because I'm not well versed in the routines involved... it's (file) parsing.
More specifically, I would like to parse a PGN file. I already made a function to parse SAN, but I'm stuck with the PGN file itself. Problems seem to arise when I think about moves running from one line onto the next, cut in half by a newline. (Does that occur?)
So my question is: does anyone have some easy-to-understand code to read PGN files? I tried looking at Crafty, but I'm really looking for something more 'pedagogical'.
Thanx, Laurens.
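On the line-break question: in PGN movetext a newline is just whitespace between tokens, so a move such as Nf3 is not normally cut in the middle, but a move number can end one line and its move can start the next. A minimal sketch, assuming a hypothetical helper next_token (not taken from any engine mentioned in this thread), that walks a buffer token by token so that line boundaries disappear:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (an assumption of this sketch).
   Copies the next whitespace-delimited token starting at *cursor into buf
   (at most size-1 characters, always NUL-terminated) and advances *cursor
   past it. '\n' and '\r' are treated like any other whitespace.
   Returns 1 if a token was found, 0 when only whitespace remains. */
int next_token(const char **cursor, char *buf, size_t size)
{
    const char *p;
    size_t n = 0;

    assert(cursor != NULL && *cursor != NULL && buf != NULL && size > 0);
    p = *cursor;
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                      /* skip blanks and line breaks */
    if (*p == '\0') {
        *cursor = p;
        buf[0] = '\0';
        return 0;
    }
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (n + 1 < size)
            buf[n++] = *p;        /* overlong tokens are truncated */
        p++;
    }
    buf[n] = '\0';
    *cursor = p;
    return 1;
}
```

Called in a loop on "1. e4 e5 2.\nNf3", it yields 1., e4, e5, 2. and Nf3 in that order. The same loop works with fgetc on a FILE * instead of a pointer into a buffer.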
Sune Fischer wrote:I think the real challenge is how to recover from parse errors. Currently it cannot recover from all types; for instance, a missing closing brace will cause it to see the rest of the file as one big comment.
There is still some robustness to fiddle with to make it handle these things better, but right now it seems to eat about 99.9% of all games if they are well behaved.
-S.
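One possible way to recover from a missing closing brace is to give up on the comment as soon as a new game header shows up. A minimal sketch, assuming a hypothetical helper skip_brace_comment and assuming that a line starting with [Event never occurs inside a genuine comment (a heuristic, not a rule of the PGN standard):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption of this sketch).
   p must point at the '{' that opens a comment.
   Returns a pointer just past the matching '}'. If a line starting with
   "[Event " is met first, or the input ends, the comment is taken to be
   unterminated: *unterminated is set to 1 and the returned pointer is the
   start of that header line (or the end of the input), so the caller can
   resume with the next game. */
const char *skip_brace_comment(const char *p, int *unterminated)
{
    assert(p != NULL && *p == '{' && unterminated != NULL);
    *unterminated = 0;
    for (p++; *p != '\0'; p++) {
        if (*p == '}')
            return p + 1;
        if (*p == '\n' && strncmp(p + 1, "[Event ", 7) == 0) {
            *unterminated = 1;
            return p + 1;
        }
    }
    *unterminated = 1;            /* ran off the end of the input */
    return p;
}
```

With this, a broken comment costs at most the rest of one game instead of the rest of the file.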
error parsing game 81 0000 h4xf6 2r5/2P1n1kp/3R1pp1/4p3/R6P/6P1/5PK1/1r6 w - -
{Illegal move Black wins} 0-1
error parsing game 254 0000 c4c2 8/p6P/1p3k2/8/2P1p3/PP5Q/1K4r1/6r1 w - -
{Black wins because a illegal move of white} 0-1
error parsing game 2372 0000 Ngf3 r1bqk2r/ppp2ppp/2n1pn2/3p4/1bPP4/4P3/PPQN1PPP/R1B1KBNR w KQkq -
error parsing game 13294 0000 O-O 2r1k2r/p2qn1pp/1pnBp3/4P3/P3N3/3Q4/5PPP/R1R2K2 b - -
{Black wins on time, illegal move (O-O) by Francesca!} 1-0
error parsing game 16773 0000 Kd7 8/8/4k3/3p4/K5Q1/2P5/1p1n4/8 b - -
{Black wins on time, illegal move by black} 1-0
error parsing game 17410 0000 Kg8xf7 6k1/5pp1/7p/1R1p4/2pPR1P1/8/1P3PPK/r7 b - -
{Black wins on time, illegal move by Chispa} 1-0
error parsing game 18145 0000 Rh8 3R4/8/8/8/8/2r4k/1p3K2/7n w - -
{White wins on time, illegal move by Tao (dont know about minor promotion!)} 0-1
error parsing game 23817 0000 Ka5a6 5b2/8/8/kpP1p1p1/4Pp1p/K2N1P1P/6P1/8 w - -
{Black wins on time, illegal move by white!} 0-1
error parsing game 24153 0000 a8a8 8/8/2k2r2/4Q3/P1p4p/K1P4P/5r2/8 b - -
{Draw by ...(3), a8a8?? set to 1-0} 1-0
Dieter Bürßner wrote:Mainly to Sune's idea: I think comparing move strings is not a good idea. It is too fragile and will fail with many hand-written PGNs as well as with machine-written ones (in rare cases). My suggestion would be to write a more general move parser that will read just about any conceivable move format (including long algebraic notation and coordinate notation).
When a word is not a legal move, just skip it and take the next one. This will skip things like +-, $9 and embedded comments without {}.
This is obviously just for the move part of the PGN, with comments already dealt with (and, in a first iteration, variations perhaps treated as comments).
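A first filter for such a general reader can be purely syntactic. A minimal sketch, assuming a hypothetical function looks_like_move that only checks the shape of a word (SAN, long algebraic or coordinate notation); whether the move is legal still has to be decided on the board, and move numbers are assumed to have been stripped already:

```c
#include <assert.h>
#include <string.h>

static int is_file(char c) { return c >= 'a' && c <= 'h'; }
static int is_rank(char c) { return c >= '1' && c <= '8'; }

/* Hypothetical syntactic filter (an assumption of this sketch).
   Returns 1 if tok has the shape of a move in SAN ("Nf3", "e8=Q+"),
   long algebraic ("Ng1-f3", "Kg8xf7") or coordinate notation ("e2e4",
   "e7e8q"), 0 otherwise. It does not check legality. */
int looks_like_move(const char *tok)
{
    char buf[16];
    size_t n, i;

    assert(tok != NULL);
    n = strlen(tok);
    if (n < 2 || n >= sizeof buf) return 0;
    memcpy(buf, tok, n + 1);

    /* drop check, mate and annotation suffixes */
    while (n > 0 && strchr("+#!?", buf[n - 1]) != NULL)
        buf[--n] = '\0';

    if (!strcmp(buf, "O-O") || !strcmp(buf, "O-O-O") ||
        !strcmp(buf, "0-0") || !strcmp(buf, "0-0-0"))
        return 1;

    /* drop a promotion piece: "=Q", or a bare letter after the rank */
    if (n >= 3 && strchr("QRBNqrbn", buf[n - 1]) != NULL &&
        (buf[n - 2] == '=' || is_rank(buf[n - 2]))) {
        n--;
        if (buf[n - 1] == '=') n--;
        buf[n] = '\0';
    }

    /* the word must end on a destination square */
    if (n < 2 || !is_file(buf[n - 2]) || !is_rank(buf[n - 1])) return 0;

    /* before it: optional piece letter, origin hints, capture or dash */
    i = (strchr("KQRBN", buf[0]) != NULL) ? 1 : 0;
    for (; i + 2 < n; i++)
        if (!is_file(buf[i]) && !is_rank(buf[i]) &&
            strchr("x:-", buf[i]) == NULL)
            return 0;
    return 1;
}
```

Words such as +-, $9 or 1-0 fail this test and can be skipped, while something like a8a8 passes and is only rejected later by the legality check.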
Sune Fischer wrote:
Hi Dieter,
I filter out all the NAGs (!?+) as well, since I don't consider those essential,
and castling moves written with zeros instead of capital O's will also be understood.
Other little innocent things like that will be added as I run into them.
My primary concern, however, is not to get corrupt lines in the book.
Suppose a line contains a few moves you can't read, and you skip them and read the following moves. Those moves might be legal, but it is going to be a different game.
Therefore I think one should not try to parse everything at all costs; it's better to disregard the 1% of games that may be unreadable.
I even drop the whole game if there is an error at move 20. Maybe the first 19 moves are fine, but perhaps the error is at move 15 and is not caught until move 20. I conclude the game cannot be trusted at all.
I have yet to try some of the large public databases. Perhaps a more flexible SAN reader is needed if these are full of bugs, but for now it seems safer (and easier) to do things backwards.
-S.
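The kind of token clean-up described here might look as follows. A minimal sketch, assuming a hypothetical helper normalise_token; which characters to strip is a choice, and this version keeps + and # because they are part of SAN:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (an assumption of this sketch).
   Cleans one movetext token in place:
   - "$n" numeric annotation glyphs become the empty string
   - trailing '!' and '?' are removed
   - castling written with zeros is rewritten with capital O
   Returns 1 if something is left to hand to the SAN reader, 0 otherwise. */
int normalise_token(char *tok)
{
    size_t n;
    char *p;

    assert(tok != NULL);
    if (tok[0] == '$') {
        tok[0] = '\0';
        return 0;
    }

    n = strlen(tok);
    while (n > 0 && (tok[n - 1] == '!' || tok[n - 1] == '?'))
        tok[--n] = '\0';

    /* "0-0" and "0-0-0", possibly followed by '+' or '#' */
    if (strncmp(tok, "0-0", 3) == 0)
        for (p = tok; *p != '\0'; p++)
            if (*p == '0') *p = 'O';

    return n > 0;
}
```

A result such as 0-1 is left alone because it does not start with 0-0.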
Hi Sune,
It seems that you did not read Crafty's code.
In case of an illegal move, Crafty continues to read the moves until it reaches a new game and simply ignores them.
The point is that for every game you can try to parse everything before the first illegal move, and having more general code to read moves is better.
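The two policies in this exchange differ only in what is kept once a move cannot be read. A minimal sketch of both, assuming a hypothetical callback type move_reader and a stand-in demo_reader; neither is taken from Crafty or from any other engine in this thread:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback type: returns nonzero if the token was read as a
   legal move in the current position (and made on the board). */
typedef int (*move_reader)(const char *token);

/* Stand-in reader for this sketch only: accepts any token that starts
   with a letter. A real one would call the engine's move reader. */
int demo_reader(const char *token)
{
    return isalpha((unsigned char)token[0]) != 0;
}

/* Number of leading tokens the reader accepts before the first failure. */
int readable_prefix(const char *const tokens[], int count, move_reader read_move)
{
    int i;

    assert(tokens != NULL && read_move != NULL);
    for (i = 0; i < count; i++)
        if (!read_move(tokens[i]))
            break;
    return i;
}

/* Lenient policy: keep every move before the first unreadable one. */
int moves_kept_lenient(const char *const tokens[], int count, move_reader read_move)
{
    return readable_prefix(tokens, count, read_move);
}

/* Strict policy: one unreadable move discards the whole game. */
int moves_kept_strict(const char *const tokens[], int count, move_reader read_move)
{
    return readable_prefix(tokens, count, read_move) == count ? count : 0;
}
```

For the token list e4, e5, Nf3, $$bad, Nc6 the lenient policy keeps three moves and the strict one keeps none. In both cases nothing after the bad token is used, so the book never contains moves from a "different game".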
Sune Fischer wrote:Yes, that's the way to eat comments, I think: loop until the number of end braces equals the number of start braces, which also removes nested comments.
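The brace-counting idea in isolation, as a minimal sketch with a hypothetical function strip_brace_comments. Note that the PGN standard says brace comments do not nest, so the counter is a tolerance for non-standard files rather than something the format requires:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption of this sketch).
   Copies src to dst, leaving out everything inside { } pairs, nested or
   not. dst must have room for strlen(src) + 1 bytes.
   Returns the brace depth at the end of the input: 0 means balanced,
   anything else means a comment was left open. */
int strip_brace_comments(const char *src, char *dst)
{
    int depth = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        if (*src == '{')
            depth++;
        else if (*src == '}') {
            if (depth > 0) depth--;   /* ignore a stray closing brace */
        }
        else if (depth == 0)
            *dst++ = *src;
    }
    *dst = '\0';
    return depth;
}
```

A nonzero return value is the signal that the comment continues on the next line, or that a closing brace is missing.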
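#include <ctype.h>   /* isalpha */
#include <stdio.h>   /* FILE, fopen, fgets, sscanf, printf */
#include <string.h>  /* strchr, strstr, strspn, strpbrk, strcpy */
#include "defs.h"
#include "data.h"
#include "protos.h"

FILE *games;

/* Opens the PGN file for reading. Returns 1 on success, 0 on failure. */
int openpgn(char *gamesname)
{
  /* the file is only read; "r+b" would fail on a read-only file */
  games = fopen(gamesname, "rb");
  if (games)
  {
    fseek(games, 0, SEEK_SET);
    return 1;
  }
  return 0;
}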
char pgn_event[32] = {"?"};
char pgn_site[32] = {"?"};
char pgn_round[32] = {"?"};
char pgn_date[32] = {"????.??.??"};
char pgn_white_elo[32] = {""};
char pgn_white[64] = {"unknown"};
char pgn_black_elo[32] = {""};
char pgn_black[64] = {"Movei " VERSION};
char pgn_result[32] = {"*"};
int read_next_command(FILE *games, int option)
{
  /* read_next_command returns -1 at end of file, 0 if a move was read and
     1 if a header was read.
     It also fills bookbuf with the text read from the file as a string.
     ReadNextMove can later be called on bookbuf when it holds a move.
     Calling it with games == 0 only resets the internal state.
     option == -1 drops what is left of the buffered line first;
     option == -2 just returns the number of lines read so far. */
  static int data = 0, lines_read = 0;
  static char input_buffer[512];
  char temp[512], *eof, analysis_move[64];
  int braces = 0, parens = 0, brackets = 0, analysis = 0, last_good_line, converted;

  if (!games)
  {
    lines_read = 0;
    data = 0;
    return 0;
  }
  if (option == -1) data = 0;
  if (option == -2) return (lines_read);
  while (1)
  {
    if (!data)
    {
      /* no buffered line left: read the next one and strip the line ending */
      eof = fgets(input_buffer, 512, games);
      if (!eof) return (-1);
      if (strchr(input_buffer, '\n')) *strchr(input_buffer, '\n') = 0;
      if (strchr(input_buffer, '\r')) *strchr(input_buffer, '\r') = ' ';
      lines_read++;
      bookbuf[0] = 0;
      /* width limit matches the "%256s" used for move tokens further down */
      converted = sscanf(input_buffer, "%256s", bookbuf);
      if (bookbuf[0] == '[') do
      {
        /* tag pair line of the form [Name "Value"] */
        char *bracket1, *bracket2, value[128];
        /* snprintf always NUL-terminates, unlike strncpy on truncation */
        snprintf(bookbuf, sizeof bookbuf, "%s", input_buffer);
        bracket1 = strchr(input_buffer, '\"');
        if (bracket1 == 0) return (1);
        bracket2 = strchr(bracket1 + 1, '\"');
        if (bracket2 == 0) return (1);
        *bracket1 = 0;
        *bracket2 = 0;
        snprintf(value, sizeof value, "%s", bracket1 + 1);
        if (strstr(input_buffer, "Event")) snprintf(pgn_event, sizeof pgn_event, "%s", value);
        else if (strstr(input_buffer, "Site")) snprintf(pgn_site, sizeof pgn_site, "%s", value);
        else if (strstr(input_buffer, "Round")) snprintf(pgn_round, sizeof pgn_round, "%s", value);
        else if (strstr(input_buffer, "Date")) snprintf(pgn_date, sizeof pgn_date, "%s", value);
        else if (strstr(input_buffer, "WhiteElo")) snprintf(pgn_white_elo, sizeof pgn_white_elo, "%s", value);
        else if (strstr(input_buffer, "White")) snprintf(pgn_white, sizeof pgn_white, "%s", value);
        else if (strstr(input_buffer, "BlackElo")) snprintf(pgn_black_elo, sizeof pgn_black_elo, "%s", value);
        else if (strstr(input_buffer, "Black")) snprintf(pgn_black, sizeof pgn_black, "%s", value);
        else if (strstr(input_buffer, "Result")) snprintf(pgn_result, sizeof pgn_result, "%s", value);
        else if (strstr(input_buffer, "FEN"))
        {
          sprintf(bookbuf, "setboard %s", value);
          setup(value);
        }
        return 1;
      }
      while (0);
      data = 1;
    }
    /* If there is already data in the line buffer, it is just a matter of
       extracting the next token and returning it to the caller. If the
       buffer is empty, another line has to be read in. */
    else
    {
      bookbuf[0] = 0;
      sscanf(input_buffer, "%256s", bookbuf);
      if (strlen(bookbuf) == 0)
      {
        data = 0;
        continue;
      }
      else
      {
        char *skip;
        strcpy(temp, input_buffer);
        skip = strstr(input_buffer, bookbuf);
        if (skip)
        {
          /* memmove, because source and destination overlap */
          skip += strlen(bookbuf);
          memmove(input_buffer, skip, strlen(skip) + 1);
        }
      }
      /* This skips over nested { or ( characters and finds the 'mate'
         before returning any more moves. It also stops if a PGN header is
         encountered, probably due to an incorrectly bracketed analysis
         variation. */
      last_good_line = lines_read;
      analysis_move[0] = 0;
      if (strchr(bookbuf, '{') || strchr(bookbuf, '('))
        while (1)
        {
          char *skip, *ch;
          analysis = 1;
          while ((ch = strpbrk(bookbuf, "(){}[]")))
          {
            if (*ch == '(')
            {
              *strchr(bookbuf, '(') = ' ';
              if (!braces) parens++;
            }
            if (*ch == ')')
            {
              *strchr(bookbuf, ')') = ' ';
              if (!braces) parens--;
            }
            if (*ch == '{')
            {
              *strchr(bookbuf, '{') = ' ';
              braces++;
            }
            if (*ch == '}')
            {
              *strchr(bookbuf, '}') = ' ';
              braces--;
            }
            if (*ch == '[')
            {
              *strchr(bookbuf, '[') = ' ';
              if (!braces) brackets++;
            }
            if (*ch == ']')
            {
              *strchr(bookbuf, ']') = ' ';
              if (!braces) brackets--;
            }
          }
          if (analysis && analysis_move[0] == 0)
          {
            if (strspn(bookbuf, " ") != strlen(bookbuf))
            {
              char *tmove = analysis_move;
              /* analysis_move holds 64 bytes, so read at most 63 characters */
              sscanf(bookbuf, "%63s", analysis_move);
              strcpy(bookbuf, analysis_move);
              if (strcmp(bookbuf, "0-0") && strcmp(bookbuf, "0-0-0"))
                tmove = bookbuf + strspn(bookbuf, "0123456789.");
              else
                tmove = bookbuf;
              if ((tmove[0] >= 'a' && tmove[0] <= 'z') || (tmove[0] >= 'A' && tmove[0] <= 'Z') ||
                  !strcmp(tmove, "0-0") || !strcmp(tmove, "0-0-0"))
                strcpy(analysis_move, bookbuf);
              else
                analysis_move[0] = 0;
            }
          }
          /* all brackets closed: the comment or variation is finished */
          if (parens == 0 && braces == 0 && brackets == 0) break;
          bookbuf[0] = 0;
          sscanf(input_buffer, "%256s", bookbuf);
          if (strlen(bookbuf) == 0)
          {
            eof = fgets(input_buffer, 512, games);
            if (!eof)
            {
              parens = 0;
              braces = 0;
              brackets = 0;
              return (-1);
            }
            if (strchr(input_buffer, '\n')) *strchr(input_buffer, '\n') = 0;
            if (strchr(input_buffer, '\r')) *strchr(input_buffer, '\r') = ' ';
            lines_read++;
            if (lines_read - last_good_line >= 100)
            {
              parens = 0;
              braces = 0;
              brackets = 0;
              Print("ERROR. comment spans over 100 lines, starting at line %d\n", last_good_line);
              break;
            }
          }
          strcpy(temp, input_buffer);
          /* strstr cannot fail here: bookbuf is empty or was scanned from input_buffer */
          skip = strstr(input_buffer, bookbuf) + strlen(bookbuf);
          /* memmove, because source and destination overlap */
          memmove(input_buffer, skip, strlen(skip) + 1);
        }
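      else
      {
        int skip;
        /* strip a leading move number such as "12." or "12..." */
        if ((skip = strspn(bookbuf, "0123456789.")))
        {
          char temp[512];
          strcpy(temp, bookbuf + skip);
          strcpy(bookbuf, temp);
        }
        /* anything that starts with a letter or contains '-' is returned as a move */
        if (isalpha((unsigned char) bookbuf[0]) || strchr(bookbuf, '-'))
          return (0);
      }
    }
  }
}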
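int result;
int games_parsed;
BitBoard sumperftpgn;

/* Updates the game counter and the result code from the header in bookbuf.
   result: 0 = draw, 1 = Black won, 2 = White won, 3 = unknown or unfinished. */
void parseheader()
{
  if (strstr(bookbuf, "Site"))
  {
    /* each Site tag is counted as the start of a new game */
    games_parsed++;
    result = 3;
  }
  else if (strstr(bookbuf, "esult"))
  {
    if (strstr(bookbuf, "1-0")) result = 2;
    else if (strstr(bookbuf, "0-1")) result = 1;
    else if (strstr(bookbuf, "1/2-1/2")) result = 0;
    else if (strstr(bookbuf, "*")) result = 3;
  }
}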
int get_next_position_in_pgn()
{
  /* This function reads commands from the games file until it sees a move,
     which means the current position is of interest because a move was
     played from it.
     It returns 1 if it finds a move and 0 if it does not.
     It does not check whether the move is legal at this point and assumes
     that only legal moves are read. However, the "move" can be illegal
     because 1-0 or 1/2-1/2 are treated as moves before checking, so I
     need to think about how to change that. */
  int data_read = read_next_command(games, 0);
  while (data_read == 1)
  {
    parseheader();
    data_read = read_next_command(games, 0);
  }
  if (data_read == -1)
    return 0;
  /* Now data_read == 0: the next command is a move, so the current position
     is relevant. Return 1; the next steps are to read the FEN and make the
     move. */
  return 1;
}
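/* Called when ReadNextMove rejected bookbuf. If the token is not just a
   result or move number and the ply limit has not been reached, reports the
   illegal move and skips the remaining moves of the game.
   Returns the last read_next_command code (1 = header, -1 = end of file),
   or 0 if nothing was skipped. */
int handle_possible_error(void)
{
  int data_read = 0;
  if (strspn(bookbuf, "0123456789/-.*") != strlen(bookbuf) && (hply < max_plies_of_game))
  {
    printf("%s-", pgn_white);
    printf(" %s ", pgn_black);
    printf("ply err%d ", hply);
    printf(" move %s is illegal line %d ", bookbuf, read_next_command(games, -2));
    printf("\n");
    read_next_command(games, -1);
    do
      data_read = read_next_command(games, 0);
    while (data_read == 0);
  }
  return data_read;
}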
int calc_move()
{
  char *ch;
  int move;
  /* not finished */
  do
  {
    /* cut off ? and ! annotations */
    if ((ch = strpbrk(bookbuf, "?!")))
      *ch = 0;
    if (!strchr(bookbuf, '$') && !strchr(bookbuf, '*'))
    {
      if (hply < max_plies_of_game)
        move = ReadNextMove(bookbuf);
      else
        move = 0;
      return move;
    }
    else
      read_next_command(games, 0);
  }
  while (1);
  return 0;
}
void calc_statistics_pgn()
{
  while (get_next_position_in_pgn())
  {
    /* Do the following steps:
       1) calcinfo (this can be a perft(n) calculation)
       2) calculate the move (need to strip ? and ! and to decide when
          there is no legal move)
       3) if the move is illegal, start a new game; otherwise make the move */
  }
}
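/* Plays through the moves of the current game, starting with the token
   that is already in bookbuf.
   flag >= 1 adds perft(flag-1) to sumperftpgn for each position visited;
   flag == -2 writes the FEN of each position whose next move was read
   successfully to pgntoepd.
   Returns the last read_next_command code (1 = header, -1 = end of file). */
int readnextgame(int flag)
{
  int data_read = 0;
  int move;
  char *ch;
  char *candidatefen = NULL;  /* only set and used when flag == -2 */
  while (data_read == 0)
  {
    if (flag >= 1)
      sumperftpgn += perft(flag - 1);
    if (flag == -2)
      candidatefen = translate_pos_to_fen();
    /* ? and ! are simply cut off; reading them is not important for now */
    if ((ch = strpbrk(bookbuf, "?!")))
      *ch = 0;
    if (!strchr(bookbuf, '$') && !strchr(bookbuf, '*'))
    {
      if (hply < max_plies_of_game)
        move = ReadNextMove(bookbuf);
      else
        move = 0;
      if (move == 0)
      {
        data_read = handle_possible_error();
        if (data_read != 0)
          break;
      }
      else
        if (flag == -2)
          fprintf(pgntoepd, "%s\n", candidatefen);
    }
    data_read = read_next_command(games, 0);
  }
  if (flag >= 1)
    sumperftpgn -= perft(flag - 1);
  setup("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
  return data_read;
}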
/* Reads every game in the PGN file gamesname.
   flag >= 1: sum perft(flag-1) over the positions;
   flag == -2: write the positions to pgntoepd.epd. */
void readpgn(char *gamesname, int flag)
{
  int data_read;
  games_parsed = 0;
  if (flag == -2)
  {
    pgntoepd = fopen("pgntoepd.epd", "w+");
    if (pgntoepd == NULL) return;
  }
  setup("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
  /* the first read_next_command call only resets its internal state,
     because 0 is not a pointer to a file */
  read_next_command(0, 0);
  if (openpgn(gamesname))
  {
    do
    {
      data_read = read_next_command(games, 0);
    }
    while (data_read == 0);
    /* moves read before the first header do not belong to any game and are
       not relevant */
    do
    {
      if (data_read < 0)
        break;
      if (data_read == 1)
      {
        parseheader();
        data_read = read_next_command(games, 0);
      }
      else
        data_read = readnextgame(flag);
    }
    while (strcmp(bookbuf, "end") && data_read != -1);
    fclose(games);
  }
  printf("games= %d", games_parsed);
  if (flag >= 1)
    /* %llu with a cast is portable across current compilers;
       %I64u only works with Microsoft's C runtime */
    printf("perft=%llu \n", (unsigned long long) sumperftpgn);
  if (flag == -2)
    fclose(pgntoepd);
}

void translatepgn_to_epd(char *gamesname)
{
  readpgn(gamesname, -2);
}
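The header branch of read_next_command above picks the tag by substring search, so a tag such as EventDate would also match Event. A minimal sketch of an exact [Name "Value"] split, assuming a hypothetical helper parse_tag_pair (not part of the posted code) and ignoring escaped quotes inside values:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption of this sketch).
   Splits a line of the form [Name "Value"] into name and value.
   Returns 1 on success, 0 if the line is not a tag pair or one of the
   output buffers is too small. */
int parse_tag_pair(const char *line, char *name, size_t name_size,
                   char *value, size_t value_size)
{
    const char *p, *q;
    size_t len;

    assert(line != NULL && name != NULL && value != NULL);
    if (line[0] != '[') return 0;

    p = line + 1;
    len = strcspn(p, " \t\"]");   /* the name ends at a blank, quote or ']' */
    if (len == 0 || len >= name_size) return 0;
    memcpy(name, p, len);
    name[len] = '\0';

    p = strchr(p + len, '"');     /* opening quote of the value */
    if (p == NULL) return 0;
    p++;
    q = strchr(p, '"');           /* closing quote of the value */
    if (q == NULL) return 0;
    len = (size_t)(q - p);
    if (len >= value_size) return 0;
    memcpy(value, p, len);
    value[len] = '\0';
    return 1;
}
```

The caller can then compare name with strcmp, so that White and WhiteElo no longer depend on the order of the tests.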