Dump utility for polyglot books

Postby Michel » 29 Nov 2008, 01:25

I wrote an extension to polyglot to dump polyglot books in human readable format.
I need to do a little bit of code cleanup but it is late. In any case here you can see
the output for gm2600.bin, one of the books that comes with Scid.

Lines for white:

http://alpha.uhasselt.be/Research/Algeb ... _white.txt

Lines for black:

http://alpha.uhasselt.be/Research/Algeb ... _black.txt

Regards,
Michel

Re: Dump utility for polyglot books

Postby Guenther Simon » 29 Nov 2008, 16:10

This looks very promising!
Thanks Michel.

Regards,
Guenther

Re: Dump utility for polyglot books

Postby Michel » 29 Nov 2008, 23:37

It seems however that polyglot books are difficult to shoehorn into the classical opening book format. For example, the venerable "performance.bin" has the following characteristics:

Code:
Lines for white       :    15177
Lines for black       :     9944
White positions       :    28164
Black positions       :    18974
Unreachable positions :    30734


"Lines for white" are lines where white makes only opening book moves and black makes arbitrary moves whose targets are positions in the book for white. "Lines for black" are defined similarly. The PG book positions on these lines are called respectively "White positions" and "Black positions".

Unreachable positions are positions that cannot be reached in this way.
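
The counting described above can be sketched on a toy move graph rather than real chess. This is only an illustration under stated assumptions: the names `successors`, `white_book` and `positions_on_white_lines` are hypothetical, positions are plain labels, and the real utility works on the hash keys stored in a PG book.

```python
from collections import deque

def positions_on_white_lines(start, successors, white_book):
    """Collect the book positions that lie on 'lines for white'.

    successors: dict position -> dict move -> resulting position (toy move generator)
    white_book: dict position -> set of book moves for white in that position
    White plays only book moves; black plays any move whose target is a
    position in the book for white.
    """
    on_lines = set()
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos in on_lines or pos not in white_book:
            continue
        on_lines.add(pos)
        for move in white_book[pos]:
            after_white = successors[pos][move]
            # Black may reply with anything, as long as we land in the book again.
            for target in successors.get(after_white, {}).values():
                if target in white_book:
                    queue.append(target)
    return on_lines

# Toy data: positions are just labels.
successors = {
    "start": {"e4": "w1", "d4": "w2"},
    "w1": {"e5": "p1", "c5": "p2"},
    "w2": {"d5": "p3"},
    "p1": {"Nf3": "w3"},
}
white_book = {
    "start": {"e4"},   # the book only ever plays e4
    "p1": {"Nf3"},
    "p3": {"c4"},      # in the book, but it only arises after d4
}

reachable = positions_on_white_lines("start", successors, white_book)
unreachable = set(white_book) - reachable
print(sorted(reachable), sorted(unreachable))
```

In this toy book the entry for "p3" is never visited by the walk, which is what the "Unreachable positions" count measures.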

Short of reversing the PG hash function, or performing deeper searches, one cannot know what these unreachable positions are.
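
Part of the reason is visible in the file layout. As commonly documented, a PG book is a flat sequence of 16-byte big-endian entries (64-bit position key, 16-bit move, 16-bit weight, 32-bit learn value), so a position is stored only as a hash and never as a board. A minimal Python sketch of reading such entries; the helper names and the sample entry (including its arbitrary key) are illustrative assumptions, not polyglot code.

```python
import struct

ENTRY = struct.Struct(">QHHI")  # key, move, weight, learn (big-endian, 16 bytes)
FILES = "abcdefgh"
PROMOTION = ["", "n", "b", "r", "q"]

def read_entries(data):
    """Yield (key, move, weight, learn) tuples from raw book bytes."""
    for offset in range(0, len(data) - ENTRY.size + 1, ENTRY.size):
        yield ENTRY.unpack_from(data, offset)

def decode_move(move):
    """Decode the 16-bit move field into coordinate notation such as 'e2e4'."""
    to_file, to_row = move & 7, (move >> 3) & 7
    from_file, from_row = (move >> 6) & 7, (move >> 9) & 7
    promo = (move >> 12) & 7
    text = FILES[from_file] + str(from_row + 1) + FILES[to_file] + str(to_row + 1)
    return text + (PROMOTION[promo] if promo < len(PROMOTION) else "?")

# One hand-made entry: arbitrary key (not a real position hash), move e2e4, weight 10.
e2e4 = 4 | (3 << 3) | (4 << 6) | (1 << 9)
sample = ENTRY.pack(0x0123456789ABCDEF, e2e4, 10, 0)

entries = list(read_entries(sample))
for key, move, weight, learn in entries:
    print(f"{key:016x} {decode_move(move)} weight={weight}")
```

Nothing in an entry says how the position was reached, so a dump in terms of lines has to start from a known position and walk forward.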

Re: Dump utility for polyglot books

Postby Michel » 30 Nov 2008, 10:51

I think it should be safe to delete unreachable positions from the book.

I would appreciate some comments from PG book authors on this. In any case I will write a utility (an extension of polyglot) to do this.

To be sure we have to understand how unreachable positions are created. Offhand I can see two mechanisms. Perhaps there are others.

(1) Transpositions. Assume we have a pgn file with two games with moves

Code:
A B C D E F G H (1-0)
C B A D E F G H (1-0)


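and we are creating a book with min-game=2. In that case E and G would be in the book for white, because each is played in both games from the same position, but A and C would not, because each is played only once from any given position. So having E and G is useless, since the book will never make white play A or C.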
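
The transposition example can be reproduced with a small sketch. It rests on a toy assumption: a position is modelled as the set of moves played so far, so the two move orders transpose after four plies. `make_white_book` is a hypothetical helper, not the polyglot implementation.

```python
from collections import Counter

def make_white_book(games, min_game):
    """Keep white (position, move) pairs that occur in at least min_game games.

    Toy model: a position is the set of moves played so far, so two move
    orders that contain the same moves transpose into the same position.
    """
    counts = Counter()
    for game in games:
        for ply, move in enumerate(game):
            if ply % 2 == 0:  # white moves on even plies
                counts[(frozenset(game[:ply]), move)] += 1
    return {entry for entry, n in counts.items() if n >= min_game}

games = ["A B C D E F G H".split(), "C B A D E F G H".split()]
book = make_white_book(games, min_game=2)
book_moves = sorted(move for _, move in book)
print(book_moves)  # ['E', 'G']
```

Neither surviving entry belongs to the starting position, so a walk from the start that follows only book moves for white never gets to them.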

(2) Merging. The "merge-book" command merges two books into a third one, resolving conflicts according to the following rule:

Code:
If a position is in both books, take the moves from the first one.


Of course, in this way (huge) parts of the second book may become unreachable. I think there is no reason why these unreachable positions should remain in the merged book.
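
A minimal sketch of the stated merge rule, assuming a book can be modelled as a mapping from a position label to its weighted moves. The function name, labels and weights are hypothetical.

```python
def merge_books(first, second):
    """Merge per position: where both books know a position, the first one wins."""
    merged = dict(second)
    merged.update(first)
    return merged

# Hypothetical toy books: position label -> {move: weight}
e4_book = {"start": {"e4": 1}, "after e4 e5": {"Nf3": 1}}
d4_book = {"start": {"d4": 1}, "after d4 d5": {"c4": 1}}

merged = merge_books(e4_book, d4_book)
print(merged["start"])          # {'e4': 1}
print("after d4 d5" in merged)  # True
```

The entry for "after d4 d5" survives the merge even though the merged book never plays d4 from the start.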

Re: Dump utility for polyglot books

Postby Marc Lacrosse » 30 Nov 2008, 17:00

Michel wrote:I think it should be safe to delete unreachable positions from the book.

I would appreciate some comments from PG book authors on this. In any case I will write a utility (an extension of polyglot) to do this.

There is no reason why these unreachable positions should remain in the merged book. I think.


Hi Michel

I have exactly the opposite view on this topic.

What you call "unreachable positions" are not unreachable at all!
They just were not reached in the subset of games that have been retained within the book.

This does not mean at all that a game that would follow a slightly different path than the paths explicitly followed by the selected games from which the book was built could not reach one of these "unreachable" positions.
It could be the bookmaker's wish that in precisely this case his book should include some knowledge for the conduct of this game and analogous ones.

And secondly these "unreachable positions" do not do any harm in actual play. They even prove useful when your engine faces an opponent that uses unusual transpositional paths to enter his favorite openings.

Books are not about games, lines or any sequences of moves. They are actually collections of positions together with some knowledge regarding the optimal continuation from each of these positions.

If you consider that book building is about lines you have to be exhaustive regarding any paths to a given position when you build a book and that's a pretty impossible task. Older programs like Yace or Pharaon had only this kind of book building process (you had to collect recommended sequences of moves, not just positions with favored moves for each of them).

As a bookmaker I may notice that in one precise position my engine tends to choose a wrong move. I may wish to add this precise position with my recommended moves to the book without any consideration for the path or paths that could lead to this precise position.

This whole argumentation was extremely important for Fabien when he defined book functions to be integrated in Fruit and Polyglot!

In fact the one and only negative aspect of what you call "unreachable" positions is that any attempt at a dump in the form of sequences of moves will fail!

So if you build a utility for eliminating positions toward which there is no explicit path within the book, you will actually lower the amount of knowledge that is included in this book, and I am ready to bet that, for example, a version of performance.bin pruned this way would perform badly compared with the original.

I suppose I would almost never use such a utility myself.

Marc

Re: Dump utility for polyglot books

Postby F. Bluemers » 30 Nov 2008, 17:40

Hi Marc
I guess I am overlooking something, but anyway:
I thought all positions are entered into the book via pgn files.
So how can there be unreachable positions in it?

edit:
I found the answer:
Code:
- "-only-white" *** NEW ***

Save only white moves.  This allows to use different parameters for
white and black books, and merge them into a single file with the
"merge-book" command, see below.

- "-only-black" *** NEW ***

Same for black moves.

That would explain why positions cannot be reached.
Best
Fonzy

Re: Dump utility for polyglot books

Postby Michel » 30 Nov 2008, 17:55

Hi Marc,

Just a quick reply as I do not have much time.

I have not fully thought it through but perhaps there are two types of unreachable positions.

(1) Provably unreachable positions. Like the ones in my first example.
In this example white must make a non book move to get to the unreachable position.
If white has other book moves at that point he will never do that.
(so to make the example complete we would have had to include a third game).

(2) Positions that are "beyond the end of the book", i.e. they must be reached by non-book moves, but at plies where no book moves are available.

Unfortunately I suspect that the polyglot bookmaking utilities mainly produce unreachable positions of type (1) (as I tried to show in my previous post). Manually inserted positions would be of type (2) (as I gather from your description).

It is not easy to distinguish between type (1) and (2) positions but I have some ideas.

All the best,
Michel

Re: Dump utility for polyglot books

Postby Michel » 30 Nov 2008, 18:01

Fonzy wrote

I guess I am overlooking something, but anyway:
I thought all positions are entered into the book via pgn files.
So how can there be unreachable positions in it?


In my post I gave two examples of how the PG utilities produce unreachable positions (transpositions and merging).

only-white and only-black do NOT produce unreachable positions by themselves. They simply produce books without black or white lines.

EDIT: perhaps I should be clear about what I mean by a line. A line for white is a sequence of moves from the starting position containing only book moves for white and arbitrary moves for black. A line for black is defined similarly.

Regards,
Michel

Re: Dump utility for polyglot books

Postby Marc Lacrosse » 30 Nov 2008, 19:21

Michel wrote:
only-white and only-black do NOT produce unreachable positions by themselves. They simply produce books without black or white lines.

EDIT: perhaps I should be clear what I mean by a line. A line for white is a sequence
of moves from the starting position containing only book moves for white and arbitrary moves for black. A line for black is defined similarly.

Regards,
Michel


OK but fundamentally even a completely "isolated" position in a PG book has some value (I prefer "isolated" over "unreachable" - some could think that "unreachable" positions are illegal ones) because it provides some knowledge regarding what should be played in a position that _can_ be encountered by the engine in practical play.

And this knowledge most often came from Polyglot's excellent heuristics based on your selection of games, or less often through direct adding and tuning.

So why should we discard this knowledge? I see no argument that the PG heuristics worked any less well for selecting moves in these positions than for the non-isolated ones.

Moreover, I could well decide that I do _wish_ to add isolated positions manually to a PG book. Imagine a position at move ten in a highly transpositional opening: I could well have detected that my favorite engine constantly errs in this position, so I include just this one position and my recommended move for it, just in case that precise position happens on the board.
In such a case I do not care at all about the preceding moves and their actual order.

More generally I am pretty sure that an opening book has to be about positions and not lines.

And I repeat that we can test this: prune performance.bin by taking out all isolated ("unreachable") positions. I am quite convinced you will get a much weaker book.

Regards

Marc

Re: Dump utility for polyglot books

Postby Michel » 01 Dec 2008, 10:24

Hi Marc,

I fully agree that an opening book is about positions and not lines. I am NOT somehow trying to subvert the PG format in terms of lines. But a view in terms of lines may be beneficial since humans tend to think in those terms.

The question is how should we view isolated positions (positions not on any lines). I DO understand your points in this matter. For the benefit of others let me try to summarize them.

(1) MY POINT: I think that many isolated positions produced by the PG utilities are provably unreachable in the sense that they cannot arise on the board if one of the players sticks to the book.

YOUR POINT: Even if a position is provably unreachable it still represents knowledge about the game. And it may become reachable if we introduce for example a new move. Furthermore one may very well introduce isolated positions manually which are not provably unreachable and thus beneficial.

(2) MY POINT: Isolated positions are not discoverable. You cannot know what they are until they arise in actual gameplay.

YOUR POINT: Not a problem.

(3) YOUR POINT: I am convinced "performance.bin" will be much weaker if we remove the isolated positions (and it will certainly not become stronger).

MY POINT: Hmm... I don't know. But the isolated positions take up almost half the book (if I did not make a mistake). So it would be interesting to know what their true benefit is.


This last point can of course only be resolved by testing. So I will make a version with the isolated positions pruned. It would only be for experimental purposes. I hope you are not offended by that.

Regards,
Michel

Re: Dump utility for polyglot books

Postby Marc Lacrosse » 01 Dec 2008, 12:03

Michel wrote:Hi Marc,

I fully agree that an opening book is about positions and not lines. I am NOT somehow trying to subvert the PG format in terms of lines. But a view in terms of lines may be beneficial since humans tend to think in those terms.


I do completely agree.

Michel wrote:The question is how should we view isolated positions (positions not on any lines). I DO understand your points in this matter. For the benefit of others let me try to summarize them.

(1) MY POINT: I think that many isolated positions produced by the PG utilities are provably unreachable in the sense that they cannot arise on the board if one of the players sticks to the book.

YOUR POINT: Even if a position is provably unreachable it still represents knowledge about the game. And it may become reachable if we introduce for example a new move. Furthermore one may very well introduce isolated positions manually which are not provably unreachable and thus beneficial.

(2) MY POINT: Isolated positions are not discoverable. You cannot know what they are until they arise in actual gameplay.

YOUR POINT: Not a problem.

(3) YOUR POINT: I am convinced "performance.bin" will be much weaker if we remove the isolated positions (and it will certainly not become stronger).

MY POINT: Hmm... I don't know. But the isolated positions take up almost half the book (if I did not make a mistake). So it would be interesting to know what their true benefit is.


This last point can of course only be resolved by testing. So I will make a version with the isolated positions pruned. It would only be for experimental purposes. I hope you are not offended by that.

Regards,
Michel


Certainly not offended (why should I be?).
This would be a very interesting experiment.

Regards

Marc

Re: Dump utility for polyglot books

Postby Michel » 01 Dec 2008, 14:00

Just to add one more ingredient to this discussion. I can confirm that it is the merge utility that creates huge numbers of isolated positions (which are probably provably unreachable).

Merging two pgn files and then creating a book gives quite different results from first making two books and then merging them.

Here are two separate proposals for enhancing the merge utility which do not create isolated positions.

(1) In case positions appear in both books give the moves in the second book which are not in the first book weight zero. This is equivalent to the current behavior.

(2) Make the probabilities in the merged book the average of the probabilities in the original books.

These could be options to the "merge-book" command.
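
Proposal (1) might look roughly like this, again modelling a book as a mapping from a position label to weighted moves. The function name and the toy data are hypothetical; this is a sketch of the idea, not an existing "merge-book" option.

```python
def merge_zero_weight(first, second):
    """Proposal (1): keep the first book's weights and add the second book's
    extra moves with weight zero, so they are never played but stay visible."""
    merged = {pos: dict(moves) for pos, moves in second.items()}
    for pos, moves in first.items():
        extra = {m: 0 for m in second.get(pos, {}) if m not in moves}
        merged[pos] = {**moves, **extra}
    return merged

# Hypothetical toy books: position label -> {move: weight}
first = {"P": {"a": 3}}
second = {"P": {"a": 1, "b": 2}, "Q": {"c": 1}}

merged = merge_zero_weight(first, second)
print(merged["P"])  # {'a': 3, 'b': 0}
print(merged["Q"])  # {'c': 1}
```

A tool that walks the merged book can now follow move "b" out of "P" and find the second book's positions behind it, although a player choosing by weight never plays it.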

Michel

Re: Dump utility for polyglot books

Postby Marc Lacrosse » 01 Dec 2008, 16:11

Michel wrote:Just to add one more ingredient to this discussion. I can confirm that it is the merge utility that creates huge numbers of isolated positions (which are probably provably unreachable).

Merging two pgn files and then creating a book gives quite different results from first making two books and then merging them.


And this is probably the most powerful book-making feature of Polyglot!
A book like performance.bin was made from six elementary books with varied parameters, merged in a very precise order.

Michel wrote:Here are two separate proposals for enhancing the merge utility which do not create isolated positions.

(1) In case positions appear in both books give the moves in the second book which are not in the first book weight zero. This is equivalent to the current behavior.

(2) Make the probabilities in the merged book the average of the probabilities in the original books.

These could be options to the "merge-book" command.

Michel


I do not completely see the point.
As you say yourself option "1" is equivalent to the current behavior.
I cannot easily see in which case I could use option 2 to get a more efficient result (a better performing book). It is to be tested.
Imagine that for a given position book A has move 1 (50%) and move 2 (50%) while book B has move 1 (30%), move 2 (10%) and move 3 (60%). Averaging them will give move 1 (40%), move 2 (30%) and move 3 (30%). Hmm, it is difficult to predict the practical effect of all these averagings.
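
The arithmetic of this example can be checked with a few lines of Python. The helper name is hypothetical, and it assumes that a move missing from one book counts as 0% there.

```python
def average_probabilities(book_a, book_b):
    """Proposal (2) for one position: average two move distributions (in percent).
    A move missing from one book counts as 0% there."""
    moves = sorted(set(book_a) | set(book_b))
    return {m: (book_a.get(m, 0) + book_b.get(m, 0)) / 2 for m in moves}

a = {"move 1": 50, "move 2": 50}
b = {"move 1": 30, "move 2": 10, "move 3": 60}

averaged = average_probabilities(a, b)
print(averaged)  # {'move 1': 40.0, 'move 2': 30.0, 'move 3': 30.0}
```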

Marc

Re: Dump utility for polyglot books

Postby Michel » 01 Dec 2008, 17:36

To make it simple: if you merge an e4 book with a d4 book, then the positions for white in the d4 book become unreachable, except possibly in lines where both e4 and d4 are played. I think this is not what you want.

As you say yourself option "1" is equivalent to the current behavior.


Yes, the behavior is the same, but now the positions which came from the second book become at least discoverable for a GUI or dump utility.

And furthermore, you may later decide to change the zero probability to something bigger.

hmm difficult to predict the practical effect of all these averagings.


Well it is just a suggestion. In the above d4/e4 example I can imagine you would want both moves to have the same probability.

Re: Dump utility for polyglot books

Postby Michel » 02 Dec 2008, 12:02

I improved my dump utility a bit. It now also shows probabilities.

gm2600.bin for white

http://alpha.uhasselt.be/Research/Algeb ... _white.txt

gm2600.bin for black

http://alpha.uhasselt.be/Research/Algeb ... _black.txt

performance.bin (Marc Lacrosse) for white

http://alpha.uhasselt.be/Research/Algeb ... _white.txt

performance.bin for black

http://alpha.uhasselt.be/Research/Algeb ... _black.txt

Glaurung's book (Salvo Spitaleri) for white

http://alpha.uhasselt.be/Research/Algeb ... _white.txt

Glaurung's book for black

http://alpha.uhasselt.be/Research/Algeb ... _black.txt

Recall that these are the lines obtained when one (not both) of the players makes only book moves. These dumps do not reflect isolated positions.

Re: Dump utility for polyglot books

Postby Michel » 06 Dec 2008, 15:31

I made my utility for analyzing polyglot books a bit more refined.
Code:
./polyglot info-book -bin /usr/local/share/scid/books/Performance.bin -exact

Code:
PolyGlot 1.4 by Fabien Letouzey
Lines for white                :    15177
Lines for black                :     9944
Positions on lines for white   :    28164
Positions on lines for black   :    18974
Unreachable white positions(?) :    10865
Unreachable black positions(?) :    11127
Isolated positions             :     8742


Unreachable positions are positions which cannot appear on the board as long as the player sticks to the book. For example, in this case 1. f4 Nf6 is in the book for white, but this is of no help since if white follows the book he will only play c4, d4 or e4. My heuristic for identifying such positions is not entirely foolproof, which is why there is a question mark.
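
As a quick consistency check on the performance.bin figures quoted in this thread, the three new categories add up to the single "Unreachable positions" count reported earlier. The share computed below assumes that the five categories do not overlap.

```python
# Figures copied from the two performance.bin listings in this thread.
on_white_lines, on_black_lines = 28164, 18974
unreachable_white, unreachable_black, isolated = 10865, 11127, 8742

off_lines = unreachable_white + unreachable_black + isolated
total = on_white_lines + on_black_lines + off_lines
share = 100 * off_lines / total

print(off_lines)  # 30734, the earlier "Unreachable positions" figure
print(f"{share:.1f}% of all positions are not on any line")
```

Under that assumption a bit under 40% of the book lies off the lines.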

For Glaurung's book I get:
Code:
./polyglot info-book -bin spitaleri.bin -exact

Code:
PolyGlot 1.4 by Fabien Letouzey
Lines for white                :     9749
Lines for black                :    14000
Positions on lines for white   :    84822
Positions on lines for black   :   122999
Unreachable white positions(?) :    54351
Unreachable black positions(?) :    27467
Isolated positions             :      852


Some more:
Code:
./polyglot info-book -bin /usr/local/share/scid/books/gm2600.bin -exact

Code:
Lines for white                :     5360
Lines for black                :     5459
Positions on lines for white   :     7750
Positions on lines for black   :     8143
Unreachable white positions(?) :      117
Unreachable black positions(?) :      116
Isolated positions             :      194


Code:
./polyglot info-book -bin /usr/local/share/scid/books/varied.bin -exact

Code:
Lines for white                :    18426
Lines for black                :    13435
Positions on lines for white   :    34388
Positions on lines for black   :    25592
Unreachable white positions(?) :     4472
Unreachable black positions(?) :     4343
Isolated positions             :     8817


Code:
./polyglot info-book -bin /usr/local/share/scid/books/Elo2400.bin -exact

Code:
Lines for white                :    46005
Lines for black                :    48602
Positions on lines for white   :    51081
Positions on lines for black   :    55459
Unreachable white positions(?) :      691
Unreachable black positions(?) :      799
Isolated positions             :     1021

Re: Dump utility for polyglot books

Postby Guenther Simon » 26 Dec 2008, 20:40

Michel wrote:I improved my dump utility a bit. It now also shows probabilities.

...

Recall that these are the lines obtained when one (not both) of the players makes only book moves. These dumps do not reflect isolated positions.


Hello Michel,

Will you publish the dump utility too?

Guenther

Re: Dump utility for polyglot books

Postby Michel » 27 Dec 2008, 08:20

Will you publish the dump utility too?


Yes, but they are modifications of the book making utilities in Polyglot. I don't want to publish too many modifications to PG at once.

Re: Dump utility for polyglot books

Postby Dann Corbit » 29 Dec 2008, 21:16

Can your utility display the unreachable positions?
If you dump them as EPD strings, I think it would be interesting to examine them.

It seems a great puzzle that they would be entered into the book if they cannot be achieved by retrograde somehow.

Re: Dump utility for polyglot books

Postby Michel » 30 Dec 2008, 09:18

Can your utility display the unreachable positions?
If you dump them as EPD strings, I think it would be interesting to examine them.

It seems a great puzzle that they would be entered into the book if they cannot be achieved by retrograde somehow.


Well, the best method would be to try to reverse the PG hash somehow, taking into account the constraints on a normal chess position. I do not know how to do this. A cryptographer I asked declared it "trivial". But he was not forthcoming on details.

The other option is searching from known positions which takes a lot of time (you have to do it for all known positions). I sort of do a 2-ply search now but there are some positions I do not recover. I am sure it can be done much more efficiently.

I explained the mechanism for creating unreachable positions in earlier posts. Suppose you have an e4 and a d4 book and you merge them. Then the first book takes priority and hence only e4 will be in the book, making most of the d4 positions provably unreachable.

The idea behind the priority mechanism is that if you merge an Elo2600 book with an Elo2400 book the moves in the Elo2600 book should take precedence.

If you really want to merge an e4 and a d4 book and keep both e4 and d4, you can do it now on the source pgn files instead of on the resulting binary books.

One of the things I would like to do is to make this kind of symmetric merging possible on the level of binary books (although Marc Lacrosse somehow does not seem to like this).
