Yes, this is very helpful, thanks.
Marc Lacrosse wrote: Hmm, really, your post leads me to think that you know close to nothing regarding the actual building, use and optimisation of commonly used engine opening books.
I guess this is a pretty accurate assessment...
Before explaining a few things, I must say I see almost no interest in a ctg-to-polyglot converter now that the newly released Aquarium book manager allows any ctg book to be used directly in any UCI-compliant interface.
This could be. My initial incentive for looking into this was that I disliked the idea of having to play engines through a multitude of adapters, and I was not sure how well the Aquarium book supports WinBoard engines. It would be a nuisance having to play through Polyglot + Book adapter + WB2UCI. I have already written and used some book-converting software before, and I figured adapting that to another source format would be a trivial job. Working on the problem furthermore aroused my academic curiosity.
A book is simply a collection of positions, candidate move(s) from each of these positions, and relative probabilities for each of these moves to be selected for actual play when a recorded position is met. The probability of being played is the weight of each move.
OK, fully agreed. This is in the end what it all boils down to: for every (position, move) pair the book has to define a probability for the move to be played. Polyglot books directly record this probability, as the move weight. I would call that a "cooked" book, as opposed to a book format that stores other information, from which the engine would have to derive the probabilities at probe time, by performing some calculation on the data stored in the book. The latter I would call a "raw" book.
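To make the "cooked" case concrete, here is a minimal sketch of how stored weights translate into playing probabilities. The function names and the example weights are made up for illustration; this is not Polyglot's actual probing code.

```python
import random


def move_probabilities(weights):
    """Turn stored move weights into selection probabilities."""
    total = sum(weights.values())
    if total == 0:
        return {move: 0.0 for move in weights}
    return {move: w / total for move, w in weights.items()}


def pick_move(weights, rng=random):
    """Pick one move at random, in proportion to its stored weight."""
    moves = [m for m, w in weights.items() if w > 0]
    if not moves:
        return None  # nothing playable recorded for this position
    return rng.choices(moves, weights=[weights[m] for m in moves], k=1)[0]


# Hypothetical weights for one position, not taken from any real book.
example = {"e2e4": 60, "d2d4": 30, "c2c4": 10}
probs = move_probabilities(example)
```

A "raw" book would need one extra step before this: some formula that first computes the weights from the stored statistics.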
In the case of polyglot, the weight of a move is initially established when building the book from a collection of selected games according to a formula that is well explained on Michel's site (and -but this is another problem - I am sure that the formula could be improved! ).
OK, I am familiar with the Polyglot format because I ported Michel's probing code for it to WinBoard. I just checked how Polyglot derives the weights from the raw PGN statistics during book-making; this is indeed an area that is a candidate for improvement. It now uses the playing frequency of the move, weighted by result (2*wins + 1*draws + 0*losses), without paying attention to other paths leading to the same position as the move leads to. (E.g. a move could have been played only twice, both times in losing games, yet lead to a position that had occurred a dozen times, 8 of them wins; it would still get weight 0.) But that is a bit off-topic here; perhaps we should devote another thread to that.
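The example in parentheses can be worked out in a few lines. This is only a sketch of the 2/1/0 scoring described above, not Polyglot's source; the function name is made up, and the split of the four non-won games in the target position is an assumption (the text only says 8 of 12 were wins).

```python
def result_weight(wins, draws, losses):
    """Score as described above: 2 per win, 1 per draw, 0 per loss."""
    return 2 * wins + 1 * draws + 0 * losses


# Statistics of the move itself: played twice, both games lost.
move_weight = result_weight(wins=0, draws=0, losses=2)

# Statistics of the position the move leads to, counted over all paths:
# a dozen games, 8 of them won. Treating the other 4 as losses is an
# assumption made only to have concrete numbers.
position_weight = result_weight(wins=8, draws=0, losses=4)

print(move_weight, position_weight)
```

The move-based score is 0, so the move is never played, while the same formula applied to the position it reaches gives a clearly positive score.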
Then in a second stage polyglot move weights can be modified either through the book-merge feature or through direct editing of the weights, as is possible with the SCID database software.
Then the ctg case.
Here things are much more complicated. From what I understand (I am far from being a ctg expert), the first step of book building is similar to polyglot's way: you provide a selection of games and the book is built. No information was ever published regarding ChessBase's formula for turning games into weights.
Clearly their formula overweights number of games over winning statistics.
Nobody knows precisely how they deal with relative amount of draws.
This initial weighting of the moves after building of the book is permanently recorded in the book (maybe it can be altered in some way afterwards).
I used the description of the ctg format from the link given in one of the earlier posts, and it seems ctg books are "raw" books. There are no weights stored in them. Instead there are fields storing the total number of games, the number of wins, draws and losses, and the average rating. The latter is stored as a number of games and the sum of all ratings (so basically as a rational number, giving numerator and denominator as integers). The total number of games is thus stored twice, which seems redundant. (I can imagine that the rating count would leave out unrated games, although I am not sure how relevant such an "average" rating would be, but could the total number of games be different from wins+draws+losses???)
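As a sketch of what such a "raw" entry amounts to, the fields can be modelled like this. All names are hypothetical and say nothing about the actual byte layout of a ctg file; the numbers are invented.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class RawEntry:
    """Hypothetical in-memory view of one raw book entry."""
    games: int        # total number of games
    wins: int
    draws: int
    losses: int
    rated_games: int  # denominator of the average rating
    rating_sum: int   # numerator of the average rating

    def average_rating(self) -> Optional[Fraction]:
        """Average rating as the stored rational number, if defined."""
        if self.rated_games == 0:
            return None
        return Fraction(self.rating_sum, self.rated_games)

    def counts_consistent(self) -> bool:
        """Check whether games equals wins + draws + losses."""
        return self.games == self.wins + self.draws + self.losses


# Invented numbers: 12 games, of which only 10 carried a rating.
entry = RawEntry(games=12, wins=8, draws=1, losses=3,
                 rated_games=10, rating_sum=24500)
```

Running `counts_consistent()` over every entry of a real book would be one way to answer the question of whether the total can differ from wins+draws+losses.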
Then the user of the book in CB interfaces will have the choice of using the book in two different modes : "tournament" mode and "normal" mode. I suppose that in "normal" mode the initial weighting is used (and this must be one of the two flags you are referring to for each move).
Then there is the "tournament" mode. In this mode other weights are used for candidate moves (and these modified weights probably correspond to the second flag you are referring to). These modified weights result mainly from 1) manual hand-tuning of the book (with "red" and "green" flags for candidate moves, discarding or forcing certain moves) and 2) automated learning through playing with the book from within a CB interface (local games at home or online games on the Playchess server). There is even a third way to tune weights, through a kind of selective import of lines that should be favored for white or black.
Well, there are no weights stored in the book, so they have to be calculated at probe time, and I guess this rules out any manual tuning. The only thing it can do is use another formula to derive the weights from the raw book data when you select tournament mode. I could imagine that in fact it would apply the same formula as in "normal" mode, except that it ignores all moves leading to positions that are marked as "non-tournament". Selecting those positions would most likely be part of the hand-tuning preparatory process. I am not sure how a book builder would use this flag; it cannot be based on the position being very good or bad, because if it is good for white, black should avoid it in tournaments, but white of course should seek it. So perhaps it should be used for unclear positions, or perhaps positions that are in theory equal (as shown by PGN statistics) but which the engine is known (or suspected) to handle poorly for both black and white.
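The guess above can be sketched as follows. Everything here is hypothetical: the `non_tournament` key, the function names and the numbers are invented, and `placeholder_formula` is only a stand-in, since the real formula is exactly what is unknown.

```python
def placeholder_formula(stats):
    """Stand-in weight formula; NOT the real (unpublished) one."""
    return 2 * stats["wins"] + stats["draws"]


def normal_weights(candidates, formula=placeholder_formula):
    """Weights for every candidate move, computed at probe time."""
    return {move: formula(stats) for move, stats in candidates.items()}


def tournament_weights(candidates, formula=placeholder_formula):
    """Same formula, but skip moves whose target position is flagged."""
    return {
        move: formula(stats)
        for move, stats in candidates.items()
        if not stats.get("non_tournament", False)
    }


# Invented statistics of the positions reached by two candidate moves.
candidates = {
    "e2e4": {"wins": 8, "draws": 2, "losses": 2},
    "g2g4": {"wins": 3, "draws": 0, "losses": 1, "non_tournament": True},
}

normal = normal_weights(candidates)
tournament = tournament_weights(candidates)
```

Under this guess a converter would have to emit two Polyglot books, one per mode, because a Polyglot entry holds only a single weight per move.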
So a ctg user ends with a double-faced book.
Once again there is no precise published information regarding any aspect of these mechanisms.
So if you were to translate ctg weights into polyglot ones, I would suggest identifying how the "tournament" weights are recorded.
These are the real weights that the book publisher wishes to be played.
They are supposed to be stronger and are the ones that any CB customer would actually use when trying to get optimal results.
Yes, I guess we really would have to know what formula the probing code applies to convert WDL statistics on positions into move weights.