Currently Polyglot derives the weights of its book moves from a collection of PGN games purely from the frequency with which each move has been played from the given position, weighted by the results, as 2*wins + 1*draws + 0*losses. This suffers from several problems. For one, it is debatable whether losses should really not be counted at all. If a position occurred 10 times, all in lost games, and grandmasters played move A 8 times and move B 2 times, it is still likely that move A is better than move B. Some moves lose faster than others, and a good move might turn a lost position against a weaker opponent into a win, while a poor move would lose even to a patzer. A second problem is that the current algorithm does not take into account that the position a move leads to might be reachable through other paths, which also contribute to the accuracy of the WDL statistics. E.g. position A might have been played only 5 times, because it is an unusual transposition, and a move from it to position B might have been played only twice, one draw, one loss. Not really enough to conclude with any degree of certainty that B is a poor move. But position B might occur 1000 times in the PGN set through other paths, with 500 wins, 200 draws and 300 losses (for the side that had to move in position A). So the information would be there to conclude that position B is almost certainly quite favorable for us, with a ~60% score. Polyglot currently fails to spot that, and considers only the two times the move was played from A.
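To make the first problem concrete, here is a minimal sketch (the function name is mine, not Polyglot's) of the current weighting scheme, applied to the 10-game example above. Since losses count for nothing, both moves end up with weight 0 and the 8-vs-2 preference of the grandmasters is lost entirely:

```python
def polyglot_weight(wins, draws, losses):
    """Current Polyglot scheme: 2*wins + 1*draws + 0*losses."""
    return 2 * wins + 1 * draws + 0 * losses

# All 10 games lost; move A played 8 times, move B 2 times:
weight_a = polyglot_weight(0, 0, 8)  # 0
weight_b = polyglot_weight(0, 0, 2)  # 0
print(weight_a, weight_b)            # both 0 -- no preference survives
```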
If weights are supposed to represent (relative) winning probabilities, or more accurately, estimates of the expected result, they should be derived from the PGN statistics in an entirely different way. (The problem of how to convert such winning probabilities into a move choice, by making the trade-off between unpredictability and quality, is an independent one, which can easily be addressed at probe time, and which I won't discuss now.)
Chess strategy is based on minimax. Look at the following hypothetical situation: a position A occurred 1000 times in the PGN set, and 900 times a move AB was played to position B, 100 times a move AC to position C. Position B occurred 950 times, and had an average result of 45%. Position C occurred 105 times, with an average result of 60%. The average result of position A was thus 0.9*45% + 0.1*60% = 46.5%. Does that mean you should avoid position A? Of course not! When your intention is to play AC from A, position A will give you an average result of 60%! Is this a far-fetched example? If move AC is so much better than AB, why would AB have been played 9 times as often as AC? People would prefer the best move, wouldn't they? But there are very plausible reasons why this situation could occur in practice. Move AC might be a novelty, a recently discovered refutation of the line. Before it was known, people always played AB here, with poor results. Then move AC was discovered, with good results. After it had been tested 105 times, everyone believed AC refuted the line, so opponents started to avoid position A, and AC never got the opportunity to statistically surpass AB.
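The arithmetic of this example can be sketched as follows (all numbers are the hypothetical ones from the text), contrasting the frequency-weighted average with the minimax view:

```python
# Daughters of position A, with play counts and average results:
moves = {
    "AB": {"count": 900, "score": 0.45},  # leads to position B
    "AC": {"count": 100, "score": 0.60},  # leads to position C
}
total = sum(m["count"] for m in moves.values())

# Gross (frequency-weighted) average result of position A:
avg = sum(m["count"] / total * m["score"] for m in moves.values())
print(round(avg, 3))   # 0.465 -- looks like A should be avoided

# Minimax view: with AC as your intended reply, A is worth 60%:
best = max(m["score"] for m in moves.values())
print(best)            # 0.6
```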
So a logical way to build the book is to not simply look at the gross average result of a certain position, but instead look at the results of the daughter positions that the individual moves lead to, and then use only the statistics of the best move. This pure minimax approach would only work if the average result (whatever form of "averaging" was used) of each position was precisely known. In practice you would of course quickly run into positions with such a small occurrence that this is no longer the case, and you have to compromise. If a position occurred only twice, and two different moves were played, one winning and the other losing, it makes no sense to say "from here I would play the move that had a 100% result, so this position will give me a 100% result". More reasonable would be: "This position occurred twice, and I do not have enough statistics to decide which move was better, so I assume they are equally good, and together they scored 50%". So depending on the statistical error in the average result of each of the daughter nodes, you would have to combine them to get an average result, and a statistical uncertainty in it, for the current position. In the limit of infinitely large occurrence, where the uncertainties vanish, you would use minimax; in the other limit, of very low occurrence, you would use plain averaging. For intermediate occurrence frequencies you would have to use some transition between the two. I am pretty sure a mathematically sound formula could be derived for this based on Bayesian statistics (although I have not yet done so). I was told that this is essentially how Monte-Carlo search works as well, so perhaps the required formula is already known from that area of research.
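To illustrate the intended limiting behaviour, here is one possible ad-hoc interpolation between plain averaging and minimax. This is emphatically not the Bayesian formula asked for above, just a sketch: the blend factor and the constant K are arbitrary choices of mine.

```python
K = 10.0  # hypothetical tuning constant: how many games count as "many"

def node_value(moves):
    """moves: list of (count, score) pairs for the daughters of a node.

    Blends the frequency-weighted average (trusted at low counts) with
    the best daughter's score (trusted at high counts).
    """
    total = sum(c for c, _ in moves)
    avg = sum(c * s for c, s in moves) / total
    best = max(s for _, s in moves)
    t = total / (total + K)  # near 0 for tiny samples, -> 1 for large ones
    return (1 - t) * avg + t * best

# Two games, one win, one loss: mostly plain averaging, value near 50%:
print(round(node_value([(1, 1.0), (1, 0.0)]), 2))
# 1000 games as in the earlier example: value close to the minimax 0.60:
print(round(node_value([(900, 0.45), (100, 0.60)]), 3))
```

A real implementation would propagate these blended values (and their uncertainties) bottom-up through the transposition-aware position graph, rather than treating each node in isolation.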
Of course the ratings could also be taken into account. If a position occurred 10 times, with a 60% result (for white), but the average rating of the white players was 300 higher than that of the black players, this would indicate that the position is actually very favorable for black (as based on the ratings an 85% result should have been expected). This is a bit dangerous, as it assumes the rating differences are not based on superior opening knowledge. So perhaps the rating difference should be discounted somewhat, and more so as the position occurred deeper into the game, where the players have already "used up" the opening component of their rating. (This in itself would be interesting to investigate: how does a rating difference translate into an average result for positions other than the standard opening position?)
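The 85% figure follows from the standard Elo expectancy formula; a small sketch of how the rating correction could flag such a position:

```python
def elo_expected(rating_diff):
    """Standard Elo expected score for the higher-rated side."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

expected = elo_expected(300)           # ~0.85, as cited in the text
observed = 0.60                        # the 60% result from the example
print(round(expected, 2))              # 0.85
print(round(observed - expected, 2))   # -0.25: white badly underperforms,
                                       # so the position likely favors black
```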
I think it would be very worthwhile to implement such an algorithm in the book-building mode of Polyglot.