Zach Wegner wrote:
Thoughts?
One possible way is to think of the problem as a generic nonlinear optimization problem. You have some nonlinear function eval(x, w) that estimates the result of a game, where x is the feature vector of a position and w is the vector of weights of the eval terms. Your objective is some measure of the error between the eval/qsearch score and the actual result of the game that position came from (which you can obtain from PGN games). There are many ways you could build an error function: absolute value, square (which turns this into nonlinear least-squares fitting), MLE... For now let's just call the per-position error error(x, w). You then sum that error over a set of positions, weighted by their importance, to get an objective function f(w). Minimize the objective using standard nonlinear optimization techniques, with the weights of the evaluation function as your control variables.
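To make that concrete, here's a minimal C++ sketch of what f(w) could look like. Everything in it is my own choice of names and assumptions: positions are reduced to plain feature vectors, the eval/qsearch call is passed in as a generic function, the score is squashed into an expected result with a logistic curve, the error is squared, and the "standard nonlinear optimization technique" is just a dumb coordinate-wise step search to keep the example short. Any real optimizer (gradient-based, simplex, whatever) could replace it.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// One training position pulled from a PGN game (hypothetical layout).
struct TrainingPos {
    std::vector<double> features;  // x: the features of the position
    double result;                 // actual game result: 1.0 / 0.5 / 0.0
    double weight;                 // importance of this position
};

// evalFn stands in for eval/qsearch: (features, weights) -> score in centipawns.
using EvalFn = std::function<double(const std::vector<double>&,
                                    const std::vector<double>&)>;

// f(w): importance-weighted sum of per-position errors. Here the error is the
// squared difference between a logistic "expected result" and the real result;
// absolute value or a likelihood would work just as well.
double objective(const std::vector<TrainingPos>& set,
                 const std::vector<double>& w, const EvalFn& evalFn)
{
    const double k = 1.0 / 400.0;  // logistic scale: one way to map cp -> result
    double f = 0.0;
    for (const TrainingPos& t : set) {
        double score     = evalFn(t.features, w);
        double predicted = 1.0 / (1.0 + std::exp(-k * score));
        double e         = predicted - t.result;
        f += t.weight * e * e;
    }
    return f;
}

// A deliberately dumb minimizer: try a step up/down on each weight, keep it if
// the objective drops, halve the step when nothing helps. The point is only
// that the weights w are the control variables; swap in any real optimizer.
std::vector<double> minimize(const std::vector<TrainingPos>& set,
                             std::vector<double> w, const EvalFn& evalFn,
                             double step = 1.0, int iterations = 100)
{
    double best = objective(set, w, evalFn);
    for (int it = 0; it < iterations; ++it) {
        bool improved = false;
        for (std::size_t i = 0; i < w.size(); ++i) {
            for (double d : {+step, -step}) {
                w[i] += d;
                double f = objective(set, w, evalFn);
                if (f < best) { best = f; improved = true; break; }
                w[i] -= d;
            }
        }
        if (!improved) step /= 2.0;
    }
    return w;
}
```

In practice you'd want something much smarter than the step search, but it shows where the eval weights enter the optimization.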
I haven't tried this yet, but it's certainly one of the top things on my list to try after I get my new C++ Buzz working. (The other top thing to try is coarse-to-fine discretized global search for evaluation tuning.)
For more information on nonlinear programming (nonlinear optimization... same thing) I suggest Bazaraa. It's full of pretty hard-core mathematics, but Bazaraa explains everything in detail and overall I think it's a pretty good book. It also has a review of the math it needs (sets, linear algebra, ...) in an appendix.
One of the biggest problems to overcome with this technique is how to take tactics into account. Qsearch is a pretty good idea, but it fails miserably in very deep tactics like opening gambits. Of course you can use a full-fledged search instead, but then the optimization takes much longer. So there's a tradeoff between the time the optimization takes and its accuracy.
Instead of just using the result of the game, you might also try first going through the entire game, running search/eval/qsearch on each position, storing the evaluations, and using the information on how the evaluation progresses.
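A sketch of that pre-processing step, reusing the feature-vector and EvalFn assumptions from the sketch above: run eval/qsearch once on every position of a game and keep the whole score trajectory next to the result, so the error function can look at how the score evolved rather than only at the final outcome.

```cpp
// Score trajectory of one game, using the EvalFn / feature-vector conventions
// from the earlier sketch.
struct GameTrajectory {
    std::vector<double> scores;  // eval/qsearch score after every position
    double result;               // final result of the game: 1.0 / 0.5 / 0.0
};

GameTrajectory scoreGame(const std::vector<std::vector<double>>& positions,
                         double result,
                         const std::vector<double>& w, const EvalFn& evalFn)
{
    GameTrajectory t;
    t.result = result;
    t.scores.reserve(positions.size());
    for (const auto& x : positions)
        t.scores.push_back(evalFn(x, w));  // one eval/qsearch call per position
    return t;
}
```

Whether you recompute the trajectories every optimizer iteration or only once up front is the same time/accuracy tradeoff as above.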
Another problem is the weight you assign each position. You might come across many moves of some endgame that takes a long time and a lot of moves to win. Because that type of position occurs far more often than extremely important but short-lived advantages (say, fast king attacks), the optimizer will end up favoring the endgames over the king attacks. Perhaps one way to solve this is, again, to use the information on how the evaluation progresses instead of just the result of the game.
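One crude weighting scheme along those lines (again just my own guess at how it could look, building on the GameTrajectory sketch above): give every game the same total weight, so a 100-move endgame grind cannot outvote a 25-move king attack, and then boost positions where the score trajectory swings sharply, since those are exactly the short-lived advantages that would otherwise drown.

```cpp
// Hypothetical per-position weights built from a GameTrajectory (see above):
// every game contributes the same total weight, and positions around big eval
// swings get boosted so short-lived advantages are not drowned out.
std::vector<double> positionWeights(const GameTrajectory& t)
{
    std::vector<double> weights(t.scores.size(),
                                t.scores.empty() ? 0.0 : 1.0 / t.scores.size());
    for (std::size_t i = 1; i < t.scores.size(); ++i) {
        double swing = std::fabs(t.scores[i] - t.scores[i - 1]);
        weights[i] *= 1.0 + swing / 100.0;  // arbitrary: a 100 cp swing doubles the weight
    }
    return weights;
}
```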
One advantage of this method is that it produces a measure of how good your evaluation function is. In addition to the final weights, you also get the sum of the per-position errors, which you can use as a measure of how well your eval predicts a game's outcome. If you change your eval, rerun the optimization, and find that it reaches a smaller minimum... you have improved your eval.
This is all pretty theoretical and I don't even know if it'll work, but I thought I might as well put it out there.