Zach Wegner wrote:
Thoughts?
One possible way is to think of the problem as a generic nonlinear optimization problem. You have some nonlinear function eval(x, w) that estimates the result of a game, where x is the feature vector of a position and w is the vector of weights of the eval terms. Your objective is some measure of the error between the eval/qsearch score and the actual result of the game that position came from (which you can obtain from PGN games). There are many ways you could build an error function: absolute value, square (which turns this into nonlinear least-squares fitting), MLE... For now let's just call the per-position error error(x, w). You then sum that error over a set of positions, weighted by their importance, to get an objective function f(w). Minimize the objective using standard nonlinear optimization techniques, with the weights of the evaluation function as your control variables.
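To make that concrete, here's a minimal C++ sketch of what f(w) could look like. Everything in it is my own choice of names and assumptions: positions are reduced to plain feature vectors, the eval/qsearch call is passed in as a generic function, the score is squashed into an expected result with a logistic curve, the error is squared, and the "standard nonlinear optimization technique" is just a dumb coordinate-wise step search to keep the example short. Any real optimizer (gradient-based, simplex, whatever) could replace it.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// One training position pulled from a PGN game (hypothetical layout).
struct TrainingPos {
    std::vector<double> features;  // x: the features of the position
    double result;                 // actual game result: 1.0 / 0.5 / 0.0
    double weight;                 // importance of this position
};

// evalFn stands in for eval/qsearch: (features, weights) -> score in centipawns.
using EvalFn = std::function<double(const std::vector<double>&,
                                    const std::vector<double>&)>;

// f(w): importance-weighted sum of per-position errors. Here the error is the
// squared difference between a logistic "expected result" and the real result;
// absolute value or a likelihood would work just as well.
double objective(const std::vector<TrainingPos>& set,
                 const std::vector<double>& w, const EvalFn& evalFn)
{
    const double k = 1.0 / 400.0;  // logistic scale: one way to map cp -> result
    double f = 0.0;
    for (const TrainingPos& t : set) {
        double score     = evalFn(t.features, w);
        double predicted = 1.0 / (1.0 + std::exp(-k * score));
        double e         = predicted - t.result;
        f += t.weight * e * e;
    }
    return f;
}

// A deliberately dumb minimizer: try a step up/down on each weight, keep it if
// the objective drops, halve the step when nothing helps. The point is only
// that the weights w are the control variables; swap in any real optimizer.
std::vector<double> minimize(const std::vector<TrainingPos>& set,
                             std::vector<double> w, const EvalFn& evalFn,
                             double step = 1.0, int iterations = 100)
{
    double best = objective(set, w, evalFn);
    for (int it = 0; it < iterations; ++it) {
        bool improved = false;
        for (std::size_t i = 0; i < w.size(); ++i) {
            for (double d : {+step, -step}) {
                w[i] += d;
                double f = objective(set, w, evalFn);
                if (f < best) { best = f; improved = true; break; }
                w[i] -= d;
            }
        }
        if (!improved) step /= 2.0;
    }
    return w;
}
```

In practice you'd want something much smarter than the step search, but it shows where the eval weights enter the optimization.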
I haven't tried this yet, but it's certainly one of the top things on my list to try after I get my new C++ Buzz working. (The other top thing to try is coarse-to-fine discretized global search for evaluation tuning.)
For more information on nonlinear programming (nonlinear optimization... same thing) I suggest Bazaraa. It's full of pretty hard-core mathematics, but Bazaraa explains everything in detail and overall I think it's a pretty good book. It also has a review of the math it needs (sets, linear algebra, ...) in an appendix.
One of the biggest problems to overcome with this technique is how to take tactics into account. Qsearch is a pretty good idea, but it fails miserably in very deep tactics like opening gambits. Of course you can use a full-fledged search instead, but then the optimization takes much longer. So there's a tradeoff between the time the optimization takes and its accuracy.
Instead of just using the result of the game, you might also try first going through the entire game, running search/eval/qsearch on each position, storing the evaluations, and using the information on how the evaluation progresses.
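A sketch of that pre-processing step, reusing the feature-vector and EvalFn assumptions from the sketch above: run eval/qsearch once on every position of a game and keep the whole score trajectory next to the result, so the error function can look at how the score evolved rather than only at the final outcome.

```cpp
// Score trajectory of one game, using the EvalFn / feature-vector conventions
// from the earlier sketch.
struct GameTrajectory {
    std::vector<double> scores;  // eval/qsearch score after every position
    double result;               // final result of the game: 1.0 / 0.5 / 0.0
};

GameTrajectory scoreGame(const std::vector<std::vector<double>>& positions,
                         double result,
                         const std::vector<double>& w, const EvalFn& evalFn)
{
    GameTrajectory t;
    t.result = result;
    t.scores.reserve(positions.size());
    for (const auto& x : positions)
        t.scores.push_back(evalFn(x, w));  // one eval/qsearch call per position
    return t;
}
```

Whether you recompute the trajectories every optimizer iteration or only once up front is the same time/accuracy tradeoff as above.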
Another problem is the weight you assign each position. You might come across many moves of some endgame that takes a long time and a lot of moves to win. Because that type of position occurs far more often than extremely important but short-lived advantages (say, fast king attacks), the optimizer will end up favoring the endgames over the king attacks. Perhaps one way to solve this is, again, to use the information on how the evaluation progresses instead of just the result of the game.
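One crude weighting scheme along those lines (again just my own guess at how it could look, building on the GameTrajectory sketch above): give every game the same total weight, so a 100-move endgame grind cannot outvote a 25-move king attack, and then boost positions where the score trajectory swings sharply, since those are exactly the short-lived advantages that would otherwise drown.

```cpp
// Hypothetical per-position weights built from a GameTrajectory (see above):
// every game contributes the same total weight, and positions around big eval
// swings get boosted so short-lived advantages are not drowned out.
std::vector<double> positionWeights(const GameTrajectory& t)
{
    std::vector<double> weights(t.scores.size(),
                                t.scores.empty() ? 0.0 : 1.0 / t.scores.size());
    for (std::size_t i = 1; i < t.scores.size(); ++i) {
        double swing = std::fabs(t.scores[i] - t.scores[i - 1]);
        weights[i] *= 1.0 + swing / 100.0;  // arbitrary: a 100 cp swing doubles the weight
    }
    return weights;
}
```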
One advantage of this method is that it produces a measure of how good your evaluation function is. In addition to the final weights, you also get the sum of the per-position errors, which you can use as a measure of how well your eval predicts a game's outcome. If you change your eval, rerun the optimization, and find that it reaches a smaller minimum... you have improved your eval.
This is all pretty theoretical and I don't even know if it'll work, but I thought I might as well put it out there.