Again, rep-draws (and score aging)
Posted: 24 Apr 2007, 12:45
Joker currently handles three-fold repetition draws by only scoring the third (tree) occurrence of a position as a draw, and suppressing hash probes and stores on repeated nodes. This guards the hash table from pollution by path-dependece, as the path-dependent stuff is all figured out between second and third occurrence, which are not stored. If the draw score of the 3rd occurence propagates to the 2nd (meaning that it is unfavorable for any side to break out of the repetition cycle), the position is really a draw irrespective of the path through which this second occurence was reached. So there is no harm in the draw score propagating down to all positions between first and second occurrence.
This works, but the problem of this method is that, if there is a perpetual on the board, most other lines can take 'the long way', visiting each position twice by interjecting 4 ply from the perpetual before continuing with the true pursuit of the alternative line. (e.g. see http://www.vpittlik.org/wbforum/viewtopic.php?t=6387 .)
It makes no sense at all to have lines that visit the same position twice. The second occurence is just a metaphore for the first occurrence, you really want to redo the search of that repeated node under different conditions of draw-scoring the repeat (causing path dependence), and not hashing the intervening nodes. So I guess what is really needed, is that once the search reaches the 2nd occurrence of a position, it should extend the line such that it searches that node to the same depth as was requested for the search of the first occurrence. (This depth should thus be stored in the repetition table.)
In fact you would redo the search as soon as you discover that returning to that same position might be relevant (i.e. it was not pruned before you arrived there), although it might still happen that you find side branches of the path between 1st and 2nd occurence that after all will make it irrelevant.
So you first do the search starting from the 2nd occurrence, to the depth that was requested for the 1st occurrence, under the assumption that a 3rd return to the position is a draw, but without storing the results for the repeated nodes in the hash (to prevent contamination). This search will tell you if the repetition is forced, or if one of the sides prefers to break out of it. Whatever the case, the result will be the path-independent score of the 2nd occurrence.
If the result of the second occurrence was a draw score, you award it to the move that led to the repeat, and finish the search of the first occurrence. This for the benifit of hashing the nodes on the path between the 1st and 2nd occurrence, which was suppressed in the repeat path between 2nd and 3rd occurrence. This 'finishing of the search' is mainly a formal thing, as all side branches of the repeat cycle, in so far they were not searched already, now have been searched to the requested depth as part of the search of the 2nd occurence. And the side branches were hashed (in so far they were not repeats themselves). So there is not much extra time involved in searching the repeating position twice, each side branch is only searched once, be it before or after the repeat is encountered, and the second time it is taken from the hash. Only the nodes on the repeat path really are searched twice.
But (and now comes the crux), if it was favorable to break out of the repeat, giving the current move its true score does cause a problem. The branch that leads to this score, searched in the sub-tree from the 2nd occurrence, might not have been searched yet in the tree from the first occurrence. And if we encounter it later the score will be axactly the same, as a long line and a short line leading to the same end-leaf in minimax have the same score. And in a case where a new move produces an equal score to a move we already have, we consider it a fail low, and stick with the old move. Also in cases where the new move was the short-cut, and the old move was the detour through the 2nd repeat! On the other hand, we cannot arbitrarily adjust the score, as this will cause contamination in the underlying nodes (which are hashed).
A solution to this dilemma can be provided by the general mechanism of 'aging' scores: if two paths eventually lead to the same (score-determining) position, the side that is winning prefers the short path, the side that is losing the long path. (This is also true for paths without repeats.) Plain minimax does not take account of this. One way to implement this principle is to give the losing side (the side with negative score, i.e. below draw) a slight bonus for each move he does. This is a little tricky, as you will know the score only afterwards, and you must make sure to precompensate the search window so that out-of-window results cannot move back into the window by the later tinkering with the score. But it can be done, has a logical basis, and thus has many advantages in other areas (e.g. you would be liberated of that ridiculous stuff of adjusting checkmates for depth in the tree.)
With the aging, the score of the branch that breaks out of the repetition after the 2nd repeat, (which will be a branch in favor of the side that breaks out, or he would not do it), will have devaluated in being passed towards the root in favor of the loser. So when compared later with the (hashed) value of that same branch between 1st and 2nd repeat, the scores are no longer equal, but will seem inferior to the winning side. So that this side will prefer the move that breaks out immediately. The aging bonus can be as small as you want, so that it does not have any real effect on scoring, but just acts as a tie breaker between otherwise exactly equal scores.
The whole scheme still requires extra work to search and generate moves in the nodes on the repeat-cycle path twice. I am not sure if repeats occur often enough to make this have a significant impact on performance. Perhaps most of the danger for this can be eliminated by sorting moves that obviously offer the opportunity for the opponent to cause a repeat (because they reverse the move done 2 ply before, and the move 1-ply before was reversible) in very last place, and the move that actually brings about the repeat likewise (in simple back-and-forth moving cycles, both these moves satisfy that same criterion). Usually the repeat path will then be alpha-beta pruned in one of the two nodes before the repeat, unless it really might become PV.
This works, but the problem of this method is that, if there is a perpetual on the board, most other lines can take 'the long way', visiting each position twice by interjecting 4 ply from the perpetual before continuing with the true pursuit of the alternative line. (e.g. see http://www.vpittlik.org/wbforum/viewtopic.php?t=6387 .)
It makes no sense at all to have lines that visit the same position twice. The second occurence is just a metaphore for the first occurrence, you really want to redo the search of that repeated node under different conditions of draw-scoring the repeat (causing path dependence), and not hashing the intervening nodes. So I guess what is really needed, is that once the search reaches the 2nd occurrence of a position, it should extend the line such that it searches that node to the same depth as was requested for the search of the first occurrence. (This depth should thus be stored in the repetition table.)
In fact you would redo the search as soon as you discover that returning to that same position might be relevant (i.e. it was not pruned before you arrived there), although it might still happen that you find side branches of the path between 1st and 2nd occurence that after all will make it irrelevant.
So you first do the search starting from the 2nd occurrence, to the depth that was requested for the 1st occurrence, under the assumption that a 3rd return to the position is a draw, but without storing the results for the repeated nodes in the hash (to prevent contamination). This search will tell you if the repetition is forced, or if one of the sides prefers to break out of it. Whatever the case, the result will be the path-independent score of the 2nd occurrence.
If the result of the second occurrence was a draw score, you award it to the move that led to the repeat, and finish the search of the first occurrence. This for the benifit of hashing the nodes on the path between the 1st and 2nd occurrence, which was suppressed in the repeat path between 2nd and 3rd occurrence. This 'finishing of the search' is mainly a formal thing, as all side branches of the repeat cycle, in so far they were not searched already, now have been searched to the requested depth as part of the search of the 2nd occurence. And the side branches were hashed (in so far they were not repeats themselves). So there is not much extra time involved in searching the repeating position twice, each side branch is only searched once, be it before or after the repeat is encountered, and the second time it is taken from the hash. Only the nodes on the repeat path really are searched twice.
But (and now comes the crux), if it was favorable to break out of the repeat, giving the current move its true score does cause a problem. The branch that leads to this score, searched in the sub-tree from the 2nd occurrence, might not have been searched yet in the tree from the first occurrence. And if we encounter it later the score will be axactly the same, as a long line and a short line leading to the same end-leaf in minimax have the same score. And in a case where a new move produces an equal score to a move we already have, we consider it a fail low, and stick with the old move. Also in cases where the new move was the short-cut, and the old move was the detour through the 2nd repeat! On the other hand, we cannot arbitrarily adjust the score, as this will cause contamination in the underlying nodes (which are hashed).
A solution to this dilemma can be provided by the general mechanism of 'aging' scores: if two paths eventually lead to the same (score-determining) position, the side that is winning prefers the short path, the side that is losing the long path. (This is also true for paths without repeats.) Plain minimax does not take account of this. One way to implement this principle is to give the losing side (the side with negative score, i.e. below draw) a slight bonus for each move he does. This is a little tricky, as you will know the score only afterwards, and you must make sure to precompensate the search window so that out-of-window results cannot move back into the window by the later tinkering with the score. But it can be done, has a logical basis, and thus has many advantages in other areas (e.g. you would be liberated of that ridiculous stuff of adjusting checkmates for depth in the tree.)
With the aging, the score of the branch that breaks out of the repetition after the 2nd repeat, (which will be a branch in favor of the side that breaks out, or he would not do it), will have devaluated in being passed towards the root in favor of the loser. So when compared later with the (hashed) value of that same branch between 1st and 2nd repeat, the scores are no longer equal, but will seem inferior to the winning side. So that this side will prefer the move that breaks out immediately. The aging bonus can be as small as you want, so that it does not have any real effect on scoring, but just acts as a tie breaker between otherwise exactly equal scores.
The whole scheme still requires extra work to search and generate moves in the nodes on the repeat-cycle path twice. I am not sure if repeats occur often enough to make this have a significant impact on performance. Perhaps most of the danger for this can be eliminated by sorting moves that obviously offer the opportunity for the opponent to cause a repeat (because they reverse the move done 2 ply before, and the move 1-ply before was reversible) in very last place, and the move that actually brings about the repeat likewise (in simple back-and-forth moving cycles, both these moves satisfy that same criterion). Usually the repeat path will then be alpha-beta pruned in one of the two nodes before the repeat, unless it really might become PV.