I believe I have now solved the problem of hash-table pollution by repetition draws in a very clean way. The key is the 'contingency minimax' I discussed elsewhere (
http://www.volker-pittlik.name/wbforum/viewtopic.php?t=4415), which I originally intended only for my quiescence search. The idea of this is that a position at any stage of the search is not characterized by a single score, but by a range of scores (represented by two numbers, the minimum and the maximum score this position might deliver). This score range is passed back towards the root like a normal score.
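As an illustration of how such a range travels through a negamax search, here is a minimal sketch (the struct and function names are mine, not from any engine): a child's range is negated and swapped before being folded into the parent's tentative range.
Code:
/* Minimal sketch of range propagation in negamax; names are hypothetical. */
typedef struct {
    int lower;   /* the worst this position can still turn out to be */
    int upper;   /* the best this position can still turn out to be  */
} Range;

/* Fold one searched child into the parent's tentative range.  Negamax
 * negation swaps the bounds: this move gains the parent at least
 * -child.upper and at most -child.lower, and the parent takes the
 * maximum over all its moves. */
static void fold_child(Range *parent, Range child)
{
    if (-child.upper > parent->lower) parent->lower = -child.upper;
    if (-child.lower > parent->upper) parent->upper = -child.lower;
}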
The idea is to use this algorithm to keep track of a 'shadow score', i.e. the score that a position would have had if the three-repetitions-draw rule did not apply to positions in the tree. Thus, if a position already occurred twice in the game history, a hit in the tree simply delivers a draw score for both the real score and the shadow score, because this is a guaranteed draw. Positions that occur zero times or once in the game history, and once earlier in the tree, count as a draw for the real score when they are encountered again, but not for the shadow score.
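To make that bookkeeping concrete, here is a hypothetical sketch of the classification, assuming (as in the pseudo-code below) that the hash entry's rep_cnt field counts occurrences in the game history (0-2) and has 3 added to it while the position is on the current search path.
Code:
/* Hypothetical sketch of the repetition classification. */
enum RepClass {
    NOT_REPEATED,   /* first visit in the tree: search normally            */
    TREE_REP,       /* repeated within the tree: draw for the real score,
                       but the shadow score stays a range                  */
    GAME_REP        /* already twice in the game: unconditional draw for
                       both the real and the shadow score                  */
};

static enum RepClass classify_repetition(int rep_cnt)
{
    if (rep_cnt == 2) return GAME_REP;
    if (rep_cnt >  2) return TREE_REP;   /* game count + 3 (on current path) */
    return NOT_REPEATED;
}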
The shadow score will in general be a range rather than a single value, because there are moves in the search tree that have not been searched or evaluated. In particular the move that led to the repetition, which for the real score was declared a draw without search, represents total uncertainty (-INF, +INF) for the shadow score. But there might also be moves skipped due to a beta cutoff on the real score. This resets the upper bound of the shadow-score range to +INF, because the skipped moves might be arbitrarily good. Like the real score on a fail high, the range would then only give a lower bound. The difference with the real score is that in such a case the lower bound need not be above beta.
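A tiny illustration of these two effects (the helper names are mine, and the values of INF and DRAW_SCORE are only assumed for the sketch):
Code:
#define INF        30000   /* assumed 'infinite' score bound, for illustration */
#define DRAW_SCORE 0       /* assumed draw value, for illustration             */

/* The move that repeats a position counts as a draw for the real score,
 * but is completely unknown for the shadow score. */
static void shadow_of_repetition_move(int *real, int *shadow_lower, int *shadow_upper)
{
    *real         = DRAW_SCORE;
    *shadow_lower = -INF;
    *shadow_upper = +INF;
}

/* A beta cutoff is taken on the real score, so the skipped moves might have
 * been arbitrarily good for the shadow score: only its lower bound survives,
 * and that bound is not guaranteed to be >= beta. */
static void shadow_after_cutoff(int *shadow_upper)
{
    *shadow_upper = +INF;
}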
When positions for which the real score differs from the shadow score are stored in the hash table, we store the shadow score. In a hash table that can store both a lower and an upper bound for the same position (such as is recommended for an MTD(f) search), this goes quite naturally. If the hash table stores only a single bound and the upper and lower bounds differ, we must choose which of the two seems more useful to store.
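One possible layout for such a double-bound entry, using the field names the pseudo-code below refers to (everything else, including the field widths, is my guess):
Code:
#include <stdint.h>

/* Possible hash-entry layout for storing the shadow score; the field names
 * match the pseudo-code below, the rest is guesswork. */
typedef struct {
    uint64_t key;          /* position signature                                   */
    int16_t  lower_bound;  /* pessimistic shadow bound                             */
    int16_t  upper_bound;  /* optimistic shadow bound                              */
    int16_t  rep_depth;    /* ply at which this position entered the search path   */
    uint8_t  rep_cnt;      /* game occurrences (0..2), +3 while on the search path */
    uint8_t  depth;        /* depth field written when the node is finished        */
} HashEntry;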
Together with the real and shadow score, we also return the ply number of the position that was responsible for the repetition draw. The assumption is that this is usually only one position: repetition draws tend not to propagate very far towards the root, since one of the two players will not be happy with the draw and will try to avoid it. If a real score is based on several repeated positions, we use the lowest ply number. As soon as the tree walk returns to the ply on which the shadow score was based, the real score is no longer dependent on any tree history between the root and this node. The shadow score can then be reset to the real score. This way a node from which a repetition can be forced (e.g. by perpetual check) will be stored in the hash table with the draw score, and all nodes between the root and the node from which the draw can be forced can also benefit from the draw score.
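A hypothetical sketch of that reset, corresponding to the range[2] >= depth test in the pseudo-code below:
Code:
/* Sketch of the 'time-out' on history dependence (names are mine): rep_ply
 * is the lowest ply of any repeated position the child's score was based on,
 * ply is the ply of the current node.  Once we have backed up to or above
 * that ply, the repeated position lies entirely inside the subtree we just
 * searched, so the game history between root and this node no longer matters
 * and the conditional (shadow) range can collapse onto the real score. */
static void make_unconditional(int ply, int rep_ply,
                               int real_score, int *lower, int *upper)
{
    if (rep_ply >= ply) {
        *lower = real_score;
        *upper = real_score;
    }
}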
To narrow the range of the shadow score, it might be beneficial to update the hash-table entry for a position after every searched move that improves its shadow score. That way, when such a node is encountered later in the tree, the shadow score is as tightly defined as possible, and this might tighten the hash-table scores of all positions in the subtree behind the first encounter of the repeated state.
The algorithm thus seems to be as follows (untested!):
Code:
#define LOWER shadow[0]
#define UPPER shadow[1]
#define REP_DEPTH shadow[2]
int search(int side, int alpha, int beta, int shadow[], int depth)
{
    int score, best_real_score, best_opt_score, best_pes_score, range[3];
    int draw_depth;

    if(HASH_ENTRY->rep_cnt == 2)
    {   /* already twice in game: return unconditional draw */
        REP_DEPTH = INF;
        LOWER = DRAW_SCORE;
        UPPER = DRAW_SCORE;
        return DRAW_SCORE;
    }
    if(HASH_ENTRY->rep_cnt > 2)
    {   /* second time in tree: return conditional draw */
        REP_DEPTH = HASH_ENTRY->rep_depth;
        LOWER = HASH_ENTRY->lower_bound;
        UPPER = HASH_ENTRY->upper_bound;
        return DRAW_SCORE;
    }

    /* first time in tree, check for normal hash hit */
    /* based on stored lower/upper bounds            */
    ....

    /* first time in tree, normal search */
    HASH_ENTRY->rep_cnt += 3;          /* mark position as 'on current search path' */
    HASH_ENTRY->rep_depth = depth;
    HASH_ENTRY->lower_bound = -INF;    /* shadow score starts as total uncertainty   */
    HASH_ENTRY->upper_bound = +INF;
    best_pes_score =
    best_opt_score =
    best_real_score = -INF;
    draw_depth = +INF;

    FOR_ALL_MOVES
    {   /* usual negamax stuff */
        DO_MOVE;
        score = -search(-side, -beta, -alpha, range, depth+1);
        UNDO_MOVE;

        /* if score not based on less deep nodes, make it unconditional */
        if(range[2] >= depth)
            range[0] = range[1] = -score;

        /* keep track of tentative shadow-score range for this position */
        best_pes_score = max(best_pes_score, -range[1]);
        best_opt_score = max(best_opt_score, -range[0]);

        /* keep track of repeated nodes the score is based on */
        draw_depth = min(draw_depth, range[2]);

        if(score > best_real_score)
        {
            best_real_score = score;
            /* draw_depth = min(draw_depth, range[2]); moved */
            if(score > alpha) alpha = score;
            if(alpha >= beta && MOVES_LEFT)
            {   /* beta cutoff: skipped moves leave the shadow upper bound open */
                best_opt_score = +INF;
                break;
            }
        }
        /* keep hash-table bounds up to date as early as possible */
        HASH_ENTRY->lower_bound = best_pes_score;
    }

    /* store the shadow score in hash table */
    HASH_ENTRY->rep_cnt -= 3;          /* position leaves the current search path */
    HASH_ENTRY->lower_bound = best_pes_score;
    HASH_ENTRY->upper_bound = best_opt_score;
    HASH_ENTRY->depth = depth;

    /* return real and shadow scores for negamax */
    REP_DEPTH = draw_depth;
    LOWER = best_pes_score;
    UPPER = best_opt_score;
    return best_real_score;
}
Edit: On second thought I moved the updating of 'draw_depth', i.e. the depth beyond which the real score might be affected by history, to a place where it is done for every move, rather than just for the best one so far.
The idea is that even moves with a history-affected score that were not selected might have an impact on the current score: without the repetition draw they might have been much better, in which case they would have been selected, so the repetition draw in that case still affects the score.
That this means a history dependence will always be propagated towards the root up to the node on which it was based, even though in practice the real score might no longer depend on it, is not as bad as it seems: in such cases the shadow-score range closes in on the real score. The draw_depth variable only acts as a kind of 'time-out': it makes sure that any history dependence is erased as soon as we back up through the tree far enough that this 'history' in fact becomes the future, and it is used for nothing else. The actual dependence on the history-dependent draw score of the repeated node might in the meantime erase itself much earlier.