Sub-optimal move ordering gives smaller tree
Posted: 18 Jan 2006, 11:09
Usually one assumes that, with alpha-beta pruning, trying the best move first will always produce the smallest tree. That this is not universally true even for fixed-depth trees is shown in the example below:
From the root position R there are 4 moves, one (B) that simplifies the situation so that only a low branching factor remains (here: 1), while the other three moves keep the situation complex. As long as the situation is complex there is a high branching factor (here: 4), but there is always one move among those that simplifies the situation (from where-on the branching factor drops to 1 again).
Case 1 sketches the full tree, with evaluation written at the leaves. Move A (i.e. to position A) is the best in ply 1, emerging with a score of +1 from the complex situation. The complex situation can also lead to a score -3, though, if I enter it with the bad moves C or D. The simplifying move B is not the best, it leads to score 0. In reply to C and D the simplifying moves for black on ply 2 (i and m) are also not the best for him, they lead to a score 0 in stead of -3.
Case 2 shows how alpha-beta prunes the tree with optimal move ordering, i.e. A is searched before B on ply 1, and f and j are searched before i and m (respctively) on ply 2. From the 40 end leaves, only 16 remain.
Case 3 shows alpha-beta with 'sub-optimal' move ordering. In stead of searching the best move first, we systematically search the simplifying move first, although this move is worse. Only 10 end leaves remain now. The savings come from the beta nodes in ply 2, where the fail high is effected cheaply by the simple move, which, although it was not the best, still produced a score large enough to cause the fail. Sorting B in front on ply 1 did not hurt, despite the fact that it was now searched with a wider window: the tree was to small for this making a significant difference (here there was in fact no difference at all).
In fact sorting B in front on ply 1 has a very favorable side effect: because searchin B ups alpha, the move after that will be searched with this new alpha rather than with alpha = -inf. If the move searched directly after it is the best one, this does not make any difference (and after that this nw move determines alpha, so any possible effect of searchin B is expired). But if we occasionally making a mistake in judging beforehand in the complex situation what the best move is, we might inadvertantly sort C in front of A. If C was the first, then, its search would be just as expensive as that of the A subtree (no cutoffs). But if C is searched after B, it is just as cheap as when it was searched after A: there is already a reasonable value for alpha, causing the same cutoffs in ply 3. So sorting the simple move in front in a PV node does not cost much, and acts as an insurance to sorting errors for the complex moves.
So in contrast to popular believe, 'sub-optimal' move ordering can in some cases beat 'optimal' move ordering as far as tree size is concerned, even for fixed-depth trees. Sorting by complexity might beat sorting by expected score.
- Code: Select all
Case 1: Full tree (40 leaves)
----a=== (+1,+1,+1,+1) (=== means a bundle of 4 moves to positions a1,a2,a3,a4)
|---b=== (+1,+1,+1,+1)
|---c=== (+1,+1,+1,+1)
----A---d--- +1
|---B---e--- 0
| /--f=== (-3,-3,-3,-3)
R---C---g=== (-3,-3,-3,-3)
| \--h=== (-3,-3,-3,-3)
| --i--- 0
----D---j=== (-3,-3,-3,-3)
|---k=== (-3,-3,-3,-3)
|---l=== (-3,-3,-3,-3)
----m--- 0
Case 2: Alpha-Beta with best move first (16 leaves)
----a=== (+1,+1,+1,+1)
|---b--- +1
|---c--- +1
----A---d--- +1
|---B---e--- 0
| /--f=== (-3,-3,-3,-3)
R---C-x
| \x
| x
----D---j=== (-3,-3,-3,-3)
|-x
|-x
--x
Case 3: Alpha-Beta with simplest move first (10 leaves)
----B---e--- 0
| ----a=== (+1,+1,+1,+1)
| |---b--- +1
| |---c--- +1
|---A---d--- +1
| /--i--- 0
R---C-x
| \x
| x
----D---m--- 0
|-x
|-x
--x
From the root position R there are 4 moves, one (B) that simplifies the situation so that only a low branching factor remains (here: 1), while the other three moves keep the situation complex. As long as the situation is complex there is a high branching factor (here: 4), but there is always one move among those that simplifies the situation (from where-on the branching factor drops to 1 again).
Case 1 sketches the full tree, with evaluation written at the leaves. Move A (i.e. to position A) is the best in ply 1, emerging with a score of +1 from the complex situation. The complex situation can also lead to a score -3, though, if I enter it with the bad moves C or D. The simplifying move B is not the best, it leads to score 0. In reply to C and D the simplifying moves for black on ply 2 (i and m) are also not the best for him, they lead to a score 0 in stead of -3.
Case 2 shows how alpha-beta prunes the tree with optimal move ordering, i.e. A is searched before B on ply 1, and f and j are searched before i and m (respctively) on ply 2. From the 40 end leaves, only 16 remain.
Case 3 shows alpha-beta with 'sub-optimal' move ordering. In stead of searching the best move first, we systematically search the simplifying move first, although this move is worse. Only 10 end leaves remain now. The savings come from the beta nodes in ply 2, where the fail high is effected cheaply by the simple move, which, although it was not the best, still produced a score large enough to cause the fail. Sorting B in front on ply 1 did not hurt, despite the fact that it was now searched with a wider window: the tree was to small for this making a significant difference (here there was in fact no difference at all).
In fact sorting B in front on ply 1 has a very favorable side effect: because searchin B ups alpha, the move after that will be searched with this new alpha rather than with alpha = -inf. If the move searched directly after it is the best one, this does not make any difference (and after that this nw move determines alpha, so any possible effect of searchin B is expired). But if we occasionally making a mistake in judging beforehand in the complex situation what the best move is, we might inadvertantly sort C in front of A. If C was the first, then, its search would be just as expensive as that of the A subtree (no cutoffs). But if C is searched after B, it is just as cheap as when it was searched after A: there is already a reasonable value for alpha, causing the same cutoffs in ply 3. So sorting the simple move in front in a PV node does not cost much, and acts as an insurance to sorting errors for the complex moves.
So in contrast to popular believe, 'sub-optimal' move ordering can in some cases beat 'optimal' move ordering as far as tree size is concerned, even for fixed-depth trees. Sorting by complexity might beat sorting by expected score.