parallel prefix mates (for Gerd?)

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

parallel prefix mates (for Gerd?)

Postby H.G.Muller » 17 Feb 2007, 13:17

I am currently working to speed up my EGTB builder. One of the things it spends much time on, is to seed the EGTB with checkmates. Usually the winning side covers 20-30% of the board with its pieces (e.g. in KRK 8 King moves + 14 Rook moves = 22 out of 64 squares), so 20-30% of the possible positions have black in check. The first step is then to check all possible positions to see if they have at least one move that leads to the remaining 70-80%. Those that have not, are then the check- or stalemates.

Because all positions have to be checked, this takes a lot of time. So I am looking for a simple way to 'filter out' positions that can't possibly be mate because they have a King move to a safe square. So I am looking for an algorithm to identify all squares than border on all sides (in 8-neighbor topology) on covered squares (or squares that are occupied by own pieces). The square itself does not have to be covered, for I want to find stalemate candidates too.

The EGTB builder runs through all positions with all pieces on the board except black King, and from each such partial position it then generates 64 complete positions by adding the black King on any of the 64 squares. (Some of these are impossible, because the squares are already occupied, but there is spaced reserved for those in the TB anyway, to make the mapping from board position to TB index trivial.)

Rather than going through all these 64 positions, running the move generator for black to see if he has moves to safe squares, I would like to use some parallel algorithm to decide this (for King-moves only) for all 64-positions simultaneously. I could for instance make a bitboard with all white attacks from the partial position, orred with all black (non-King) pieces, thus marking all the squares where a black King could not go to. Now from that bitboard c, I would make the squares where a King is 'trapped'.

What is the quickest way to compute this? I can think off
Code: Select all
a = c>>1 & ~0x8080808080808080LL | c<<1 & 0x0101010101010101LL;
b = a | c;
c = ~(a | b>>8 | b<<8);

That way I would do it with 4 OR operations, rather than the expected 8. Can it be done with less?
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL

Re: parallel prefix mates (for Gerd?)

Postby Gerd Isenberg » 17 Feb 2007, 20:16

H.G.Muller wrote:What is the quickest way to compute this? I can think off
Code: Select all
a = c>>1 & ~0x8080808080808080LL | c<<1 & ~0x0101010101010101LL;
b = a | c;
c = ~(a | b>>8 | b<<8);

That way I would do it with 4 OR operations, rather than the expected 8. Can it be done with less?


Your code looks fine. In 64-bit mode it might be an slight code size improvement to load only one 64-bit constant and to perform one pre-shift mask and one post-shift mask.

Code: Select all
a = c>>1 & ~0x8080808080808080LL | (c & ~0x8080808080808080LL)<<1;
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: parallel prefix mates (for Gerd?)

Postby H.G.Muller » 17 Feb 2007, 21:26

It dawns on me that I could use the same principle on a byte board. And the 64 TB entries corresponding to the 64 positions that only differ in the placement of the black King could be considered an 8x8 board.

So I could operate on the TB entries themselves, especially on a 64-bit machine, where I could access an (aligned) group of 8 entries that form a rank with a single instruction:
Code: Select all
union EGTB_block
{   char b[64];
    long long int rank[8];
} EGTB[SIZE]; /* aligned to the cache line */

If the lsb would contain the mark indicating a black King is in check in this position I could make one pass through the block (8 iterations) doing
Code: Select all
rank[i] |= (rank[i+1] & rank[i-1] & 0x0101010101010101LL)<<1;

(properly padding the board edges) to mark in the '2' bit of every square if the square in front of it and the square behind it are both in check. Because of the parallelism of doing 8 squares simultaneously this is very cheap.

Then in a second pass I can go through it square by square, but to test if the ring of 8 around such as quare is completely checked, I now only have to look to the '2' bit of the square itself (to catch forward and backward), and test the '2' and '1' bit (which can be done in a single instruction) of the left and right neighbor. So only 3 tests for each square, in stead of 8:
Code: Select all
if((b[i-1]&b[i+1]&3) == 3 && (b[i]&2) == 2) /* King trapped */
else /* King can withdraw */

To really minimize mispredictions, while still not doing too much bitwise logic in the typical case, the best compromise might be:
Code: Select all
if((b[i]&b[i-1]&2) && (b[i-1]&b[i+1]&3)==3) ...

Then the initial test is just two AND instructions and a JEQ, and that would test effectively 4 squares, that would almost never be all checked, as they are not on a single ray. Only when all these squares are checked, a misptredict would occur and we would fall on the second line of defense, which is by testing the square immediately left, and the three on the file right of us, with two more and instructions, a compare and a branch.

(It sounds great that you only have to do 3 individual tests in stead of 8, but we should remember that we can stop at the first test that fails (i.e. if the neighbor square is not in check), and that most tests (70-80%) will fail. So the main savings comes from reducing the number of mispredictions by strongly increasing the probability that the test fails, by testing several squares at once, but not doing too much work to concentrate the information we apply the test to, by collecting it largely in parallel.)

Only in the very rare case that the King is totally trapped this second branch mispredicts as well, and we get to the code in the if statement. There we would have to start running the move generator for the other black pieces to see if they have legal moves. How exactly you do this has probably no impact on performance, as it is almost never done. Mates are really rare, usually just a few hundred in a tablebase of a few million entries.
User avatar
H.G.Muller
 
Posts: 3453
Joined: 16 Nov 2005, 12:02
Location: Diemen, NL


Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 41 guests