Winboard Forum

by **Michael Sherwin** » 14 Sep 2006, 10:36

I am starting a new topic, because, I should not have intruded into the Compact Bitboards thread with this new idea.

The basic idea as I have come to understand it is this:

Code: Select all: blockers.union = queenBits[sq]; signature = row1hash[blockers.row1] ^ row2hash[blockers.row2] ^ row3hash[blockers.row3] ^ row4hash[blockers.row4] ^ row5hash[blockers.row5] ^ row6hash[blockers.row6] ^ row7hash[blockers.row7] ^ row8hash[blockers.row8]; key = signature & MASK; if(hashTbl[key].signature == signature) { moveBits = hashTbl[key].bits ^ friendlies; } else { // generate using classic non rotated bitboards // or Gerds compact 1.5kb bitboards // get mobility // and store both in the hash for future use }

The idea behind this is that it uses mostly 32bit values, has no shifts or multiply instructions, smaller dependency chains, one lookup to get all bits (even for queens), only one dimentional indexing and requires a relativly small hash table.

The downside of not finding the correct bitboard in the hash is minimized by the fact that it should not happen that often. This is because, most blocking combinations are not possible at any given time. As the game progresses different blocking combinations then become possible and old ones become not possible and the changes to the hash is gradual. In the end game the change should become negligible or non existant.

An improvement would be that if a perfect (no collisions) hash of reasonable size can be generated then the above given downside disapears. No signature is needed and the end result would be even faster!

An alternative method would be:

Code: Select all: moveBits = bishopBits[sq]; blockers.u = moveBits & allPieces & noEdge; moveBits &= possibleBits2[sq][blockers.row2]; moveBits &= possibleBits3[sq][blockers.row3]; moveBits &= possibleBits4[sq][blockers.row4]; moveBits &= possibleBits5[sq][blockers.row5]; moveBits &= possibleBits6[sq][blockers.row6]; moveBits &= possibleBits7[sq][blockers.row7]; moveBits ^= friendlies;

No hash table needed. Some advantages gone. May be quite all right for 64bit machines. Can be modified to just get the queen bits and then modify the result for the rook or bishop (Zach Wegner).

Well thats it as the idea stands now. I am going to make some test and then report back.

Mike

Edit: I added the row numbers at the end of each possibleBits array (also per Zach Wegner). Also Zach said that the hash method could just return the bits for the queen and then modified for rook or bishop as well.

by **Ron Murawski** » 14 Sep 2006, 22:13

Hi Michael,

The first time you posted to the long Compact Bitboards thread your idea sailed right over my head. Now that I understand the concept I think your idea has tremendous potential. Great idea!

How large is "a relativly small hash table "?

Table size vs collision rate will be the most interesting test statistic.

Best regards,
Ron

by **Michael Sherwin** » 15 Sep 2006, 00:09

Ron Murawski wrote:Hi Michael,

The first time you posted to the long Compact Bitboards thread your idea sailed right over my head. Now that I understand the concept I think your idea has tremendous potential. Great idea!

How large is "a relativly small hash table "?

Table size vs collision rate will be the most interesting test statistic.

Best regards,
Ron

Hi Ron,

Thanks for the vote of confidence! The size of the hash table is dependant on wether rook and bishop are done seperatly and the queen is derived from both or if the queen decides the size and the rook and bishop are the ones derived. For rook I am thinking 65536 entries would be enough and for the bishop 8192 entries. For a queen table I am hoping that 256kb entries would be okay, even if not optimal. Memory is very cheap these days, so a larger table for the queen may not be a high price to pay. Also the table may use 2 bytes to point to the bitboard rather than storing it thus reducing the table size to 25%. A big advantage is that the mobility can also be computed and stored, maybe packed into the same two bytes.

Best
Mike

by **Zach Wegner** » 15 Sep 2006, 00:15

Hello all (or both

),

Michael Sherwin wrote:Also Zach said that the hash method could just return the bits for the queen and then modified for rook or bishop as well.

Now I'm not as sure about this. The hash method could work with queens, but it would increase the size of the hash greatly. It might be better with the hash method to separate it into rooks and bishops. Or, another thought, what if we use a sort of magic-multiply with masked occupied board, but instead of unique indices, we use a hash lookup. Only rarely would there be a collision, and then we use some other method, preferably with little or no tables. This might be able to reduce the size of magic arrays significantly, and if the tables are well designed, collisions could be even more unlikely. The only problem would be the addition of a branch...

It seems the & method is simpler and likely faster for 64 bits. I have implemented it into my program, but it is slower than my old approach; the NPS dropped from 312k to 289k (just on the starting position). This method could be competitive with other methods on 64 bits. Just to make it easy to test for others, I will post the new code for it, modified for non-0x88 coordinates. The initialization is particularly easy.

Code: Select all: BITBOARD attack_mask[64][8][256]; inline BITBOARD attacks(SQUARE square, BITBOARD moves) { SQ_RANK r; union { BITBOARD bitboard; unsigned char rank[8]; } b; b.bitboard = board.occupied_bb; for (r = RANK_1; r <= RANK_8; r++) moves &= attack_mask[square][r][b.rank[r]];/*For big endian architectures, replace one r with 7-r.*/ return moves; } void initialize_attacks(void) { BITBOARD mask; BITBOARD submask; SQ_RANK rank; SQUARE square; for (square = 0; square < OFF_BOARD; square++) { for (rank = RANK_1; rank <= RANK_8; rank++) { for (submask = 0; submask < 256; submask++) { mask = ~(submask << rank * 8);/* Act as if the bits in submask are the only occupied squares on the board */ attack_mask[square][rank][submask] =/* Kogge-Stone fills, or you can use some other queen_attacks function */ fill_up(MASK(square), mask) | fill_up_right(MASK(square), mask) | fill_right(MASK(square), mask) | fill_down_right(MASK(square), mask) | fill_down(MASK(square), mask) | fill_down_left(MASK(square), mask) | fill_left(MASK(square), mask) | fill_up_left(MASK(square), mask); } } } }

You can use attacks() as is with passing in piece_moves_bb[piece][sq] or create specific functions for each piece. Specific functions also allow you to only use the inner 6 rows for bishops.

Zach

by **Michael Sherwin** » 15 Sep 2006, 00:41

Just to make it easy to test for others, I will post the new code for it, modified for non-0x88 coordinates.

Thanks Zach,

You have saved me a lot of work! I assume that your test was on a 32bit machine? Do you also plan on testing the hash method? 8-)

Oh well, I suppose that I should do some work on this as it is my idea. :mrgreen:

Just be aware that what will take you a couple of hours to try, will take me a week or more.

Best,
Mike

by **Zach Wegner** » 15 Sep 2006, 02:58

Michael Sherwin wrote:Thanks Zach,

You have saved me a lot of work! I assume that your test was on a 32bit machine? Do you also plan on testing the hash method? Oh well, I suppose that I should do some work on this as it is my idea. Just be aware that what will take you a couple of hours to try, will take me a week or more.

Best,
Mike

Yes, it was on 32 bits-a G4-which is also a big-endian architecture, so thats two things against it there.

Regarding the hash, I guess you might want to get started (if it would really take you that much longer, which I doubt), because that idea seems pretty hard to implement, and would probably take me a week. I might play with it, but I don't even know where to start, and it seems that you already have some code for this. Plus I have a backlog of many things to do in evaluation and search, that matter more than a few percent

. Feel free to post any code you have for this hash method to let the community play with it. There have been a lot of new developments in bitboards recently, and most of it has been from the exchange of ideas here and on CCC.

Also, some more musings-We could hash the occupied sets for each color, and then have the hash table pop out a full movelist. For this we might want to break it down by direction, or even all 8 positive/negative directions.

Regards,
Zach

by **Michael Sherwin** » 15 Sep 2006, 07:45

Zach Wegner wrote:
Michael Sherwin wrote:Thanks Zach,

You have saved me a lot of work! I assume that your test was on a 32bit machine? Do you also plan on testing the hash method? Oh well, I suppose that I should do some work on this as it is my idea. Just be aware that what will take you a couple of hours to try, will take me a week or more.

Best,
Mike

Yes, it was on 32 bits-a G4-which is also a big-endian architecture, so thats two things against it there.

Regarding the hash, I guess you might want to get started (if it would really take you that much longer, which I doubt), because that idea seems pretty hard to implement, and would probably take me a week. I might play with it, but I don't even know where to start, and it seems that you already have some code for this. Plus I have a backlog of many things to do in evaluation and search, that matter more than a few percent . Feel free to post any code you have for this hash method to let the community play with it. There have been a lot of new developments in bitboards recently, and most of it has been from the exchange of ideas here and on CCC.

Also, some more musings-We could hash the occupied sets for each color, and then have the hash table pop out a full movelist. For this we might want to break it down by direction, or even all 8 positive/negative directions.

Regards,
Zach

I will do my best. However, if it would really take you a week to do then count on months for me. A case in point would be tonights exercise. I tried to create a perfect hash just for the bishops and so far have failed miserably. I am having just too much trouble creating all valid blocking sets for the bishop on all the squares of the board. The more that I try to work out the bugs the more confused I become. After I figure out how to generate all the blocking sets then the rest would be simple. The only question then would be, how close to minimal could the hash be. There are people like Gerd, H.G., Bob and many others that could do this exercise in an hour! I am not one of these people. If someone with their talent will not decide to champion this cause just to find out if it works or not and how well, and I am not begging anyone to do so, then maybe sometime in 2008 I will have the answers. But thanks for your encouragement!

Edit: I guess that I sound a little pathetic - not to worry, I am usually more down on myself than this!

Best
Mike

by **Zach Wegner** » 15 Sep 2006, 09:04

Michael Sherwin wrote: I am having just too much trouble creating all valid blocking sets for the bishop on all the squares of the board. The more that I try to work out the bugs the more confused I become. After I figure out how to generate all the blocking sets then the rest would be simple.

Using the "subset" trick that was posted by Steffan Westcott on the old CCC, http://www.stmintz.com/ccc/index.php?id=490129 might work for your blockers routine

The only question then would be, how close to minimal could the hash be.

You know, now I think it would be quite possible to generate a perfect hash using this method, if you also index the hash table by square. If you know the square, you know which bits in the occupied index you are looking at, and you can number these. Then you can simply shift them into the appropriate position in the hashkey. To use an example for a bishop on E4:

Code: Select all: Numbered indices: 9-bit hashkey for each rank 0 0 0 0 0 0 0 0 000000000 ^ 0 9 0 0 0 0 0 0 900000000 ^ 0 0 7 0 0 0 8 0 087000000 ^ 0 0 0 5 0 6 0 0 000650000 ^ 0 0 0 0 x 0 0 0 000000000 ^ 0 0 0 3 0 4 0 0 000004300 ^ 0 0 1 0 0 0 2 0 000000021 ^ 0 0 0 0 0 0 0 0 000000000 = 987654321

There is also probably room for compression by packing the square into these, but I'm not sure it could go below 12 bits total=4096*8=32k for bishops. Using compression (though this does not affect runtime speed) we can get rooks into at most 13 bits=64k, and queens in at most 18 bits=2048k. These also do not include the 1024k row-lookup tables. I'm still not sure how effective this method would be compared to the &ing method or other bitboard techniques, but it should be possible to test.

Regards,
Zach

btw, these things become easier with time. I remember when I started with bitboards, I had to ask about how to find which piece was attacking a square. I didn't realize you had to bitscan the piece bitboard for "from", and then create a move bitboard for that piece, then bitscan to find each "to". I had my move generation returning 6 bitboards, the squares attacked for each piece! Note also that I didn't know about Steffen's/Gerd's bitscanless fill methods

.

by **Michael Sherwin** » 15 Sep 2006, 17:02

Zach Wegner wrote:
Michael Sherwin wrote: I am having just too much trouble creating all valid blocking sets for the bishop on all the squares of the board. The more that I try to work out the bugs the more confused I become. After I figure out how to generate all the blocking sets then the rest would be simple.

Using the "subset" trick that was posted by Steffan Westcott on the old CCC, http://www.stmintz.com/ccc/index.php?id=490129 might work for your blockers routine
The only question then would be, how close to minimal could the hash be.

You know, now I think it would be quite possible to generate a perfect hash using this method, if you also index the hash table by square. If you know the square, you know which bits in the occupied index you are looking at, and you can number these. Then you can simply shift them into the appropriate position in the hashkey. To use an example for a bishop on E4:
Code: Select all
Numbered indices: 9-bit hashkey for each rank 0 0 0 0 0 0 0 0 000000000 ^ 0 9 0 0 0 0 0 0 900000000 ^ 0 0 7 0 0 0 8 0 087000000 ^ 0 0 0 5 0 6 0 0 000650000 ^ 0 0 0 0 x 0 0 0 000000000 ^ 0 0 0 3 0 4 0 0 000004300 ^ 0 0 1 0 0 0 2 0 000000021 ^ 0 0 0 0 0 0 0 0 000000000 = 987654321

There is also probably room for compression by packing the square into these, but I'm not sure it could go below 12 bits total=4096*8=32k for bishops. Using compression (though this does not affect runtime speed) we can get rooks into at most 13 bits=64k, and queens in at most 18 bits=2048k. These also do not include the 1024k row-lookup tables. I'm still not sure how effective this method would be compared to the &ing method or other bitboard techniques, but it should be possible to test.

Regards,
Zach

btw, these things become easier with time. I remember when I started with bitboards, I had to ask about how to find which piece was attacking a square. I didn't realize you had to bitscan the piece bitboard for "from", and then create a move bitboard for that piece, then bitscan to find each "to". I had my move generation returning 6 bitboards, the squares attacked for each piece! Note also that I didn't know about Steffen's/Gerd's bitscanless fill methods .

Thanks Zach,

Clues are helpful! So it is possible to decide where the hash entries should be for each square and then work backwards to determine a set of hash codes that when combined will point to those entries. Deterministic, rather than random. So in this case the 'chicken is really before the egg'. I am out of time untill next week, so I will start on this, next week.

Best,
Mike

by **Michael Sherwin** » 16 Sep 2006, 18:51

I have not had the time to set down with paper and pencil to work anything out yet. However, while I was cutting the grass, the answer just poped into my head for how to do a minimal perfect hash for the bishop and rook. The queen too, but the table would be much too large and is therefore not practicle. In reality, a minimal perfect hash is not really a hash at all. It is a method for building up an index from perfectly known data that can be used to access a table of known perfect size. For the bishop that means a table of 5248 entries. No more and no less. An array bishopBase[64] has each entry set to point to successive subarrays within the main table just big enough to hold the number of entries needed for a bishop on that square. Then each bishopRow?[128] array will contain the values needed to build up the subarray index that when added to the base index will point to a specific entry in the bishop lookup table.

by **Michael Sherwin** » 19 Sep 2006, 09:48

Michael Sherwin wrote:I have not had the time to set down with paper and pencil to work anything out yet. However, while I was cutting the grass, the answer just poped into my head for how to do a minimal perfect hash for the bishop and rook. The queen too, but the table would be much too large and is therefore not practicle. In reality, a minimal perfect hash is not really a hash at all. It is a method for building up an index from perfectly known data that can be used to access a table of known perfect size. For the bishop that means a table of 5248 entries. No more and no less. An array bishopBase[64] has each entry set to point to successive subarrays within the main table just big enough to hold the number of entries needed for a bishop on that square. Then each bishopRow?[128] array will contain the values needed to build up the subarray index that when added to the base index will point to a specific entry in the bishop lookup table.

Here is the code for the bishop lookup table. To achive an exact size non hash lookup table two compromises are required.Six rows for each square is required. Adding an offset for each row. Considering these compromises, it is not clear wich is more efficient, a hash lookup with a signature or this exact table lookup.

EDIT: This code block has been corrected 9/20/06

Code: Select all: u32 index; u32 bishopSubTbls[64]; u32 bishopRows[64][6][128]; u32 *bRows; blocks blockers; u64 bishopMoveTbl[5248]; bRows = &bishopRows[sq][0][0]; blockers = bishopBits[sq] & occupied; index = bishopSubTbls[sq] + (bRows[blockers.row2] | (bRows + 128)[blockers.row3] | (bRows + 256][blockers.row4] | (bRows + 384)[blockers.row5] | (bRows + 512)[blockers.row6] | (bRows + 640)[blockers.row7]); moveBits = bishopMoveTbl[index] ^ friendlies;

Since every square has its set of row values the six row lookups simply map any blockers to specific bits that when ored (or added)together gives an offset in the subtable pointed at by the bishop sub table array.

Please, I would appriciate some constructive comments.

Thanks,
Mike

by **Harald Lüßen** » 19 Sep 2006, 20:27

Michael Sherwin wrote:
Code: Select all
u32 index; u32 bishopSubTbls[64]; u32 bishopRows[64][6][128]; u32 *bRows; u64 blockers; u64 bishopMoveTbl[5248]; bRows = &bishopRows[sq][0][0]; blockers = bishopBits[sq]; index = bishopSubTbls[sq] | bRows[blockers.row2] | (bRows + 128)[blockers.row3] | (bRows + 256][blockers.row4] | (bRows + 384)[blockers.row5] | (bRows + 512)[blockers.row6] | (bRows + 640)[blockers.row7]; moveBits = bishopMoveTbl[index] ^ friendlies;

Since every square has its set of row values the six row lookups simply map any blockers to specific bits that when ored (or added)together gives an offset in the subtable pointed at by the bishop sub table array.

Please, I would appriciate some constructive comments.

Thanks,
Mike

Let me see if I understand it. Sorry I rewrote your code for more clarity (for me).

Code: Select all: // The big X mask for all moves from sq, but only inner 6x6 area u64 bishopBits[64]; // The remaining blocking pieces in the X-rays u64 blockers; // The big Table with index-bits for all rows, // bishopRows[sq][row2to6][bitsOnRow] // Note that in my implementation I need half the memory. u32 bishopRows[64][6][64]; // pointer to bishopRows u32 *bRows; // From index to move bits u32 index; u64 bishopMoveTbl[5248]; blockers = allPieces; blockers &= bishopBits[sq]; blockers >>= 9; bRows = &bishopRows[sq][0][0]; index = (bRows + 0)[(blockers >> 0) & 0x3f] // row 2 | (bRows + 64)[(blockers >> 8) & 0x3f] // row 3 | (bRows + 128][(blockers >> 16) & 0x3f] // row 4 | (bRows + 192)[(blockers >> 24) & 0x3f] // row 5 | (bRows + 256)[(blockers >> 32) & 0x3f] // row 6 | (bRows + 320)[(blockers >> 40) & 0x3f]; // row 7 index += bishopSubTbls[sq]; // +, not | moveBits = bishopMoveTbl[index] ^ friendlies;

Where the typical 1-bits of a bishop on d4 are shown below,
with - indicating a don't care bit:

Code: Select all: . . . . . . . - - . . . . . 1 . giving 1 bit or 2 different values from 64-bit row 7 . 1 . . . 1 . . giving 2 bits or 4 different values from 64-bit row 6 . . 1 . 1 . . . giving 2 bits or 4 different values from 64-bit row 5 . . . B . . . . giving 0 bits or 0 different values from 64-bit row 4 . . 1 . 1 . . . giving 2 bits or 4 different values from 64-bit row 3 . 1 . . . 1 . . giving 2 bits or 4 different values from 64-bit row 2 - . . . . . - .

Where the values are just a mapping of the relevant bits to a compact value.
For bits named a to i the value for Bd4 will be:

Code: Select all: . . . . . . . - - . . . . . i . giving 1 bit or 2 different values from 64-bit row 7 . g . . . h . . giving 2 bits or 4 different values from 64-bit row 6 . . e . f . . . giving 2 bits or 4 different values from 64-bit row 5 . . . B . . . . giving 0 bits or 0 different values from 64-bit row 4 . . c . d . . . giving 2 bits or 4 different values from 64-bit row 3 . a . . . b . . giving 2 bits or 4 different values from 64-bit row 2 - . . . . . - .

-> 0x0000000i hgfedcba as index. With 2^9 = 512 index values.

There are 4 squares with 9 bits like Bd4. Other squares need another amount of bits.

Code: Select all: 6 5 5 5 5 5 5 6 5 5 5 5 5 5 5 5 5 5 7 7 7 7 5 5 5 5 7 9 9 7 5 5 5 5 7 9 9 7 5 5 5 5 7 7 7 7 5 5 5 5 5 5 5 5 5 5 6 5 5 5 5 5 5 6 4 squares with 9 bits for 512 indices are 2048 indices. 12 squares with 7 bits for 128 indices are 1536 indices. 44 squares with 5 bits for 32 indices are 1408 indices. 4 squares with 6 bits for 64 indices are 256 indices. Total 5248 indices.

This is the table where the bishopSubTbls[sq] is made for.

Looks good. Can you please show us the functions for the initialising of the arrays.

And now go for rooks. :-)

Harald

by **Michael Sherwin** » 19 Sep 2006, 23:04

This is the table where the bishopSubTbls[sq] is made for.

Looks good. Can you please show us the functions for the initialising of the arrays.

And now go for rooks.

Harald

Hi Harold,

Wow, a fantastic recodeing. Thanks!

The rooks are going to require a lot more memory and will not benifit from the extra shift. So that is a total of 6 extra 64 bit shifts and on a 32 bit machine would not IMO be as fast as using 8 bit indexs from a blockers union. However, you did say that the rewrite was just for clarity. I suppose that once it is up and running then we can look for the fastest code. So I will code the rooks in the style that you presented. It is very easy to understand.

The initialization of the rows data is going to be very difficult for me to do! So please expect this to take some time.

Thanks,
Mike

by **Zach Wegner** » 20 Sep 2006, 00:28

Michael Sherwin wrote:Hi Harold,

Wow, a fantastic recodeing. Thanks! The rooks are going to require a lot more memory and will not benifit from the extra shift. So that is a total of 6 extra 64 bit shifts and on a 32 bit machine would not IMO be as fast as using 8 bit indexs from a blockers union. However, you did say that the rewrite was just for clarity. I suppose that once it is up and running then we can look for the fastest code. So I will code the rooks in the style that you presented. It is very easy to understand.

The initialization of the rows data is going to be very difficult for me to do! So please expect this to take some time.

Thanks,
Mike

I've been working on compression with this method, and I can only say that it's a bitch. The initialization code is very large, and I can only get it to work with no compression. Nevertheless, I will post this code, with compression commented out. Note that it is for 0x88 mappings. Just change square64 to square and remove the if(ON_BOARD(sq))s for non-0x88. Also this code is messy. Don't expect to understand it

And if anyone is brave enough to try and figure out why the compression doesn't work, please do so.

Code: Select all: /* Return the square of the (index)th lowest set bit. IE bit_index(~0,i)=i. Assumes there are (index) set bits.*/ int bit_index(BITBOARD bb, int index) { while (index) { bb &= bb - 1; index--; } return SQ_TO_64(first_square(bb)); } void initialize_attacks(void) { BITBOARD mask; BITBOARD submask; BITBOARD index; BITBOARD count_mask; SQ_RANK rank; SQUARE square; SQUARE square64; int count, count2; /* COMPRESSION char index_256_12[8][256]; BITBOARD index_12_256[8][12]; int left_num[8] = { 1, 1, 2, 3, 4, 5, 6, 7 }; for (square = A1; square <= H1; square++) { for (mask = 0; mask < 256; mask++) { submask = fill_right(MASK(square), ~mask) | fill_left(MASK(square), ~mask); index_256_12[square][mask] = MAX(pop_count(submask & ray_bb[square][DIR_LF]) - 1, 0) + MAX(pop_count(submask & ray_bb[square][DIR_RT]) - 1, 0) * left_num[square]; index_12_256[square][index_256_12[square][mask]] = mask; } } */ /* Initialize the index calculation tables */ for (square = 0; square < OFF_BOARD; square++) { square64 = SQ_TO_64(square); if (ON_BOARD(square)) { mask = piece_moves_bb[BISHOP][square]; mask &= ~MASK_FILE(FILE_A); mask &= ~MASK_FILE(FILE_H); mask &= ~MASK_RANK(RANK_1); mask &= ~MASK_RANK(RANK_8); count = 0; for (rank = RANK_2; rank <= RANK_7; rank++) { for (submask = 0; submask < 256; submask++) { index = 0; count2 = 0; for (count2 = pop_count(mask & MASK_RANK(rank)) - 1; count2 >= 0; count2--) index |= (submask >> bit_index(mask, count + count2) - 8 * rank & 1) << count + count2; bishop_hash[square64][rank - 1][submask] = index; } count += pop_count(mask & MASK_RANK(rank)); } mask = piece_moves_bb[ROOK][square]; if (FILE_OF(square) != FILE_A) mask &= ~MASK_FILE(FILE_A); if (FILE_OF(square) != FILE_H) mask &= ~MASK_FILE(FILE_H); if (RANK_OF(square) != RANK_1) mask &= ~MASK_RANK(RANK_1); if (RANK_OF(square) != RANK_8) mask &= ~MASK_RANK(RANK_8); count = 0; for (rank = RANK_1; rank <= RANK_8; rank++) { /* COMPRESSION if (RANK_OF(square) == rank) { for (submask = 0; submask < 256; submask++) rook_hash[square64][rank][submask] = index_256_12[FILE_OF(square)][submask] << count; if (FILE_OF(square) == FILE_A || FILE_OF(square) == FILE_B || FILE_OF(square) == FILE_G || FILE_OF(square) == FILE_H) count += 3; else count += 4; } else */ { for (submask = 0; submask < 256; submask++) { index = 0;count2 = 0; for (count2 = pop_count(mask & MASK_RANK(rank)) - 1; count2 >= 0; count2--) index |= (submask >> bit_index(mask, count + count2) - 8 * rank & 1) << count + count2; rook_hash[square64][rank][submask] = index; } count += pop_count(mask & MASK_RANK(rank)); } } } } /* Initialize hash lookup tables */ for (square = 0; square < OFF_BOARD; square++) { square64 = SQ_TO_64(square); if (ON_BOARD(square)) { mask = piece_moves_bb[BISHOP][square]; mask &= ~MASK_FILE(FILE_A); mask &= ~MASK_FILE(FILE_H); mask &= ~MASK_RANK(RANK_1); mask &= ~MASK_RANK(RANK_8); for (count_mask = 0; count_mask < 1 << 9; count_mask++) { submask = (BITBOARD)0; count = 0; for (rank = RANK_2; rank <= RANK_7; rank++) { for (count2 = pop_count(mask & MASK_RANK(rank)) - 1; count2 >= 0; count2--) submask |= (count_mask >> count + count2 & 1) << bit_index(mask, count + count2); count += pop_count(mask & MASK_RANK(rank)); } submask = ~submask; bishop_lookup[square64][count_mask] = fill_up_right(MASK(square), submask) | fill_down_right(MASK(square), submask) | fill_down_left(MASK(square), submask) | fill_up_left(MASK(square), submask); } mask = piece_moves_bb[ROOK][square]; if (FILE_OF(square) != FILE_A) mask &= ~MASK_FILE(FILE_A); if (FILE_OF(square) != FILE_H) mask &= ~MASK_FILE(FILE_H); if (RANK_OF(square) != RANK_1) mask &= ~MASK_RANK(RANK_1); if (RANK_OF(square) != RANK_8) mask &= ~MASK_RANK(RANK_8); for (count_mask = 0; count_mask < 1 << 12/*1<<10 with compression*/; count_mask++) { submask = (BITBOARD)0; count = 0; for (rank = RANK_1; rank <= RANK_8; rank++) { /* COMPRESSION if (RANK_OF(square) == rank) { if (FILE_OF(square) == FILE_A || FILE_OF(square) == FILE_B || FILE_OF(square) == FILE_G || FILE_OF(square) == FILE_H) { submask |= index_12_256[FILE_OF(square)][count_mask >> count & 7] << 8 * rank; count += 3; } else { submask |= index_12_256[FILE_OF(square)][count_mask >> count & 15] << 8 * rank; count += 4; } } else */ { for (count2 = pop_count(mask & MASK_RANK(rank)) - 1; count2 >= 0; count2--) submask |= (count_mask >> count + count2 & 1) << bit_index(mask, count + count2); count += pop_count(mask & MASK_RANK(rank)); } } submask = ~submask; rook_lookup[square64][count_mask] = fill_up(MASK(square), submask) | fill_right(MASK(square), submask) | fill_down(MASK(square), submask) | fill_left(MASK(square), submask); } } } }

This code, without compression, seemed to be even slower than the & method. So I don't think I will play much more with these methods unless someone thinks of something new.

Zach

by **Michael Sherwin** » 20 Sep 2006, 01:25

This code, without compression, seemed to be even slower than the & method. So I don't think I will play much more with these methods unless someone thinks of something new.

Hi Zach,

Thanks for trying!

I hope that the speed issue can be resolved and that I am not wasting peoples time. On the surface it looks to be very fast especially for 32 bit machines. Maybe it is cash unfriendly though.

Mike

by **Michael Sherwin** » 20 Sep 2006, 08:24

I have been fighting off a cold for a few days now. It has finally won
and is taking me down.

I know that I said that I would present the rook in the more clear form,
however, given the circumstances this is the best that I can do. I have
tried to correct the oversights that I made with the bishop code.

Here is the code for the rook.

EDIT: This code block has been corrected 9/20/06

Code: Select all: typedef struct { u08 row1; u08 row2; u08 row3; u08 row4; u08 row5; u08 row6; u08 row7; u08 row8; } row; typedef union { u64 u; row; } blocks; u32 index; u32 rookSubTbls[64]; u32 rookRows[64][8][256]; u32 *rRows; u64 rookBits[64]; u64 rookMoveTbl[102400]; blocks blockers; rRows = &rookRows[sq][0][0]; blockers.u = rookBits[sq] & allPieces; index = rookSubTbls[sq] + ((rRows + 0)[blockers.row1] | (rRows + 256)[blockers.row2] | (rRows + 512)[blockers.row3] | (rRows + 768)[blockers.row4] | (rRows + 1024)[blockers.row5] | (rRows + 1280)[blockers.row6] | (rRows + 1536)[blockers.row7] | (rRows + 1792)[blockers.row8]); moveBits = rookMoveTbl[index] ^ friendlies;

The '+' operator must be used to combine the base index with the subindex. (Harald)
When I am better I'll start on the initialization functions.

Mike

by **Tony van Roon-Werten** » 20 Sep 2006, 13:23

I'm not sure I understand all of it, but I think you can reduce de index size.

Since even a rook on d4 has only 3*3*4*4=144 different attack patterns, one can use an aditional redirection, and shrink rookMoveTbl quite a lot.

I don't want to calculate too much now so I just took 144*64 for it, but it should be smaller.

So rather than 102400*8bytes=819200 bytes one only needs

(102400*2)+(144*64*8) bytes=278.528 bytes.

Tony

Code: Select all: typedef union { u64 u; u08 row1; u08 row2; u08 row3; u08 row4; u08 row5; u08 row6; u08 row7; u08 row8; } blocks; u32 index; u32 rookSubTbls[64]; u32 rookRows[64][8][256]; u32 *rRows; u64 rookBits[64]; u16 rookMoveTblIndex[102400]; u64 rookMoveTbl[9216]; blocks blockers; rRows = &rookRows[sq][0][0]; blockers.u = rookBits[sq] & allPieces; index = rookSubTbls[sq] | (rRows + 0)[blockers.row1] | (rRows + 256)[blockers.row2] | (rRows + 512)[blockers.row3] | (rRows + 768)[blockers.row4] | (rRows + 1024)[blockers.row5] | (rRows + 1280)[blockers.row6] | (rRows + 1536)[blockers.row7] | (rRows + 1792)[blockers.row8]; index = rookMoveTblIndex[index] ; moveBits = rookMoveTbl[index] ^ friendlies;

by **Harald Lüßen** » 20 Sep 2006, 17:43

Michael Sherwin wrote:I have been fighting off a cold for a few days now. It has finally won
and is taking me down.

I know that I said that I would present the rook in the more clear form,
however, given the circumstances this is the best that I can do. I have
tried to correct the oversights that I made with the bishop code.

Here is the code for the rook.

Code: Select all
typedef union { u64 u; u08 row1; u08 row2; u08 row3; u08 row4; u08 row5; u08 row6; u08 row7; u08 row8; } blocks; u32 index; u32 rookSubTbls[64]; u32 rookRows[64][8][256]; u32 *rRows; u64 rookBits[64]; u64 rookMoveTbl[102400]; blocks blockers; rRows = &rookRows[sq][0][0]; blockers.u = rookBits[sq] & allPieces; index = rookSubTbls[sq] | (rRows + 0)[blockers.row1] | (rRows + 256)[blockers.row2] | (rRows + 512)[blockers.row3] | (rRows + 768)[blockers.row4] | (rRows + 1024)[blockers.row5] | (rRows + 1280)[blockers.row6] | (rRows + 1536)[blockers.row7] | (rRows + 1792)[blockers.row8]; moveBits = rookMoveTbl[index] ^ friendlies;

As far as I know either the '|' operator or the '+' operator can be used
to combine the index. I do not know if there is a speed difference. I
have gone through the code a couple of times and I do not see any obvious
errors.

When I am better I'll start on the initialization functions.

Mike

You stay to the big rookRows[64][8][256] index. Why?
Can you give us some real code for blockers being a 64-bit integer
and a struct with .row1 to .dow8 members.?

You use the |-operator here:
index = rookSubTbls[sq] | (rRows + 0)[blockers.row1] ...
I can not see how this could work. the rRows-bits
form a 9bit/7bit/6bit/5bit value with the range 0..511/127/63/31,
at least for the bishops. How can you find an index for rookMoveTbl[index] using | ?
With + this is very easy to understand. Just put one sub-array
after the other ang give the offsets in rookSubTbls[sq].

Harald

by **Michael Sherwin** » 20 Sep 2006, 21:23

I am very sick with a fever at the moment, however, I will try to answer your questions.

You stay to the big rookRows[64][8][256] index. Why?

The rooks can move along the edges. Therfore, all 8 rows are needed and all 8 bits in each row are needed.

Can you give us some real code for blockers being a 64-bit integer
and a struct with .row1 to .dow8 members.?

Sorry, my C primer book is vauge about unions. Maybe this is how to do it.

Code: Select all: typedef struct { u08 row1; u08 row2; u08 row3; u08 row4; u08 row5; u08 row6; u08 row7; u08 row8; } row; typedef union { u64 u; row rows; } blocks; index = rookSubTbls[sq] | (rRows + 0)[blockers.rows.row1] | (rRows + 256)[blockers.rows.row2] | (rRows + 512)[blockers.rows.row3] ... | (rRows + 1792)[blockers.rows.row8];

You use the |-operator here:
index = rookSubTbls[sq] | (rRows + 0)[blockers.row1] ...
I can not see how this could work. the rRows-bits
form a 9bit/7bit/6bit/5bit value with the range 0..511/127/63/31,
at least for the bishops. How can you find an index for rookMoveTbl[index] using | ?
With + this is very easy to understand. Just put one sub-array
after the other ang give the offsets in rookSubTbls[sq].

Code: Select all: bishopSubTbls[0] = 0; // 0000000(7) | 111111(6) max indici 63 bishopSubTbls[1] = 64; // 1000000(7) | 11111(5) max indici 95 bishopSubTbls[2] = 96; // 1100000(7) | 11111(5) max indici 127 rookSubTbls[0] = 0; // 0000000000000(13) | 111111111111(12) max indici 4095 rookSubTbls[1] = 4096; // 1000000000000(13) | 11111111111(11) max indici 6143 rookSubTbls[2] = 6144; // 1100000000000(13) | 11111111111(11) max indici 8191

If I am wrong about any of the above, then I am glad to recieve correction. As far as using '+' instead of '|' that is perfectly okay with me.

Mike

by **Gerd Isenberg** » 20 Sep 2006, 23:03

Michael Sherwin wrote:I am very sick with a fever at the moment, however, I will try to answer your questions.

You stay to the big rookRows[64][8][256] index. Why?

The rooks can move along the edges. Therfore, all 8 rows are needed and all 8 bits in each row are needed.

Can you give us some real code for blockers being a 64-bit integer
and a struct with .row1 to .dow8 members.?

Sorry, my C primer book is vauge about unions. Maybe this is how to do it.

Code: Select all
typedef struct { u08 row1; u08 row2; u08 row3; u08 row4; u08 row5; u08 row6; u08 row7; u08 row8; } row; typedef union { u64 u; row rows; } blocks; index = rookSubTbls[sq] | (rRows + 0)[blockers.rows.row1] | (rRows + 256)[blockers.rows.row2] | (rRows + 512)[blockers.rows.row3] ... | (rRows + 1792)[blockers.rows.row8];

You use the |-operator here:
index = rookSubTbls[sq] | (rRows + 0)[blockers.row1] ...
I can not see how this could work. the rRows-bits
form a 9bit/7bit/6bit/5bit value with the range 0..511/127/63/31,
at least for the bishops. How can you find an index for rookMoveTbl[index] using | ?
With + this is very easy to understand. Just put one sub-array
after the other ang give the offsets in rookSubTbls[sq].

Code: Select all
bishopSubTbls[0] = 0; // 0000000(7) | 111111(6) max indici 63 bishopSubTbls[1] = 64; // 1000000(7) | 11111(5) max indici 95 bishopSubTbls[2] = 96; // 1100000(7) | 11111(5) max indici 127 rookSubTbls[0] = 0; // 0000000000000(13) | 111111111111(12) max indici 4095 rookSubTbls[1] = 4096; // 1000000000000(13) | 11111111111(11) max indici 6143 rookSubTbls[2] = 6144; // 1100000000000(13) | 11111111111(11) max indici 8191

If I am wrong about any of the above, then I am glad to recieve correction. As far as using '+' instead of '|' that is perfectly okay with me.

Mike

A anonymious struct inside the union would be fine. Or simply a union with u64 and u8[8]. But i wouldn't do that with a union but stay with a u64 scalar like Harald suggested, to keep things in registers and to perform consecutive shifts,and. You allready have nine reads anyway - at least nine cacheliens. Using six or eight additional bytewise reads seems more 32-bit friendly on the first glance, but there is no more chance to read two "blocks" simultaniously. I think your approach has some nice didactical features to learn how Pradu's approach works. Simply count instructions and memory reads and keep L1 footprint in mind...

Become good well soon!

Cheers,
Gerd

Winboard Forum

New bitboard move generator

New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Re: New bitboard move generator

Who is online