Winboard Forum

by **Tom Likens** » 14 Mar 2006, 22:45

Hello Everyone,

Tord's question on bitboards made me decide to ask a question of my own. I'm wondering how compact I can get the rotated bitboard arrays for the ranks, files and diagonals. I already use the trick of masking off the end bits, so that the arrays decrease in size from

4 x [64][256] x 8 = 512k

to

4 x [64][64] x 8 = 128k

A nice size reduction of 3/4, at the cost of additional runtime calculations. Changing to this scheme gave me a nice speed up, although off-hand I don't remember how much, it was measurable. I haven't implemented, but am considering taking advantage of reducing the diagonals, since they don't need an index of 6 bits, except for the one long diagonal. In fact, if I've done my math right I'm estimating that I can reduce the combined memory requirements for both diagonals down to about 20k, much smaller than their current 64k requirement.

Anyway, this is similar to what Crafty does with its compact attacks stuff. But I don't really want to duplicate Bob's work. Instead I'd like to do something semi-original. I'm wondering if anyone else would be willing to share some of what they are doing to reduce these arrays, (or at least engage in a conversation on the subject)?

regards,
--tom

by **Tord Romstad** » 14 Mar 2006, 23:54

Tom Likens wrote:Hello Everyone,

Tord's question on bitboards made me decide to ask a question of my own. I'm wondering how compact I can get the rotated bitboard arrays for the ranks, files and diagonals. I already use the trick of masking off the end bits, so that the arrays decrease in size from

4 x [64][256] x 8 = 512k

to

4 x [64][64] x 8 = 128k

A nice size reduction of 3/4, at the cost of additional runtime calculations. Changing to this scheme gave me a nice speed up, although off-hand I don't remember how much, it was measurable. I haven't implemented, but am considering taking advantage of reducing the diagonals, since they don't need an index of 6 bits, except for the one long diagonal. In fact, if I've done my math right I'm estimating that I can reduce the combined memory requirements for both diagonals down to about 20k, much smaller than their current 64k requirement.

Anyway, this is similar to what Crafty does with its compact attacks stuff. But I don't really want to duplicate Bob's work. Instead I'd like to do something semi-original. I'm wondering if anyone else would be willing to share some of what they are doing to reduce these arrays, (or at least engage in a conversation on the subject)?

Hello Tom,

I have seen several ways to compress the tables. The following one, which was invented by Sergei Markoff, reduces the array size to slightly more than 28 kB. It's somewhat tricky to explain, but I'll give it a try. If it is totally incomprehensible, I'll try to write some demonstration code tomorrow.

Consider a one-dimensional chess variant with a board consisting of 8 squares in a single line. If we write a bitboard program for playing such a chess variant, we would only need a single small array for generating all sliding attacks:

Code: Select all: unsigned char OneDimensionalHorizontalAttacks[8][256];

By using some elementary combinatorics, it can easily be shown that the set of all attack bitboards occuring in the OneDimensionalHorizontalAttacks array for one-dimensional chess contains exactly 70 different entries. Enumerate these from 0 to 69. Introduce a new 2-dimensional array, AttackIndex, which is indexed by a square and a bit vector, and where the entry at index (i,j) contains the number of the corresponding sliding attack bit vector (in the range 0-69):

Code: Select all: unsigned char AttackIndex[8][256]

We now return to 2-dimensional 8x8 chess. Assume that we want to generate a bitboard of horizontal attacks for a rook. We first extract the 8-bit bit vector representing the state of the rank of the square, and find the index in the range 0-7 representing the rook's position along the rank. Use the AttackIndex table introduced above to compress the 8x256 different possibilities to a number in the range 0-69. This number, combined with the index of the rank, can be used as indices to a 2-dimensional array of attack bitboards of horizontal attacks. The bitboard array would look like this:

Code: Select all: bitboard_t HorizontalAttacksArray[70][8];

The following macro can now be used to compute horizontal attack bitboards (untested, bugs are possible):

Code: Select all: #define HorizontalAttacks(sq) \ HorizontalAttacksArray[AttackIndex[file(sq)][(R00bb>>(rank(sq)/8))&0xFF]][rank(sq)]

Vertical attacks can of course be computed in exactly the same way, by exchanging files with ranks, replacing R00bb with R90bb, and using a different lookup table. Diagonal attacks can also be done in a similar way, except that we need to keep all the 16 different diagonals and anti-diagonals separate (we can't "glue together" the a7-b8 diagonal and the c1-h6 diagonal the way we do for conventional rotated bitboards). We end up with the following four arrays:

Code: Select all: bitboard_t HorizontalAttacksArray[70][8]; bitboard_t VerticalAttacksArray[70][8]; bitboard_t DiagonalAttacksArray[70][16]; bitboard_t AntiDiagonalAttacksArray[70][16];

The total size of all the above arrays is 28,928 bytes. It is probably possible to compress this further at the cost of some additional runtime calculations.

I don't currently use this, though. The 128 kB arrays are very slightly faster for my program.

Tord

by **Tom Likens** » 15 Mar 2006, 03:25

Thanks Tord,

This is interesting to say the least. Save your code for the moment, as I'd like to try and work out the details on my own. I suspect I'll have a few questions as I get into it. One thing that occurs to me almost immediately, is that it might be productive to burn a bit of memory to make the 70 index into a power of two--perhaps 128 (of course 64 would be more attractive). This would still use far less memory than the 128k scheme, and might be faster to access. Time to experiment.

regards,
--tom

P.S. BTW, this Markoff fella seems rather bright!

by **Zach Wegner** » 15 Mar 2006, 05:34

Hello Tord,

Wouldn't it be a bit faster if you changed AttackIndex to a bitboard_t*, and just indexed it by [file][rank_state][rank]?

And BTW, changing the 70 index to a power of two won't help in this code, because it's the leftmost dimension.

Zach

by **Tony van Roon-Werten** » 15 Mar 2006, 09:25

Tom Likens wrote:Hello Everyone,

Tord's question on bitboards made me decide to ask a question of my own. I'm wondering how compact I can get the rotated bitboard arrays for the ranks, files and diagonals. I already use the trick of masking off the end bits, so that the arrays decrease in size from

4 x [64][256] x 8 = 512k

to

4 x [64][64] x 8 = 128k

A nice size reduction of 3/4, at the cost of additional runtime calculations. Changing to this scheme gave me a nice speed up, although off-hand I don't remember how much, it was measurable. I haven't implemented, but am considering taking advantage of reducing the diagonals, since they don't need an index of 6 bits, except for the one long diagonal. In fact, if I've done my math right I'm estimating that I can reduce the combined memory requirements for both diagonals down to about 20k, much smaller than their current 64k requirement.

Anyway, this is similar to what Crafty does with its compact attacks stuff. But I don't really want to duplicate Bob's work. Instead I'd like to do something semi-original. I'm wondering if anyone else would be willing to share some of what they are doing to reduce these arrays, (or at least engage in a conversation on the subject)?

regards,
--tom

I've got this idea from Gerd, I don't think he minds sharing it.

It's about sliders, the nonsliders are obvious.

For every square on the board, there are 4 directions. For each combination of square<->direction generate a mask and a magic number so that (for this square<->direction combination) :

1) the mask gives all reachable squares on an empty board ( minus the edges ie the 64 trick)
2) the magic number generates an unique index in the range [0..63] ( use deBruijn for this) for every possible outcome of 1)

this costs 64*4*2*8= 4KB

generate a lookuptable for [square][direction][index] wich contains the squares the piece can move to.

this costs 64*64*4*8=128 KB

so together 132 KB.

Why is this nice ? You don't need rotated bitboards :!:

Cheers,

Tony

by **Tord Romstad** » 15 Mar 2006, 11:52

Zach Wegner wrote:Hello Tord,

Wouldn't it be a bit faster if you changed AttackIndex to a bitboard_t*, and just indexed it by [file][rank_state][rank]?

Hello Zach,

Perhaps I'm stupid, but I don't understand what you mean at all. Could you please explain this in greater detail?

And BTW, changing the 70 index to a power of two won't help in this code, because it's the leftmost dimension.

Yes, I agree.

Tord

by **Tord Romstad** » 15 Mar 2006, 12:12

Tony van Roon-Werten wrote:I've got this idea from Gerd, I don't think he minds sharing it.

It's about sliders, the nonsliders are obvious.

For every square on the board, there are 4 directions. For each combination of square<->direction generate a mask and a magic number so that (for this square<->direction combination) :

1) the mask gives all reachable squares on an empty board ( minus the edges ie the 64 trick)
2) the magic number generates an unique index in the range [0..63] ( use deBruijn for this) for every possible outcome of 1)

this costs 64*4*2*8= 4KB

generate a lookuptable for [square][direction][index] wich contains the squares the piece can move to.

Hi Tony,

Evidently I am even slower than usual today. I didn't understand a word of Zach's suggestion, and I also struggle to understand the idea you describe. Let me see if I understand you correctly.

Assume that we want to compute a bitboard of all attacks in the a1-h8 direction for a bishop on d4. We begin by looking up the mask of all reachable squares on an empty board, minus the edge squares. This mask will look like this:

Code: Select all: -------- ------#- -----#-- ----#--- -------- --#----- -#------ --------

We 'and' this mask with the bitboard of all occupied squares, and obtain a bitboard which could (for instance) look like this:

Code: Select all: -------- ------#- -----#-- -------- -------- -------- -#------ --------

This bitboard is multiplied by some magic number (and perhaps shifted to the right?), and the result is a number in the range 0-63. This number, together with the square index and the direction, are used to look up a attack bitboard from a 128 kB array.

Assuming that my explanation above is correct, I still don't understand how to find the 'magic numbers', and your de Bruijn hint doesn't really help me. I understand how to use binary de Bruijn sequences to hash a single-1 bitboard to an integer in the range 0-63, but I don't see how they can be used to create perfect hash functions for bitboards with more than one nonzero bit.

Tord

by **Tony van Roon-Werten** » 15 Mar 2006, 13:45

Tord Romstad wrote:
Tony van Roon-Werten wrote:I've got this idea from Gerd, I don't think he minds sharing it.

It's about sliders, the nonsliders are obvious.

For every square on the board, there are 4 directions. For each combination of square<->direction generate a mask and a magic number so that (for this square<->direction combination) :

1) the mask gives all reachable squares on an empty board ( minus the edges ie the 64 trick)
2) the magic number generates an unique index in the range [0..63] ( use deBruijn for this) for every possible outcome of 1)

this costs 64*4*2*8= 4KB

generate a lookuptable for [square][direction][index] wich contains the squares the piece can move to.

Hi Tony,

Evidently I am even slower than usual today. I didn't understand a word of Zach's suggestion, and I also struggle to understand the idea you describe. Let me see if I understand you correctly.

Assume that we want to compute a bitboard of all attacks in the a1-h8 direction for a bishop on d4. We begin by looking up the mask of all reachable squares on an empty board, minus the edge squares. This mask will look like this:
Code: Select all
-------- ------#- -----#-- ----#--- -------- --#----- -#------ --------

We 'and' this mask with the bitboard of all occupied squares, and obtain a bitboard which could (for instance) look like this:
Code: Select all
-------- ------#- -----#-- -------- -------- -------- -#------ --------

This bitboard is multiplied by some magic number (and perhaps shifted to the right?), and the result is a number in the range 0-63. This number, together with the square index and the direction, are used to look up a attack bitboard from a 128 kB array.

Assuming that my explanation above is correct, I still don't understand how to find the 'magic numbers', and your de Bruijn hint doesn't really help me. I understand how to use binary de Bruijn sequences to hash a single-1 bitboard to an integer in the range 0-63, but I don't see how they can be used to create perfect hash functions for bitboards with more than one nonzero bit.

Tord

Hi Tord

You understand correct. ( no shift right needed)

The finding of the magic numbers was based on deBruijn.

The idea is that:

(Mask*magicNumber)>>(64-bitRange) gives an index, where bitRange is defined so that pow(2,bitRange) gives the index range.

The trick is to find a magic number that gives an unique index for all possible set bits of Mask ie if mask is 011 then you would want the magic number to give a 0,1,2 and 3 for all possible submasks (ie 000,001,010,011) in no particular order as long as they are unique.

I think it was Walter Presscot who gave a nice trick for the enumerate part.

I'll give you the code

Code: Select all: bool FoundMagic(unsigned __int64 Mask,unsigned __int64 magicNumber) { unsigned __int64 subMask=0,check=0; int index; do { subMask = (subMask-Mask) & Mask; index=(subMask *magicNumber)>>(64-6); if (check & ((unsigned __int64) 1)<<index) return(false); check|=((unsigned __int64) 1)<<index; } while (subMask); return(true); }

That's it

Just pump in some random numbers and your magic numbers should be finished in 15 secs.

It's probably possible to reduce the index range even further, but I haven't tested that yet. I was to happy to get this working.

Thinking about it though, I think I could reduce it to 20 KB. That's not bad, 20KB with only limited calculation AND no rotated bitboards.

EDIT: Hmm, it seems I was a bit over optimistic and even worse.. plain wrong. I would need quite a bit of extra calculation for that.

I'll keep you informed....

Tony

by **Tom Likens** » 15 Mar 2006, 15:25

Zach Wegner wrote:
And BTW, changing the 70 index to a power of two won't help in this code, because it's the leftmost dimension.

Zach

Zach,

You're right of course. I've been programming long enough to remember that to find the address of "type array[i][j]" the compiler has to do the following:

address = array + (i * COLSIZE * sizeof(type)) + (j * sizeof(type))

And that if your compiler is smart and COLSIZE is a power of two (equal to 2^n) this turns into:

address = array + ((i << n) * sizeof(type)) + (j * sizeof(type))

Which as you said doesn't depend on the ROWSIZE at all. Of course it may be possible to optimize this even more depending on the size of "type", but that's just a detail.

regards,
--tom

by **Tom Likens** » 15 Mar 2006, 15:41

Tony van Roon-Werten wrote:It's probably possible to reduce the index range even further, but I haven't tested that yet. I was to happy to get this working.

Hello Tony,

I hadn't realized that XiniX was bitboard based, for some reason I assumed it was an 0x88 / mailbox hybrid. What kind of a speed did this produce? The idea of getting rid of the rotated bitboards is attractive. It would be especially nice to get rid of the overhead of having to update the rotated maps at every make / unmake call.

BTW, do you have a decent link for information on deBruijn numbers and theory?

regards,
--tom

by **mathmoi** » 15 Mar 2006, 15:47

Tony van Roon-Werten wrote:I've got this idea from Gerd, I don't think he minds sharing it.

It's about sliders, the nonsliders are obvious.

For every square on the board, there are 4 directions. For each combination of square<->direction generate a mask and a magic number so that (for this square<->direction combination) :

1) the mask gives all reachable squares on an empty board ( minus the edges ie the 64 trick)
2) the magic number generates an unique index in the range [0..63] ( use deBruijn for this) for every possible outcome of 1)

this costs 64*4*2*8= 4KB

generate a lookuptable for [square][direction][index] wich contains the squares the piece can move to.

this costs 64*64*4*8=128 KB

so together 132 KB.

Why is this nice ? You don't need rotated bitboards

Cheers,

Tony

Hi Tony,

This technique seems to be a simple substitute to the rotated bitmap technique.

However, the multiplication and shift used to compute the index can be an overhead compared to the classic bitboard technique.

Do you have any data about the efficiency of this technique compared to a classic one?

My bet is that it is significantly slower, but I can be wrong. Actualy I hope I am.

by **Tony van Roon-Werten** » 15 Mar 2006, 16:03

Tom Likens wrote:
Tony van Roon-Werten wrote:It's probably possible to reduce the index range even further, but I haven't tested that yet. I was to happy to get this working.

Hello Tony,

I hadn't realized that XiniX was bitboard based, for some reason I assumed it was an 0x88 / mailbox hybrid. What kind of a speed did this produce? The idea of getting rid of the rotated bitboards is attractive. It would be especially nice to get rid of the overhead of having to update the rotated maps at every make / unmake call.

BTW, do you have a decent link for information on deBruijn numbers and theory?

regards,
--tom

XiniX used to be 0x88, now it's hybrid BBx88

Speed is hard to tell. On 32 bit it's about 20% slower I guess. It's the 64b multiplication that is very expensive.

On 64 bit it's free so there it's only memory lookups costs.

The reason I like it is because I find it simple,elegant and very usefull for other stuff. fe the index doesn't have to point to an array of move bitboards ....

For the links, go to CCC search engine, type Gerd Isenberg and ... well basicly that should do it.

Tony

by **Gerd Isenberg** » 15 Mar 2006, 16:06

This rotated index substitute requires a fast 64*64 = 64-bit multiplication.
AMD64 64-bit mode has 4 cycles direct path mul - and one cycle shift.
Guess future intels will be competetive as well.

Gerd

by **Tony van Roon-Werten** » 15 Mar 2006, 16:06

mathmoi wrote:
Tony van Roon-Werten wrote:I've got this idea from Gerd, I don't think he minds sharing it.

It's about sliders, the nonsliders are obvious.

For every square on the board, there are 4 directions. For each combination of square<->direction generate a mask and a magic number so that (for this square<->direction combination) :

1) the mask gives all reachable squares on an empty board ( minus the edges ie the 64 trick)
2) the magic number generates an unique index in the range [0..63] ( use deBruijn for this) for every possible outcome of 1)

this costs 64*4*2*8= 4KB

generate a lookuptable for [square][direction][index] wich contains the squares the piece can move to.

this costs 64*64*4*8=128 KB

so together 132 KB.

Why is this nice ? You don't need rotated bitboards

Cheers,

Tony

Hi Tony,

This technique seems to be a simple substitute to the rotated bitmap technique.

However, the multiplication and shift used to compute the index can be an overhead compared to the classic bitboard technique.

Do you have any data about the efficiency of this technique compared to a classic one?

My bet is that it is significantly slower, but I can be wrong. Actualy I hope I am.

Hi,

see my post for Tom.

In addition, on a Pentium M 1.8 Ghz, XiniX does about 500Kn/s ( EDIT in the middle game) , and that's with very full mobility.

So, it can't be that bad, but it will be a lot better on 64b

Tony

by **Gerd Isenberg** » 15 Mar 2006, 17:35

Hi Tom,

I think with todays caches and 4K memory page organisation it is probably not so important to shrink the lookup tables - at least for some processors and depending on the memory or cache footprint of your program.

The additional overhead to compute a six bit occupied state is negliable, so i would prefere a smaller table as well. I heard of one case (Alessandro iirc) where the size reduction was even a slowdown!? (Who knows which other chaotical effects have interacted ;-)

A possible explanation:
during search one probably don't profits so much from cache hits from cachelines loaded to L1 for other occupied states. Most occupied states are relative stable and don't change so randomly during the search. AMD64 has 1024 64-byte cachelines. So let say your search needs 256 of them very often, it doesn't matter so much whether they are picked from 128K or 512K.

Cheers,
Gerd

Btw. the DeBruijn paper:
http://supertech.csail.mit.edu/papers/debruijn.pdf
you may find this one from CCC shadow archives interesting:
http://hornid.com/cgi-bin/ccc/topic_sho ... #pid467934

by **Alessandro Damiani** » 15 Mar 2006, 18:46

Hi Gerd,

Yes, I had the slowdown. But it was my fault: I was ready to test a minor change in the bitboard framework when I read about the size reduction on CCC. I was so excited about it I included the size reduction code but forgot about my first change. The slowdown was caused by the first change and not by the size reduction code.

I didn't report this because I thought nobody was interested.

by **Tony van Roon-Werten** » 15 Mar 2006, 19:23

Gerd Isenberg wrote:This rotated index substitute requires a fast 64*64 = 64-bit multiplication.
AMD64 64-bit mode has 4 cycles direct path mul - and one cycle shift.
Guess future intels will be competetive as well.

Gerd

I am working an a folded multiplication, but haven't had any succes yet. But there are still a few ideas I want to explore.

Tony

PS what would that make it ? de Bruijn-Folded-non Rotated-Bitboards ?

by **Gerd Isenberg** » 15 Mar 2006, 19:46

ahh yes, may be Matt Taylor's bitscan is a hint:

Code: Select all: u32 bitScanFwd(BB b) { b ^= (b - 1); unsigned int fold = ((int) b) ^ ((int)(b>>32)); return lsz64_tbl[(fold * 0x78291ACF) >> (32-6)]; }

by **Gerd Isenberg** » 15 Mar 2006, 19:49

i see, multiple changes are always difficult ;-)

by **Dann Corbit** » 15 Mar 2006, 20:02

Tord Romstad wrote:
Tony van Roon-Werten wrote:I've got this idea from Gerd, I don't think he minds sharing it.

It's about sliders, the nonsliders are obvious.

For every square on the board, there are 4 directions. For each combination of square<->direction generate a mask and a magic number so that (for this square<->direction combination) :

1) the mask gives all reachable squares on an empty board ( minus the edges ie the 64 trick)
2) the magic number generates an unique index in the range [0..63] ( use deBruijn for this) for every possible outcome of 1)

this costs 64*4*2*8= 4KB

generate a lookuptable for [square][direction][index] wich contains the squares the piece can move to.

Hi Tony,

Evidently I am even slower than usual today. I didn't understand a word of Zach's suggestion, and I also struggle to understand the idea you describe. Let me see if I understand you correctly.

Assume that we want to compute a bitboard of all attacks in the a1-h8 direction for a bishop on d4. We begin by looking up the mask of all reachable squares on an empty board, minus the edge squares. This mask will look like this:
Code: Select all
-------- ------#- -----#-- ----#--- -------- --#----- -#------ --------

We 'and' this mask with the bitboard of all occupied squares, and obtain a bitboard which could (for instance) look like this:
Code: Select all
-------- ------#- -----#-- -------- -------- -------- -#------ --------

This bitboard is multiplied by some magic number (and perhaps shifted to the right?), and the result is a number in the range 0-63. This number, together with the square index and the direction, are used to look up a attack bitboard from a 128 kB array.

Assuming that my explanation above is correct, I still don't understand how to find the 'magic numbers', and your de Bruijn hint doesn't really help me. I understand how to use binary de Bruijn sequences to hash a single-1 bitboard to an integer in the range 0-63, but I don't see how they can be used to create perfect hash functions for bitboards with more than one nonzero bit.

Tord

You can always calculate the numbers by creation of a minimal perfect hash. Do a google search for "minimal perfect hash" and it will turn up plenty of good explanations.

Winboard Forum

Compact Bitboard Attacks

Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Who is online