Winboard Forum

by **Harald Lüßen** » 13 Sep 2006, 17:12

Hi

You inspired me to think of an own bitboard attack function.
It is somewhere between a traditional for loop and the modern magic.
It is not compiled. Read it as pseudo code.
It is not tested. Could it work?
And will it be slow or fast?

Code: Select all: typedef uint64_t Bitboard; Bitboard initB[64]; Bitboard maskB[64]; /* = init >>= 9 *= 0x5005 &= mask at |= bb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 . . 1 . . . 1 . . . 1 . . . 1 . . . 1 . . . . 1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . . . . . s . . . . . . . 1 . 1 . . 1 1 . . 1 . . . . . . s . . . . . . . s . . . . . . 1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . . . . . . . . . . . . . 1 . 1 . . . 1 1 . . 1 . . . 1 . . . 1 . . . 1 . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */ Bitboard bishopAttacks( int sq, const Bitboard &free ) { Bitboard bb = initB[sq]; Bitboard at = bb; for ( int i = 0; i < 7 && bb != 0; ++i ) { bb >>= 9; bb &= 0x007f7f7f7f7f7f7f7f; bb *= 0x00050005; // 00000000000001010000000000000101 bb &= maskB[sq]; at |= bb; bb &= free; } return at; } /* . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . . . . .sq . . . . . . 1 . 1 . . . . . . . . . . . . . . . . . . . */ makeInitB() { for ( int sq = 0; sq < 64; ++sq ) { Bitboard bb = (Bitboard)1 << sq; initB[sq] = (bb >> 9) & 0x7f7f7f7f7f7f7f7f; initB[sq] |= (bb >> 7) & 0xfefefefefefefefe; initB[sq] |= (bb << 9) & 0xfefefefefefefefe; initB[sq] |= (bb << 7) & 0x7f7f7f7f7f7f7f7f; } } /* . . . . . . . 1 1 . . . . . 1 . . 1 . . . 1 . . . . 1 . 1 . . . . . .sq . . . . . . 1 . 1 . . . . 1 . . . 1 . . 1 . . . . . 1 . */ makeMaskB() { for ( int sq = 0; sq < 64; ++sq ) { maskB[sq] = 0; for ( int i = sq - 9; i >= 0 && i % 8 != 7; i -= 9 ) { maskB[sq] |= (Bitboard)1 << i; } for ( int i = sq - 7; i >= 0 && i % 8 != 0; i -= 7 ) { maskB[sq] |= (Bitboard)1 << i; } for ( int i = sq + 9; i < 64 && i % 8 != 0; i += 9 ) { maskB[sq] |= (Bitboard)1 << i; } for ( int i = sq + 7; i < 64 && i % 8 != 7; i += 7 ) { maskB[sq] |= (Bitboard)1 << i; } } }

What do you think?

Harald

by **Harald Lüßen** » 13 Sep 2006, 22:00

Hello again

And for rooks it would be:

Code: Select all: typedef uint64_t Bitboard; Bitboard initR[64]; Bitboard maskR[64]; /* = init >>= 9 *= 20502 &= mask at |= bb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . 1 . . . . . . . 1 . . . . . . . 1 . . . . . . . . . . . . . . 1 . 1 . . . . . . . . . . . . . . 1 . . . . . . 1 s 1 . . . . . . . 1 . . . . 1 . 1 . 1 . . . 1 . s . 1 . . . 1 1 s 1 1 . . . . . 1 . . . . . . . 1 . 1 . . . . 1 . 1 . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . 1 . . . . . . 1 . . . . . . . 1 . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . */ Bitboard rookAttacks( int sq, const Bitboard &free ) { Bitboard bb = initR[sq]; Bitboard at = bb; for ( int i = 0; i < 7 && bb != 0; ++i ) { bb >>= 9; bb &= 0x007f7f7f7f7f7f7f7f; bb *= 0x00020502; // 00000000000000100000010100000010 bb &= maskR[sq]; at |= bb; bb &= free; } return at; } /* . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . 1sq 1 . . . . . . 1 . . . . . . . . . . . . . . . . . . . . */ makeInitR() { for ( int sq = 0; sq < 64; ++sq ) { Bitboard bb = (Bitboard)1 << sq; initB[sq] = (bb >> 8); initB[sq] |= (bb >> 1) & 0x7f7f7f7f7f7f7f7f; initB[sq] |= (bb << 1) & 0xfefefefefefefefe; initB[sq] |= (bb << 8); } } /* . . . 1 . . . . . . . 1 . . . . . . . 1 . . . . . . . 1 . . . . 1 1 1sq 1 1 1 1 . . . 1 . . . . . . . 1 . . . . . . . 1 . . . . */ makeMaskR() { for ( int sq = 0; sq < 64; ++sq ) { maskR[sq] = 0; for ( int i = sq - 8; i >= 0; i -= 8 ) { maskR[sq] |= (Bitboard)1 << i; } for ( int i = sq - 1; i >= 0 && i % 8 != 7; i -= 1 ) { maskR[sq] |= (Bitboard)1 << i; } for ( int i = sq + 1; i < 64 && i % 8 != 0; i += 1 ) { maskR[sq] |= (Bitboard)1 << i; } for ( int i = sq + 8; i < 64; i += 8 ) { maskR[sq] |= (Bitboard)1 << i; } } }

In case it matters, my square numbering is like this:
63..56
......
7.. 0

Harald

by **Gerd Isenberg** » 14 Sep 2006, 00:20

Hi Harald,

since you replied to Michael's post, i'll interprete your "you" as anybody who read this - i like to give my two cents for your very nice idea. It is an ressource friendly, interesting approach, but i think it can't compete with most other approaches mentioned here - except you have most likely trapped bishops.

You'll take the initial attacks by lookup and do what my dumb7fill or Steffan Westcott's Kogge-Stone-Fill do with one direction and one slider, with four directions simultaniously. Your >>= 9, *= 0x5005; &= mask expands or "explodes" by one step in all directions simultaniously. Since you already made the first shifts by lookup, up to six steps.

It is necessary to mask with "free" before the first shift-mul-expansion, The bb &= 0x007f7f7f7f7f7f7f7f is not necessary, since wrapped bits don't share same color. Otherwise, a free &= 0x007e7e7e7e7e7e00 before entering the loop would be fine as well.

Code: Select all: Bitboard bishopAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskB[sq] Bitboard bb = initB[sq]; Bitboard at = bb; bb &= free; for ( int i = 0; i < 7 /* && bb != 0 */; ++i ) { bb >>= 9; bb *= 0x00050005; // 00000000000001010000000000000101 bb &= msk; at |= bb; bb &= free; } return at; }

If one omits the bb != 0 condition the loop was completely unrolled by the msvc2005 compiler and is only slightly faster than KoggeStone in four directions (16.04 versus 16.44). With bb != 0 condition it took about 4.7ns for almost occupied boards (all pattern with 59 bits set, so almost trapped bishops) versus 25ns for almost
empty boards (all pattern with 5 bits set). KoggeStone is of course
not suited for getting attacks of single pieces, but multiple. See

http://wbforum.vpittlik.org/viewtopic.p ... 949&t=5452
http://wbforum.vpittlik.org/viewtopic.p ... 991&t=5452
for the snoob look here:
http://wbforum.vpittlik.org/viewtopic.p ... 881&t=5436

For the rooks it seems not that simple. Multiplication does not work here. ..

Code: Select all: Bitboard rookAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskR[sq] Bitboard bb = initR[sq]; Bitboard at = bb; bb &= free; for ( int i = 0; i < 7 && bb != 0; ++i ) { bb = (bb>>8) | (bb<<8) | ((bb>>1)&0x7f7f7f7f7f7f7f7f) | ((bb<<1)&0xfefefefefefefefe); bb &= msk; at |= bb; bb &= free; } return at; }

Too slow.

Cheers,
Gerd

by **Michael Sherwin** » 14 Sep 2006, 01:08

Hi Harald,

Nice try!

My original idea for bishops was to find a set of 3 or 4 shifts that would shift all the possible blocker bits to the first 16 bits and use that as an index into a look up table. All shift rule sets so far have at least one of the possible blockers anniallating another. That is why I chose the other way. Gerd and Bob Hyatt say that it will not work as the hash tables would have to be to big. I count only 5284 entries needed for the bishop and 102,400 entries needed for the rook. If a minimal perfect hash can be constructed then the hash tables would not be to big. So far no one has confirmed wether or not these numbers are correct or if it is possible to construct such a hash for them. I am working on a small program that will generate all blocking patterns for all squares and use a random set of hash values to mark table entrys as seen. As soon as an entry is seen twice a new random set of hash values is created the table is zeroed and it starts all over again.

THANKS,
Mike

by **Michael Sherwin** » 14 Sep 2006, 01:37

Michael Sherwin wrote:Hi Harald,

Nice try!

My original idea for bishops was to find a set of 3 or 4 shifts that would shift all the possible blocker bits to the first 16 bits and use that as an index into a look up table. All shift rule sets so far have at least one of the possible blockers anniallating another. That is why I chose the other way. Gerd and Bob Hyatt say that it will not work as the hash tables would have to be to big. I count only 5284 entries needed for the bishop and 102,400 entries needed for the rook. If a minimal perfect hash can be constructed then the hash tables would not be to big. So far no one has confirmed wether or not these numbers are correct or if it is possible to construct such a hash for them. I am working on a small program that will generate all blocking patterns for all squares and use a random set of hash values to mark table entrys as seen. As soon as an entry is seen twice a new random set of hash values is created the table is zeroed and it starts all over again.

THANKS,
Mike

I have just eliminated the need for hash tables!

Code: Select all: moveBits = bishopBits[sq]; blockers.u = moveBits & allPieces; moveBits &= possibleBits[sq][blockers.row2]; moveBits &= possibleBits[sq][blockers.row3]; moveBits &= possibleBits[sq][blockers.row4]; moveBits &= possibleBits[sq][blockers.row5]; moveBits &= possibleBits[sq][blockers.row6]; moveBits &= possibleBits[sq][blockers.row7]; moveBits ^= friendlies;

Mike

by **Tony van Roon-Werten** » 14 Sep 2006, 09:12

Gerd Isenberg wrote:Hi Harald,

since you replied to Michael's post, i'll interprete your "you" as anybody who read this - i like to give my two cents for your very nice idea. It is an ressource friendly, interesting approach, but i think it can't compete with most other approaches mentioned here - except you have most likely trapped bishops.

You'll take the initial attacks by lookup and do what my dumb7fill or Steffan Westcott's Kogge-Stone-Fill do with one direction and one slider, with four directions simultaniously. Your >>= 9, *= 0x5005; &= mask expands or "explodes" by one step in all directions simultaniously. Since you already made the first shifts by lookup, up to six steps.

It is necessary to mask with "free" before the first shift-mul-expansion, The bb &= 0x007f7f7f7f7f7f7f7f is not necessary, since wrapped bits don't share same color. Otherwise, a free &= 0x007e7e7e7e7e7e00 before entering the loop would be fine as well.

Code: Select all
Bitboard bishopAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskB[sq] Bitboard bb = initB[sq]; Bitboard at = bb; bb &= free; for ( int i = 0; i < 7 /* && bb != 0 */; ++i ) { bb >>= 9; bb *= 0x00050005; // 00000000000001010000000000000101 bb &= msk; at |= bb; bb &= free; } return at; }

If one omits the bb != 0 condition the loop was completely unrolled by the msvc2005 compiler and is only slightly faster than KoggeStone in four directions (16.04 versus 16.44). With bb != 0 condition it took about 4.7ns for almost occupied boards (all pattern with 59 bits set, so almost trapped bishops) versus 25ns for almost
empty boards (all pattern with 5 bits set). KoggeStone is of course
not suited for getting attacks of single pieces, but multiple. See

http://wbforum.vpittlik.org/viewtopic.p ... 949&t=5452
http://wbforum.vpittlik.org/viewtopic.p ... 991&t=5452
for the snoob look here:
http://wbforum.vpittlik.org/viewtopic.p ... 881&t=5436

For the rooks it seems not that simple. Multiplication does not work here. ..
Code: Select all
Bitboard rookAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskR[sq] Bitboard bb = initR[sq]; Bitboard at = bb; bb &= free; for ( int i = 0; i < 7 && bb != 0; ++i ) { bb = (bb>>8) | (bb<<8) | ((bb>>1)&0x7f7f7f7f7f7f7f7f) | ((bb<<1)&0xfefefefefefefefe); bb &= msk; at |= bb; bb &= free; } return at; }

Too slow.

Cheers,
Gerd

You might make it a lot faster by using a switch I think.

The idea is that you can use a lookup table for every square to see how often the loop should be executed. For centre fields that would be only 4, edges take 7.

Code: Select all: loopcount=look_up[sq] switch (loopcount) case 7: calculate_once // no break case 6:calculate_once // no break etc

by **Gerd Isenberg** » 14 Sep 2006, 10:55

Tony van Roon-Werten wrote:You might make it a lot faster by using a switch I think.

The idea is that you can use a lookup table for every square to see how often the loop should be executed. For centre fields that would be only 4, edges take 7.

Code: Select all
loopcount=look_up[sq] switch (loopcount) case 7: calculate_once // no break case 6:calculate_once // no break etc

Yes, good point Tony. Actually loopcount needs only 3..6.
I will try it later to see what code the msvc2005 compiler will prodcue for that switch with four consecutive cases. What about K8's predictability of indirect jumps? Iirr same jump-target as last time which is probably fine here...

It is really incredible how many approaches are possible to get sliding attacks sets ;-)

by **Harald Lüßen** » 14 Sep 2006, 17:46

Gerd Isenberg wrote:Hi Harald,

since you replied to Michael's post, i'll interprete your "you" as anybody who read this

Yes. Michaels post was just the inspiration.

- i like to give my two cents for your very nice idea.

Thank you for your comments.

It is an ressource friendly, interesting approach, but i think it can't compete with most other approaches mentioned here - except you have most likely trapped bishops.

You'll take the initial attacks by lookup and do what my dumb7fill or Steffan Westcott's Kogge-Stone-Fill do with one direction and one slider, with four directions simultaniously. Your >>= 9, *= 0x5005; &= mask expands or "explodes" by one step in all directions simultaniously.

Explode was exactly the expression I had in mind.

Since you already made the first shifts by lookup, up to six steps.

It is necessary to mask with "free" before the first shift-mul-expansion, The bb &= 0x007f7f7f7f7f7f7f7f is not necessary, since wrapped bits don't share same color. Otherwise, a free &= 0x007e7e7e7e7e7e00 before entering the loop would be fine as well.

What do you mean with "would be fine"? Optional or must have?

If one omits the bb != 0 condition

I was not sure about that.

the loop was completely unrolled by the msvc2005 compiler and is only slightly faster than KoggeStone in four directions (16.04 versus 16.44). With bb != 0 condition it took about 4.7ns for almost occupied boards (all pattern with 59 bits set, so almost trapped bishops) versus 25ns for almost
empty boards (all pattern with 5 bits set). KoggeStone is of course
not suited for getting attacks of single pieces, but multiple.
[ ... ]
For the rooks it seems not that simple. Multiplication does not work here. ..

I will give it another try.

While I was at work today I thought about the switch approach and found
it also suggested in another post in this thread.

Next try:

Code: Select all: int repsB[64] = { 6, 5, 4, 3, 3, 4, 5, 6, 5, 5, 4, 3, 3, 4, 5, 5, 4, 4, 4, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 4, 4, 4, 5, 5, 4, 3, 3, 4, 5, 5, 6, 5, 4, 3, 3, 4, 5, 6, }; int repsR[64] = { 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 6, 5, 4, 4, 4, 4, 5, 6, 6, 5, 4, 3, 3, 4, 5, 6, 6, 5, 4, 3, 3, 4, 5, 6, 6, 5, 4, 4, 4, 4, 5, 6, 6, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, }; int rankR[64][128]; // or byte or Bitboard? void makeRankR() { for ( int sq = 0; sq < 8; ++sq ) { for ( int i = 0; i < 128; i += 2 ) { int rr = 0; for ( j = sq - 1; j >= 0; --j ) { rr |= (1 << j); if ( i & (1 << j) ) break; } for ( j = sq + 1; j < 8; ++j ) { rr |= (1 << j); if ( i & (1 << j) ) break; } rankR[sq][i ] = rr; rankR[sq][i+1] = rr; } } } // Did you see the copy and paste error in makeInitR() ? Bitboard bishopAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskB[sq]; Bitboard bb = initB[sq]; Bitboard at = bb; bb &= free; switch ( repsB[sq] ) { case 6: bb >>= 9; bb *= 0x00050005; bb &= msk; at |= bb; bb &= free; // if ( bb == 0 ) ... break; // Note bishop trapped in corner. case 5: bb >>= 9; bb *= 0x00050005; bb &= msk; at |= bb; bb &= free; // if ( bb == 0 ) ... break; // Note bishop trapped in corner. case 4: bb >>= 9; bb *= 0x00050005; bb &= msk; at |= bb; bb &= free; case 3: bb >>= 9; bb *= 0x00050005; bb &= msk; at |= bb; bb &= free; bb >>= 9; bb *= 0x00050005; bb &= msk; at |= bb; bb &= free; bb >>= 9; bb *= 0x00050005; bb &= msk; at |= bb; } return at; } Bitboard rookAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskR[sq]; Bitboard bb = initR[sq]; Bitboard at = bb; bb &= free; switch ( repsR[sq] ) { case 6: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; case 5: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; case 4: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; case 3: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; } if ( sq < 8 ) // or do this before the switch { at |= rankR[sq][free & 0x7e]; } return at; }

Sorry I do not have the test and timing infrastructure.

Harald

by **Harald Lüßen** » 14 Sep 2006, 18:06

Sorry, there was at least one more error. Use this table:

Code: Select all: int repsR[64] = { 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 6, 6, 5, 4, 4, 4, 4, 5, 6, 6, 5, 4, 3, 3, 4, 5, 6, 6, 5, 4, 3, 3, 4, 5, 6, 6, 5, 4, 4, 4, 4, 5, 6, 6, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, };

Harald

by **Andreas Guettinger** » 14 Sep 2006, 18:34

Just wanted to say that I'm very happy with Gerds 9.5kb approach, it's as fast as the full rotated bitboards on my Powerpc G5, and very easy to implement codewise.

Code: Select all: A B C Normal: 21.38 4.97 11.86 Rotated: 15.81 3.59 13.63 Gerd 1.5: 16.23 4.03 10.91 Gerd 9.5: 14.97 3.81 10.72

A) Time to perft 5 of Anima in WAC 30
B) Time to call genmoves+gencaptures 4M times
C) Time to call genmoves+gencaptures+make+undo 4M times

by **Harald Lüßen** » 14 Sep 2006, 20:58

I only find my errors after posting. :-(

The array rankR[] is actally only defined as

Code: Select all: int rankR[8][128];

It could even be int rankR[8][64] with an additional shift.

I confused free bits with occupied bits in this array's index.
Make it either (fast)

Code: Select all: void makeRankR() { for ( int sq = 0; sq < 8; ++sq ) { for ( int i = 0; i < 128; i += 2 ) { int rr = 0; for ( j = sq - 1; j >= 0; --j ) { rr |= (1 << j); if ( !(i & (1 << j)) ) // the 1 bits are the free squares break; } for ( j = sq + 1; j < 8; ++j ) { rr |= (1 << j); if ( !(i & (1 << j)) ) // the 1 bits are the free squares break; } rankR[sq][i ] = rr; rankR[sq][i+1] = rr; } } }

or (slower)

Code: Select all: at |= rankR[sq][~free & 0x7e];

but not both changes, of course.

Harald

by **Gerd Isenberg** » 14 Sep 2006, 21:01

Harald Lüßen wrote:What do you mean with "would be fine"? Optional or must have?

The wrap mask-and is not necessary. Assuming we had to care about wraps of the >>=9 (either h-a or a-h, depending on the mapping), then a leading mask off of the least significant file is appropriate, to avoid the wrap. In this case a leading reset of those file bits in "empty" would be enough.

For bishops we do not care about a-h wraps which may occur in both directions btw. If we shift a set bit by an odd amount along on a rank, the square color change. An even shift amount otoh results in the same square color. But if we wrap around, aka cross the a-h boarder one time, we flip the square color as well. Thus for bishops all wraps result in the opposite square color of the bishop and are therefor guaranteed to get masked off by bb &= maskB[sq] (all bishop attacks obviously share the same square color).

This is not the case for rooks - if you think about it. Since in further explode steps there are possible overflows inside a rank and also wraps which might overflow with a- or h-file bits.

Gerd

by **Gerd Isenberg** » 14 Sep 2006, 21:17

Gerd Isenberg wrote:
Tony van Roon-Werten wrote:You might make it a lot faster by using a switch I think.

The idea is that you can use a lookup table for every square to see how often the loop should be executed. For centre fields that would be only 4, edges take 7.

Code: Select all
loopcount=look_up[sq] switch (loopcount) case 7: calculate_once // no break case 6:calculate_once // no break etc

Yes, good point Tony. Actually loopcount needs only 3..6.
I will try it later to see what code the msvc2005 compiler will prodcue for that switch with four consecutive cases. What about K8's predictability of indirect jumps? Iirr same jump-target as last time which is probably fine here...

It is really incredible how many approaches are possible to get sliding attacks sets

The switch works fine, 11.3 versus former branchless 16 ns in the snoob-test for bishops. 9.5KByte lookup takes 6.8 ns. msvc2005 for x64 generates a good predictable sub(cmp) je-chain.

Gerd

by **Harald Lüßen** » 15 Sep 2006, 06:33

Gerd Isenberg wrote:[ ... ]
This is not the case for rooks - if you think about it. Since in further explode steps there are possible overflows inside a rank and also wraps which might overflow with a- or h-file bits.

Gerd

Thank you for the explanation. I got it now.
My new rookAttacks() function should do the trick.
Note that it uses a shift down now (bb>>=8) and another factor 0x010281.
The wrap around of the left and right expansion now go to the previous or next row and is masked out.
I have to do the first rank different anyway.
Can you please give a timing for it?

Harald

by **Michael Sherwin** » 15 Sep 2006, 17:29

Andreas Guettinger wrote:Just wanted to say that I'm very happy with Gerds 9.5kb approach, it's as fast as the full rotated bitboards on my Powerpc G5, and very easy to implement codewise.

Code: Select all
A B C Normal: 21.38 4.97 11.86 Rotated: 15.81 3.59 13.63 Gerd 1.5: 16.23 4.03 10.91 Gerd 9.5: 14.97 3.81 10.72

A) Time to perft 5 of Anima in WAC 30
B) Time to call genmoves+gencaptures 4M times
C) Time to call genmoves+gencaptures+make+undo 4M times

You should add a test for each method that at the leafs calculates or looks up the mobility for each piece. I think that you will find that rotated bitboards will blow the other methods away. Then consider my original proposal for a perfect hash look up system that not only stores the move bits, but, also the mobility and you will see the potential of my proposal. Then notice that my system is mostly 32 bit accesses with out multiplies or shifts and uses only single indici look ups. It requires more memory, but, has the potential to be much faster.

by **Gerd Isenberg** » 15 Sep 2006, 19:21

Harald Lüßen wrote:
Gerd Isenberg wrote:[ ... ]
This is not the case for rooks - if you think about it. Since in further explode steps there are possible overflows inside a rank and also wraps which might overflow with a- or h-file bits.

Gerd

Thank you for the explanation. I got it now.
My new rookAttacks() function should do the trick.
Note that it uses a shift down now (bb>>=8) and another factor 0x010281.
The wrap around of the left and right expansion now go to the previous or next row and is masked out.
I have to do the first rank different anyway.
Can you please give a timing for it?

Harald

Sorry Harald,
your rookAttacks() routine still has broken results for the edges.
The shift-or approach takes 20 ns with switch versus 5.8 of antirotate 9.5KByte.

Gerd

by **Harald Lüßen** » 15 Sep 2006, 22:10

Sorry Harald,
your rookAttacks() routine still has broken results for the edges.
The shift-or approach takes 20 ns with switch versus 5.8 of antirotate 9.5KByte.

Ah, the north bit on the H file going north-east gets in the mask in the first iteration.
And the south bit on the A file going north-west gets in the mask in the first iteration.
Perhaps that could be fixed:

Code: Select all: Bitboard rookAttacks( int sq, const Bitboard &free ) { Bitboard msk = maskR[sq]; Bitboard bb = initR[sq]; Bitboard at = bb; bb &= free; switch ( repsR[sq] ) { case 7: // In repsR[64] change A1..A8 and H1..H8 from 6 to 7. msk &= ( (sq & 7) ? 0xfefefefefefefefe : 0x7f7f7f7f7f7f7f7f ); case 6: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; msk = maskR[sq]; case 5: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; case 4: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; case 3: bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; bb &= free; bb >>= 8; bb *= 0x00010281; bb &= msk; at |= bb; } if ( sq < 8 ) // or do this before the switch { at |= rankR[sq][free & 0x7e]; } return at; }

But then, the operation would kill more performance and there will
perhaps be another bug.

Ok. It would have been too good for a first attempt in using magic. :-)

Even if it was not that fast and a little bit broken, it was fun to try it
and I learned a lot about bit multiplication and shifting. I even could
understand a little of the taboo-less castling-not-over-checks magic
in CCC. :-)

I think I have to go to study your antirotate 9.5 KByte system then.

If I remember correctly it was something about vertical, horizontal
and two diagonal attacks from a square. It is easy to combine it
to all attacks from the square by or-ing. Is there a description of a
simple method to do attacks in one of the eight directions only?
Is it simply done with an additional dirMask[64][8] array?

Or if I do not implement your system I can try to check and improve
my own rotated bitboard implementation, which I have not touched
since the day it first worked.

And there is another question:
What is faster, bb = (1 << sq) or bb = oneBit[sq] ?
Is there a page in the net where such basics are taught?
I mean basic C/C++ instruction performance, not assembler.

Harald

by **H.G.Muller** » 15 Sep 2006, 22:54

Harald Lüßen wrote:And there is another question:
What is faster, bb = (1 << sq) or bb = oneBit[sq] ?
Is there a page in the net where such basics are taught?
I mean basic C/C++ instruction performance, not assembler.

There is no such thing as basic C instruction performance. A compiler might translate identical C statements different each time, based on the availability of registers to use as tempoary storage, and proximity of identical subexpressions. Especially extra saving to memory might occur.

Otherwise, compilers are pretty smart in using the minimum number of instructions needed to perform a certain operation.

Even predicting assembler performance is pretty hard, with these modern out-of-order machines. Some instructions have a somewhat longer latency, (normally a result can be used on the next clock), but often that latency doesn't affect performance at all: other, independent instructions are executed in parallel.

It pays to use algorithms that do not overload one specific, scarce resource. Although most machines have 2 or 3 integer execute units, usually only one of those has a barrel shifter/multiplier. On the other hand, bandwidth to the cache is usually also a bottleneck, because the compiler adds loads and stores to save temporaries. So if you should prefer an extra load or an extra shift might completely depend on how many loads and shifts you have in the rest of your program.

Winboard Forum

Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Re: Compact Bitboard Attacks

Who is online