Winboard Forum

by **Pradu** » 28 Aug 2006, 20:18

Tord Romstad wrote:Hello Pradu!

Yes, I understand how the technique works. It is quite neat, and it is possible that I will be able to find a way to use it. I just have to somehow eliminate the "magic" part of it first. More specifically, I need to find a simple, straightforward and fast algorithm for computing all the required numbers; something that can be computed in a tiny fraction of a second during program initialisation. I can't use precomputed constant arrays containing numbers found by minutes or hours of computations and brute force search.

It looks like a non-trivial problem, but I hope it will turn out to be solvable. Too much else to do right now, though.

Tord

This is possible, but initialization might take like a minute. Take a look at snooby -- it isn't completely brute force because you can get rid of redundant bits and only look at magics that will most likely make a good hash. Nevertheless, I don't see why you want to compute it at runtime. It is only an array of 64 elements.

by **Chan Rasjid** » 28 Aug 2006, 20:23

Hello,

[edit] : a small edit as I missed out something. The main eval() of Snailchess is disabled as it brings down nps by about 55%. Also the best move is not "best" in anyway as my move-ordering is half-done and has no meaning for almost all the moves.

I tested the magic codes and it happen to have TSCP's board mapping so it is easy for me to slot in different codes to compare 3 bitboard move generations 1) bitwise 2) rotated 3) magic. As I have incremental updates of sliding attacks, the partial updates of a single direction due to the positions of from/to all uses the same bitwise codes. Only when a sliding piece itself moves when a new generation of the attack map is needed would the relevant codes be called. Also whenever the king moves as Snailchess incrementally updates the "king-as-queen" attacks .

My hardware is a P4 1.4 Ghz 128mb ram, win98 and compiles with Visual C++ 4.02, blend of pentium/x86 codes, with "only _inline" option, ie if specified. Snailchess is non-recursive with a stack of structs. It has 2 copies of make(), same copy for for bitwise and magic. The rotated version is basically the same, with incremental codes for 2 rotated bishop BB + 1 rotated file BB. Also when on bitwise/magic, the rotated variables are deleted from the stack and the memory alignment happen to fit nicely in 128 byte. With rotated, which has 3 extra u64, it probably need some bad struct padding which may be a slight disadvantage.

1) magic -
As I don't know much about configuration, the magic codes is used
w/o any changes. A small test is done and it seems the fastest is with the following defines untouched, ie minimize + inline

#define MINIMIZE_MAGIC
#define USE_INLINING

2) rotated-
My rotated lookup is almost identical to Crafty's, the only difference is Crafty seperates the bs-lookup for 2 directions with total memory
2 X 32k bishop + 1 x 32k file + 1 x 32k rank. Mine has a total of 3 x 32k as the bs-lookup are ORed together and masking extracts the proper diretion. My incremental codes for updates of the rotated BB seems to have negligible overhead.

3) bitwise -
Although the codes don't seem to look efficient, it happens to do fairly well "on average". The reason probably is analagous to bitscan forward/ reverse. One of the up/down direction probably is like bitscan forward and looping terminates much faster than the other bad direction.

3 positions are tested with fixed-depth search and the the searches are done in the order as given below:-

pos 1
r4rk1/3b3p/pqn1pbp1/3p1n2/1P3P2/3B4/PPNBQ1PP/R2NKR2 w Q - 0 21

bitwise +6.3%
1 a1b1 ( 94) depth( 7) pvL( 8) maxPly(32) nps(148059) sec(116.27 ) ftime(0) Poll(262)
Probe (9426618) HitGE_D( 8.0% ) HitL23/L1( 4.2% ) HitL2( 10271 ) HitL3( 20082 )
Result 1/2-1/2{Analysis Mode}

rotate 0%
1 a1b1 ( 94) depth( 7) pvL( 8) maxPly(32) nps(139233) sec(123.64 ) ftime(0) Poll(262)
Probe (9426618) HitGE_D( 8.0% ) HitL23/L1( 4.2% ) HitL2( 10271 ) HitL3( 20082 )
Result 1/2-1/2{Analysis Mode}

magic -0.6%
1 a1b1 ( 94) depth( 7) pvL( 8) maxPly(32) nps(138382) sec(124.40 ) ftime(0) Poll(262)
Probe (9426618) HitGE_D( 8.0% ) HitL23/L1( 4.2% ) HitL2( 10271 ) HitL3( 20082 )
Result 1/2-1/2{Analysis Mode}

bitwise +6.6%
1 a1b1 ( 94) depth( 7) pvL( 8) maxPly(32) nps(148403) sec(116.00 ) ftime(0) Poll(262)
Probe (9426618) HitGE_D( 8.0% ) HitL23/L1( 4.2% ) HitL2( 10271 ) HitL3( 20082 )
Result 1/2-1/2{Analysis Mode}

end test1

pos 2
rnb1kbnr/ppp1pppp/2q5/3N4/2pPP3/5N2/PP1B1PPP/R2QKB1R b KQkq e3 0 1

bitwise +1.9%
1 b7b5 ( -70) depth( 8) pvL( 9) maxPly(33) nps(140466) sec(59.76 ) ftime(0) Poll(128)
Probe (4642192) HitGE_D( 10.4% ) HitL23/L1( 4.1% ) HitL2( 8533 ) HitL3( 10459 )
Result 1/2-1/2{Analysis Mode}

rotate 0%
1 b7b5 ( -70) depth( 8) pvL( 9) maxPly(33) nps(137791) sec(60.92 ) ftime(0) Poll(128)
Probe (4642192) HitGE_D( 10.4% ) HitL23/L1( 4.1% ) HitL2( 8533 ) HitL3( 10459 )
Result 1/2-1/2{Analysis Mode}

magic +3.3%
1 b7b5 ( -70) depth( 8) pvL( 9) maxPly(33) nps(142299) sec(58.99 ) ftime(0) Poll(128)
Probe (4642192) HitGE_D( 10.4% ) HitL23/L1( 4.1% ) HitL2( 8533 ) HitL3( 10459 )
Result 1/2-1/2{Analysis Mode}

bitwise +2.7%
1 b7b5 ( -70) depth( 8) pvL( 9) maxPly(33) nps(141508) sec(59.32 ) ftime(0) Poll(128)
Probe (4642192) HitGE_D( 10.4% ) HitL23/L1( 4.1% ) HitL2( 8533 ) HitL3( 10459 )
Result 1/2-1/2{Analysis Mode}

end test2

pos 3
1r4k1/5ppp/p7/P2pp2q/1r4n1/N4QBP/2P3P1/R4K1R b - - 0 28

bitwise +1.4%
1 b4e4 ( -115) depth( 8) pvL(10) maxPly(31) nps(151595) sec(39.27 ) ftime(0) Poll(90)
Probe (3385317) HitGE_D( 15.4% ) HitL23/L1( 4.3% ) HitL2( 8179 ) HitL3( 13427 )
Result 1/2-1/2{Analysis Mode}

rotate 0%
1 b4e4 ( -115) depth( 8) pvL(10) maxPly(31) nps(149464) sec(39.83 ) ftime(0) Poll(90)
Probe (3385317) HitGE_D( 15.4% ) HitL23/L1( 4.3% ) HitL2( 8179 ) HitL3( 13427 )
Result 1/2-1/2{Analysis Mode}

magic +2.4%
1 b4e4 ( -115) depth( 8) pvL(10) maxPly(31) nps(153076) sec(38.89 ) ftime(0) Poll(90)
Probe (3385317) HitGE_D( 15.4% ) HitL23/L1( 4.3% ) HitL2( 8179 ) HitL3( 13427 )
Result 1/2-1/2{Analysis Mode}

bitwise +1.4%
1 b4e4 ( -115) depth( 8) pvL(10) maxPly(31) nps(151595) sec(39.27 ) ftime(0) Poll(90)
Probe (3385317) HitGE_D( 15.4% ) HitL23/L1( 4.3% ) HitL2( 8179 ) HitL3( 13427 )
Result 1/2-1/2{Analysis Mode}

end test 3

As magic seems to fair badly in pos1, a repeat is done for pos1 again and showed test 1 ok
The probable reason is that the best move happen to be a rook move which is bad for magic
pos 1

bitwise
1 a1b1 ( 94) depth( 7) pvL( 8) maxPly(32) nps(148749) sec(115.73 ) ftime(0) Poll(262)
Probe (9426618) HitGE_D( 8.0% ) HitL23/L1( 4.2% ) HitL2( 10271 ) HitL3( 20082 )
Result 1/2-1/2{Analysis Mode}

magic
1 a1b1 ( 94) depth( 7) pvL( 8) maxPly(32) nps(137586) sec(125.12 ) ftime(0) Poll(262)
Probe (9426618) HitGE_D( 8.0% ) HitL23/L1( 4.2% ) HitL2( 10271 ) HitL3( 20082 )
Result 1/2-1/2{Analysis Mode}

Best Regards
Rasjid

magic codes:-

Code: Select all: void getBAttackBB_SQ64(brMove_t * pMS, const u64 all){ static const u64 diagonal[2][16] = { 0x0000000000000001, 0x0000000000000102, 0x0000000000010204, 0x0000000001020408, 0x0000000102040810, 0x0000010204081020, 0x0001020408102040, 0x0102040810204080, 0x0204081020408000, 0x0408102040800000, 0x0810204080000000, 0x1020408000000000, 0x2040800000000000, 0x4080000000000000, 0x8000000000000000, 0000000000000000, 0x0000000000000080, 0x0000000000008040, 0x0000000000804020, 0x0000000080402010, 0x0000008040201008, 0x0000804020100804, 0x0080402010080402, 0x8040201008040201, 0x4020100804020100, 0x2010080402010000, 0x1008040201000000, 0x0804020100000000, 0x0402010000000000, 0x0201000000000000, 0x0100000000000000, 0000000000000000 }; //work with local dir[2], better then pMS->bb[] const u64 dir[2] = { diagonal[0][RANK64(pMS->from) + FILE64(pMS->from)], diagonal[1][7 + RANK64(pMS->from) - FILE64(pMS->from)] }; u64 u = Bmagic(pMS->from, all) | (dir[0] & dir[1]); pMS->bb[0] = ( pMS->dir[0] = dir[0]) & u; pMS->bb[1] = ( pMS->dir[1] = dir[1]) & u; pMS->bal = pMS->bb[0] ^ pMS->bb[1]; assert( (pMS->bb[0] | pMS->bb[1]) == getBAttackBB_Shift(pMS->bb[0] & pMS->bb[1], all)); return; }

bitwise

Code: Select all: void getBAttackBB_SQ64(brMove_t * pMS, const u64 all){ static const u64 diagonal[2][16] = { 0x0000000000000001, 0x0000000000000102, 0x0000000000010204, 0x0000000001020408, 0x0000000102040810, 0x0000010204081020, 0x0001020408102040, 0x0102040810204080, 0x0204081020408000, 0x0408102040800000, 0x0810204080000000, 0x1020408000000000, 0x2040800000000000, 0x4080000000000000, 0x8000000000000000, 0000000000000000, 0x0000000000000080, 0x0000000000008040, 0x0000000000804020, 0x0000000080402010, 0x0000008040201008, 0x0000804020100804, 0x0080402010080402, 0x8040201008040201, 0x4020100804020100, 0x2010080402010000, 0x1008040201000000, 0x0804020100000000, 0x0402010000000000, 0x0201000000000000, 0x0100000000000000, 0000000000000000 }; //work with local dir[2], better then pMS->bb[] const u64 dir[2] = { diagonal[0][RANK64(pMS->from) + FILE64(pMS->from)], diagonal[1][7 + RANK64(pMS->from) - FILE64(pMS->from)] }; u64 pos = dir[0] & dir[1]; assert(pos & all); for (int ii = 2; ii--; ){ u64 lb = 1; assert(pos & dir[ii]); u64 i; if ( !(i = dir[ii] & all & BITS_L(pos)) ); else{ while ( i & i - 1){ i &= i - 1; } lb = i & 0-i; } i = dir[ii] & all & BITS_G(pos); pMS->bb[ii] = ( pMS->dir[ii] = dir[ii]) & (((i &= 0 - i) - lb) | i); } pMS->bal = pMS->bb[0] ^ pMS->bb[1]; assert( (pMS->bb[0] | pMS->bb[1]) == getBMapSq64(pMS->from, all)); return; }

rotated:

Code: Select all: void getBMapRtd(const board_t * board, stack_t * pS, brMove_t * pElem){ static const u64 diagonal[2][16] = { 0x0000000000000001, 0x0000000000000102, 0x0000000000010204, 0x0000000001020408, 0x0000000102040810, 0x0000010204081020, 0x0001020408102040, 0x0102040810204080, 0x0204081020408000, 0x0408102040800000, 0x0810204080000000, 0x1020408000000000, 0x2040800000000000, 0x4080000000000000, 0x8000000000000000, 0000000000000000, 0x0000000000000080, 0x0000000000008040, 0x0000000000804020, 0x0000000080402010, 0x0000008040201008, 0x0000804020100804, 0x0080402010080402, 0x8040201008040201, 0x4020100804020100, 0x2010080402010000, 0x1008040201000000, 0x0804020100000000, 0x0402010000000000, 0x0201000000000000, 0x0100000000000000, 0000000000000000 }; pElem->dir[0] = diagonal[0][RANK64(pElem->from) + FILE64(pElem->from)]; pElem->dir[1] = diagonal[1][7 + RANK64(pElem->from) - FILE64(pElem->from)]; pElem->bb[0] = lookupRtdDiag( pS, pElem->from, 0) & pElem->dir[0]; pElem->bb[1] = lookupRtdDiag( pS, pElem->from, 1) & pElem->dir[1]; pElem->bal = pElem->bb[0] ^ pElem->bb[1]; return; } #define lookupRtdDiag(pS, pos, dir) (rtdDiagLookup[(pos)] \ [(int)((pS)->stack->bbRtdDiag[(dir)] >> rtdDiagShift[(dir)][(pos)]) & 63]) #define lookupFile(pS, pos) (ftrLookup[(pos)] \ [(int)((pS)->stack->bbFTR >> rtdFileShift[(pos)]) & 63]) #define lookupRank(pS, pos) (rankLookup[(pos)] \ [(int)((pS)->all >> rankShift[(pos)]) & 63])

by **Tord Romstad** » 28 Aug 2006, 21:11

Pradu wrote:
Tord Romstad wrote:Hello Pradu!

Yes, I understand how the technique works. It is quite neat, and it is possible that I will be able to find a way to use it. I just have to somehow eliminate the "magic" part of it first. More specifically, I need to find a simple, straightforward and fast algorithm for computing all the required numbers; something that can be computed in a tiny fraction of a second during program initialisation. I can't use precomputed constant arrays containing numbers found by minutes or hours of computations and brute force search.

It looks like a non-trivial problem, but I hope it will turn out to be solvable. Too much else to do right now, though.

Tord

This is possible, but initialization might take like a minute.

That's rather too much, I'm afraid. I imagine that most users would be annoyed if they had to wait a minute for my program to initialise each time they start it. :wink:

Take a look at snooby -- it isn't completely brute force because you can get rid of redundant bits and only look at magics that will most likely make a good hash. Nevertheless, I don't see why you want to compute it at runtime. It is only an array of 64 elements.

I want to compute it at runtime because I don't like magic. I don't want my program to contain a list of numeric constants (even if there are only 64 of them) which I can't explain in better terms than "they do what I want, and I found them by a brute-force search". If I can derive the same set of numbers in a simple and logical way from basic principles, I am much happier.

Whenever possible, I prefer my code to consist entirely of simple and logical sequences of instructions, without any magic or clever tricks thrown in. Multiplication by some bizzarre number which just happens to magically give me what I want is exactly the type of thing I want to avoid.

Tord

by **Uri Blass** » 28 Aug 2006, 21:28

Tord Romstad wrote:
<snipped>
Whenever possible, I prefer my code to consist entirely of simple and logical sequences of instructions, without any magic or clever tricks thrown in.
Tord

It is your decision but in this case if you want to be consistent you also need not to use some magic file that help glaurung to evaluate kpk positions with no mistake and
kpk.bin is not calculated by glaurung at run time

Uri

by **Tord Romstad** » 28 Aug 2006, 21:43

Uri Blass wrote:
Tord Romstad wrote:
<snipped>
Whenever possible, I prefer my code to consist entirely of simple and logical sequences of instructions, without any magic or clever tricks thrown in.
Tord

It is your decision but in this case if you want to be consistent you also need not to use some magic file that help glaurung to evaluate kpk positions with no mistake and
kpk.bin is not calculated by glaurung at run time

I agree. Generating the KP vs K bitbase at runtime has been on my todo list for quite a long time. What to do with bigger bitbases (if and when I decide to implement them) is a more difficult problem. Most bitbases with more than 3 pieces are probably more expensive to compute at run-time.

I find it less painful to use "magical" techniques for complex tasks like giving exact evaluations of basic endgames than for basic operations like generating moves, though.

Tord

by **H.G.Muller** » 28 Aug 2006, 22:36

On a mediocre machine (1.6 MHz Pentium M) a 5-piece tablebase can be computed in less than 300 sec. 4-piece EGTBs take just a few seconds. That is for end-games with all Pieces. (No other EGTBs needed.)

To generate a complete EGTB with Pawns from scratch takes longer, because you have to deal with the promotions. Unlike with Piece-only, these can convert to more complicated end-games. On the other hand, you don't need the full EGTB, since you know what Pawn configurations are unreachable from the game position.

by **Zach Wegner** » 28 Aug 2006, 22:49

Tord Romstad wrote:Yes, non-rotated approaches certainly have some rather enticing advantages. I'm using rotated bitboards now because they make it relatively easy to write compact and generic code, but they can often be frustratingly slow (at least on my 32-bit CPU). I should probably have a closer look at all the new non-rotated techniqes and see if I can find something I can use. The problem is that I want to use exactly the same code for each sliding direction, that I prefer not to use "magic" in any form, and that I don't want to have a single line of somebody else's code in my program.

Tord

Hello Tord,

There are ways to do this with non-rotated. My code is specific for each direction, but this is just because there is one indexing function for files and one for the other directions. However, it is unnecessary:

Code: Select all: BITBOARD attack_bb[64][4][64]; BITBOARD attack_mask[64][4]; unsigned int horizontal_attack_index(BITBOARD occ) { unsigned int u = occ | occ >> 32; return u * 0x02020202 >> 26; } unsigned int vertical_attack_index(BITBOARD occ) { unsigned int u = occ | occ >> 29; return u * 0x01041041 >> 26; }

As you can see, these could easily be turned into a generic function with shift[4] and multiply[4], or even shift[2] and multiply[2].
You could even do this with no speed penalty, by passing the dir as a const from a macro, ie

Code: Select all: #define diagonal_attack_bb(sq, occ) attack_bb[sq][DIAG][attack_index(occ & attack_mask[sq][DIAG], DIAG)]

Whether this is too magical for you, I don't know, but it is fast and 32-bit friendly.

Zach

by **Pradu** » 29 Aug 2006, 00:11

Zach Wegner wrote:Whether this is too magical for you, I don't know, but it is fast and 32-bit friendly.
Zach

Zach, can you give a comparison of how fast this approach is compared to magic move bitboard generation in 32-bits and 64-bits?

by **Zach Wegner** » 29 Aug 2006, 02:06

Hello Pradu,

Pradu wrote:Zach, can you give a comparison of how fast this approach is compared to magic move bitboard generation in 32-bits and 64-bits?

I only have a 32 bit computer, but after testing using Gerd's routine (but with just 4 bits because my computer is slow), it seems that magic is significantly faster:

Code: Select all: Mine: time1 in sec = 6.200 runs 635376*64 ~15.247 ns MINIMIZE_MAGIC: time2 in sec = 4.710 runs 635376*64 ~11.583 ns PERFECT_HASH: time2 in sec = 3.350 runs 635376*64 ~ 8.238 ns neither: time2 in sec = 3.700 runs 635376*64 ~ 9.099 ns

However, this is just a test loop. The snoob probably gives a lot of cache hits for the magic. Also separated directions can be convenient sometimes, for example Gerd's serialized "bitscan" idea at http://www.stmintz.com/ccc/index.php?id=487844

Zach

by **Zach Wegner** » 29 Aug 2006, 06:24

Correction: Since I wasn't using a makefile, I forgot to turn optimization on. It turns out there isn't much of a speed difference. Here are the results for 5 bits, using gcc -O2:

Code: Select all: mine: time1 in sec = 14.400 runs 7624512*64 ~ 2.951 ns MINIMIZE_MAGIC: time2 in sec = 14.190 runs 7624512*64 ~ 2.908 ns PERFECT_HASH: time1 in sec = 14.440 runs 7624512*64 ~ 2.959 ns neither: time2 in sec = 18.800 runs 7624512*64 ~ 3.853 ns

by **Gerd Isenberg** » 29 Aug 2006, 10:56

Zach Wegner wrote:Hello Tord,

There are ways to do this with non-rotated. My code is specific for each direction, but this is just because there is one indexing function for files and one for the other directions. However, it is unnecessary:
Code: Select all
BITBOARD attack_bb[64][4][64]; BITBOARD attack_mask[64][4]; unsigned int horizontal_attack_index(BITBOARD occ) { unsigned int u = occ | occ >> 32; return u * 0x02020202 >> 26; } unsigned int vertical_attack_index(BITBOARD occ) { unsigned int u = occ | occ >> 29; return u * 0x01041041 >> 26; }

As you can see, these could easily be turned into a generic function with shift[4] and multiply[4], or even shift[2] and multiply[2].
You could even do this with no speed penalty, by passing the dir as a const from a macro, ie
Code: Select all
#define diagonal_attack_bb(sq, occ) attack_bb[sq][DIAG][attack_index(occ & attack_mask[sq][DIAG], DIAG)]

Whether this is too magical for you, I don't know, but it is fast and 32-bit friendly.
Zach

Hi Zach,

Your >> 29 trick for all masked files is really neat! A 32-bit mode improvement may avoid the odd 64-bit shift, an "expensive" __aullshr call. Since >>32 is a noop in 32-bit mode, i suggest to use (u32)(occ >> 32) << 3 instead - and to eventually make generics with a dummy << 0 for horizontal_attack_index:

Code: Select all: u32 horizontal_attack_index(u64 occ) { u32 u = (u32)occ | (u32)(occ >> 32) << 0; return u * 0x02020202 >> 26; } u32 vertical_attack_index(u64 occ) { u32 u = (u32)occ | (u32)(occ >> 32) << 3; return u * 0x01041041 >> 26; }

The old msc6 compiler tends to replace mul by a shift/add sequence. Of course on source level we can also use shift/or sequence with parallel prefix stuff - similar to Steffan Westcott's original collapsed_xxxx_index routines to get the 8-bit occupied states, which is probably superiour for P4.

Cheers,
Gerd

by **Tord Romstad** » 31 Aug 2006, 14:30

Zach Wegner wrote:Hello Tord,

There are ways to do this with non-rotated. My code is specific for each direction, but this is just because there is one indexing function for files and one for the other directions. However, it is unnecessary:
Code: Select all
BITBOARD attack_bb[64][4][64]; BITBOARD attack_mask[64][4]; unsigned int horizontal_attack_index(BITBOARD occ) { unsigned int u = occ | occ >> 32; return u * 0x02020202 >> 26; } unsigned int vertical_attack_index(BITBOARD occ) { unsigned int u = occ | occ >> 29; return u * 0x01041041 >> 26; }

As you can see, these could easily be turned into a generic function with shift[4] and multiply[4], or even shift[2] and multiply[2].
You could even do this with no speed penalty, by passing the dir as a const from a macro, ie
Code: Select all
#define diagonal_attack_bb(sq, occ) attack_bb[sq][DIAG][attack_index(occ & attack_mask[sq][DIAG], DIAG)]

Whether this is too magical for you, I don't know, but it is fast and 32-bit friendly.

It looks quite nice. Thanks a lot for the suggestion! I may end up using something like this (but for the moment, I am back to my old mailbox board representation, which is about 25% faster than rotated bitboards in my new program).

Tord

by **smcracraft** » 30 Oct 2006, 02:43

Pradu wrote:Downoad the magic move bitboard generator here:

www.prism.gatech.edu/~gtg365v/Buzz

Because I don't have the necessary hardware (I only have a P4), I would appriciate it if someone could test how fast it is on 32/64 bit processors and comparisions in speed to other move bitboard generators. The code is made in such way that it can be integrated into any existing bitboard chess program very easily.

Pradu -how would I integrate this? What routines to call for
initialization at program startup? What routines to call for
bishop and rook?

Does it matter that my orientation is A1=0 to H8=63?

Stuart

by **smcracraft** » 30 Oct 2006, 06:07

Pradu wrote:Downoad the magic move bitboard generator here:

www.prism.gatech.edu/~gtg365v/Buzz

Because I don't have the necessary hardware (I only have a P4), I would appriciate it if someone could test how fast it is on 32/64 bit processors and comparisions in speed to other move bitboard generators. The code is made in such way that it can be integrated into any existing bitboard chess program very easily.

Okay I integrated

#include "magicmoves.h"

in my genmoves.c, call

initmagicmoves()

in my init.c, and use

Bmagic(square,occupancy)

and

Rmagic(square,occupancy)

instead of my bitbaord generators
for those pieces --- and both together
instead of my current queen generator.

All of the above gave me this for the legal moves
in the opening position:

Okay, I integrated the call to initmagicmoves() in my
program startup and then called Bmagic(square,occupancy)
for Bishop (and Queen) and RMagic(square,occupancy)
for Rook (and Queen).

It gave me these moves for the starting position:

! lm
i saw: lm
White Moves

b1a3 b1c3 g1f3 g1h3 c1e7 c1g7 f1b7 f1d7 a1h7 a1g8 h1a7 h1b8 d1d7 d1e7 d1f7 d1d8 d1f8 a2a3 b2b3 c2c3 d2d3 e2e3 f2f3 g2g3\
h2h3 a2a4 b2b4 c2c4 d2d4 e2e4 f2f4 g2g4 h2h4

Black Moves

b8a6 b8c6 g8f6 g8h6 c8e2 c8g2 f8b2 f8d2 a8g1 a8h2 h8b1 h8a2 d8d1 d8f1 d8d2 d8e2 d8f2 a7a6 b7b6 c7c6 d7d6 e7e6 f7f6 g7g6\
h7h6 a7a5 b7b5 c7c5 d7d5 e7e5 f7f5 g7g5 h7h5
!

So it is moving through pieces and all over the board.

My board layout is a1=0...h8=63.

What am I doing wrong?

Stuart

by **Pradu** » 30 Oct 2006, 09:31

Already replied to your second post at CCC forum.

by **smcracraft** » 01 Nov 2006, 04:04

Pradu wrote:Already replied to your second post at CCC forum.

pradu magic is working fine in my program.

I am exploring its speed in relation to my previous program's
version.

I am not seeing the large speedup I expected.

If I remove my present B/R/Q entirely, I get a 4x speedup.

I don't know what is realistic to expect of magic, but more testing
should help.

Thankyou very much for providing your source.

If you wish to look at my integration, I am glad to release it.

by **bob** » 09 Nov 2006, 18:27

Some data from Crafty...

I have been playing around with this stuff since you, Gerd, etc started discussing it. I have taken bits and pieces posted here, and slowly found the time to migrate this approach into Crafty. I completely removed the rotated bitboard stuff, including the updates to the rotated boards in make/unmake, to make this a "real magic implementation."

Here's the results:

log.001: time=1:48 mat=0 n=93585150 fh=90% nps=866K
log.002: time=1:47 mat=0 n=93585150 fh=90% nps=869K

The second line is the magic version. Which appears to be about 1% faster overall, not very significant. I will add that I did not try any of the "smaller" implementations, this is the full-sized magic table approach with 64 x 512 entries for bishops and 64 x 4096 entries for rooks...

As I had thought, the extra size of the table seems to hurt my 32 bit version (running on my usual xeon). I then tested on our 64 bit xeon boxes to see what happens when the 64 bit multiply becomes less expensive. Here's the result there:

log.001: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
log.002: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M

So on the 64 bit processors, the advantage I thought might show up is not there at all and there appears to be no speed advantage whatsoever.

I have not tried the reduced-table approaches although I guess they would be easy enough to merge in. Does anyone have any data suggesting that the reduced-array versions performs any better? Otherwise the advantage over rotated bitboards seems to be pretty well zero...

by **Zach Wegner** » 09 Nov 2006, 18:50

Hello Bob,

Nice to see you are checking out the magic techniques! They are quite fun. I posted earlier in the thread some results for my move generator versus magic, repeated here:

Code: Select all: mine: time1 in sec = 14.400 runs 7624512*64 ~ 2.951 ns MINIMIZE_MAGIC: time2 in sec = 14.190 runs 7624512*64 ~ 2.908 ns PERFECT_HASH: time1 in sec = 14.440 runs 7624512*64 ~ 2.959 ns neither: time2 in sec = 18.800 runs 7624512*64 ~ 3.853 ns

This is not quite a fair test, as it is a pure move generator loop with a "snoob" to determine the bitboards. But all results have shown pretty clearly that using the "pure" magic is slower than the compressed. Note that these tests were on a G4, which is 32 bit.

You might want to try out some of the anti-rotated approaches found at http://wbforum.vpittlik.org/viewtopic.php?t=4523. I have not tested them head-to-head with the magic approach, but by perft anti-rotated is faster than my old rotated.

by **bob** » 09 Nov 2006, 19:02

I thought the idea looked attractive, as it would eliminate some overhead in updating the rotated bitboards. Unfortunately it didn't eliminate enough to offset the cost... I assume your other "names" mean something I overlooked (minimize-magic, etc) which actually looked a bit better than (I guess) the full-sized table version??

I'm continuing to look, but at least I know my implementation is working since the node counts match exactly on every test, so the generation stuff is clean. I'll look at the anti-rotated stuff (I did not follow that thread at all, but now have some time to get caught back up I hope...)

by **Gerd Isenberg** » 09 Nov 2006, 19:33

bob wrote:Some data from Crafty...

I have been playing around with this stuff since you, Gerd, etc started discussing it. I have taken bits and pieces posted here, and slowly found the time to migrate this approach into Crafty. I completely removed the rotated bitboard stuff, including the updates to the rotated boards in make/unmake, to make this a "real magic implementation."

Here's the results:

log.001: time=1:48 mat=0 n=93585150 fh=90% nps=866K
log.002: time=1:47 mat=0 n=93585150 fh=90% nps=869K

The second line is the magic version. Which appears to be about 1% faster overall, not very significant. I will add that I did not try any of the "smaller" implementations, this is the full-sized magic table approach with 64 x 512 entries for bishops and 64 x 4096 entries for rooks...

As I had thought, the extra size of the table seems to hurt my 32 bit version (running on my usual xeon). I then tested on our 64 bit xeon boxes to see what happens when the 64 bit multiply becomes less expensive. Here's the result there:

log.001: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
log.002: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M

So on the 64 bit processors, the advantage I thought might show up is not there at all and there appears to be no speed advantage whatsoever.

I have not tried the reduced-table approaches although I guess they would be easy enough to merge in. Does anyone have any data suggesting that the reduced-array versions performs any better? Otherwise the advantage over rotated bitboards seems to be pretty well zero...

The full sized table approach was already suspected not to be the fastest. Pradu's reduced MINIMIZE_MAGIC approach with Java like arrays with 800K for rooks and 41K for bishops seems favorable, see:
http://wbforum.vpittlik.org/viewtopic.p ... 991&t=5452

If you are in the mood, you may also try the 9,5K kindergarten approach as a reference, for instance to get straight attacks in conjunction with Pradu's Bishop attacks, so < 50K in total:
http://wbforum.vpittlik.org/viewtopic.p ... 400&t=4523

Winboard Forum

Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Re: Fastest Magic Move Bitboard Generator ready to use

Who is online