Winboard Forum

by **Zach Wegner** » 10 May 2008, 06:26

Zach Wegner wrote:I'm pretty sure the theoretical max for rooks is six. You can have unconditional squares overlap, or conditional squares overlap, but they can't overlap each other (Well, I guess they could, but each time they do requires an extra bit in the index).

I just have to keep contradicting myself. Eight rooks arranged across a diagonal don't intersect conditional with unconditional. However, finding magics that would map that correctly would be quite complicated. If you could do that, and had 8 different "index spaces", the theoretical minimum lookup size for rooks goes to 8*2^12*8 bytes = 2^18 = 256K. And that could go even lower if you can get some between 2^11 and 2^12.

by **Zach Wegner** » 10 May 2008, 06:45

OK, one more time. I'm almost positive now that mapping more than 2 rooks to one bitboard is impossible. Check it out:

[diag]R7/1R6/2R5/8/8/8/8/8 w - -[/diag]
Now take the A8 rook. Whether or not it attacks A6 or C8 depends on whats in the A7 and B8 squares. But in the master bitboard, whether those squares are set or not must also depend on C7 and B6, because of the C6 rook. So unless you added bits for those squares (which you obviously don't), it won't work.

However, this discussion makes me want to try this same compression in my direction-wise attack sets. 8x less memory for an extra lookup and AND seems not so bad. You can get another 8x too by storing a byte for the attacks, but then you have kindergarten bitboards.

by **Lasse Hansen** » 10 May 2008, 09:23

Yes, I used the magics and shifts from Pradu's post 3 in this thread.

Hm, seems like I was a bit sloppy when making the rook indices (I did it manually). It could be a bit better by applying this pattern by repacing with necessary address space:

Code: Select all: Idx[64]={ 00,01,02,03,04,05,06,07, 01,00,03,02,05,04,07,06, 08,09,10,11,12,13,14,15, 09,08,11,10,13,12,15,14, 16,17,18,19,20,21,22,23, 17,16,19,18,21,20,23,22, 24,25,26,27,28,29,30,31, 25,24,27,26,29,28,31,30, };

Regards, Lasse

by **Lasse Hansen** » 10 May 2008, 09:46

Just brainstorming a bit. One might squeese mobility into this. Needing 6 bits one can try:

Code: Select all: ...xx.oo ....xx.. ...xx.oo ....xx.. ...xx.oo xxxxxRxx xxxxRxxx xxxxRxxx xxxRxxxx ....xx.. ...xx.oo oo..xxoo ...xx.oo oo..xxoo ...xx.oo oo..xxoo Attack = second board; Mobility = (Attack>>5)&7 + ((Attack>>(8+5))&7)<<3; // as is Mobility = (Attack>>IdxMobLow[sqr])&7 + ((Attack>>IdxMobHigh[sqr])&7)<<3; // using tables

using another index table for mobility. One will always have 2x6 bits available in the rook's attack set.

edit: the first attack board will never happen if one wants to match 2x32 rook squares (see previous post), so one are guaranteed 4x4 bits to be used for other purposes.

Regards, Lasse

by **Onno Garms** » 10 May 2008, 11:13

Great idea Lasse.

But what is the reason for not matching two of the rook squares, namely those with values 32768 and 65536 (a3 and b8?)? At first glance, the obvious matching that matches all squares with the same-colored square in the same 2x2 block seems to be slightly better (and even the best).

I think the maximum is to match two squares. Two squares can be matched iff they are diagonally adjacent. Every matching safes a specific amount of memory, namely 2^(64-shift), where shift is the larger of the two shifts of the squares. We are looking for a maximum weight matching.

In terms of matchings, your solution seems to be neither maximum weight nor maximum cardinality.

Onno

by **Lasse Hansen** » 10 May 2008, 11:35

Hi Onno, like I said in a previous post, I was sloppy

. Another possible improvement is to experiment with where to put (the 32) indices, f.ex. by running a set of epd's to get an optimal cache-usage in real positions, by optimising for the highest nps.

Regards, Lasse

by **Onno Garms** » 10 May 2008, 12:53

Lasse Hansen wrote:Another possible improvement is to experiment with where to put (the 32) indices, f.ex. by running a set of epd's to get an optimal cache-usage in real positions, by optimising for the highest nps.

That would be a lot of work and results would be processor dependent. I wouldn't waste my time with that.

Btw. You currently have 33 indices due to the unmatched squares, or am I overlooking something?

by **Aleks Peshkov** » 10 May 2008, 14:36

I thought about post masking before I understand how does infamous Magic Bitboards work... IMHO whether 500k or 800k table is not significant.

Magic Bitboards are fast because they can be inlined everywhere. Increasing code size of all snippets will lead to negative influence to total performance.

by **Lasse Hansen** » 10 May 2008, 19:03

To Onno: I dont agree, if you know how to implement the 'Simulated Annealing'
optimisation algorithm, it would be an easy task (my words, that is). That would be to shuffle
31 different squares for the alread found indices/shifts, and just wait for the result
to be good enough. An old version of Matlab (4.x???) had this as an example with the
traveling salesman problem. It's a pity they abandoned it.

And yes, in my first post to this thread, I had 33 indices (manual approach, but it worked,
and I was happy to post because I felt like inventing somthing new after having tested it),
but I (hope I) improved that in post 23 (and further) in this thread.

And, eh,.. what is _not_ prosessor independant? The invention by itself should be more
than enough to post here, in my opinion. Actually, for me, Gerd's 1K5 approach is a bit slower
~(66knps/82knps) Edit: ~(66Mnps/82Mnps) /edit at higher plies using quasi-legal moves (perft) in my approach.
But,... at lower plies it is actually _faster_ than all my other methods of move
generation (uh, only bitboards). My platform is still AMD 3k+ at 2kMHz, 512k L2 cahce,
1G RAM. So I use Gerd's method in my main engine... But then again, newer processors
have larger (L1?)/L2/L3 caches++, so things might be different on those.

Cheers, Lasse

by **Onno Garms** » 10 May 2008, 19:46

Hi Lasse, you might have missunderstood me. I really think that your approach is great. Definitely a reason to post. I've put it on my list of things to try in my engine.

It's just the shuffling of indices for cache optimisation that has a bad ratio of efford to output in my opinion.

Automatic optimization of the index order is also a good idea. I didn't think of that. But beware, though the basic idea of Simulated Annealing is simple, you will need some time to adjust the parameters to get practically usable results.

Btw, in my engine on my CPU I found that rook magics are faster then Kindergarden bitboards in test frameworks but slower in nps, both by a considerable margin.

@Alex: Though you might be right to assume that 500k vs 800k is not an important difference, we might continue to work with Lasse's ideas to get things down further. For example by matching vertically adjacent squares (better, because we can match more 11-bit-squares to each other) or even four adjacent squares (maybe easiest in the corners).

For example
- choose the a1 multiplier to map b1-g1 to bits 58-63 and a2-a7 to bits 52-57.
- choose the a2 multiplier to map b2-g2 bits 58-63 and map a2-a7 to bits 52-57. (a2 contains a zero here).
What I call side effects (e.g. a3->53 also maps a4->61) is the same for both multiplications, so that should not harm. So a2 can use the table of a1.

Disclaimer: This refers to my numbering a1=0, b1=1, ... I have not tried above improvement, so it might be faulty. But I think it will work.

by **Pradu** » 10 May 2008, 22:14

Onno Garms wrote:Btw, in my engine on my CPU I found that rook magics are faster then Kindergarden bitboards in test frameworks but slower in nps, both by a considerable margin.

Can you provide the details of your test if you still have them?

by **Zach Wegner** » 11 May 2008, 07:14

Onno Garms wrote:@Alex: Though you might be right to assume that 500k vs 800k is not an important difference, we might continue to work with Lasse's ideas to get things down further. For example by matching vertically adjacent squares (better, because we can match more 11-bit-squares to each other) or even four adjacent squares (maybe easiest in the corners).

For example
- choose the a1 multiplier to map b1-g1 to bits 58-63 and a2-a7 to bits 52-57.
- choose the a2 multiplier to map b2-g2 bits 58-63 and map a2-a7 to bits 52-57. (a2 contains a zero here).
What I call side effects (e.g. a3->53 also maps a4->61) is the same for both multiplications, so that should not harm. So a2 can use the table of a1.

Very good idea. I was thinking it would be impossible because you would need extra bits, but when you're putting rooks next to the corner you get a bit or two for free. Unfortunately I think it would only be possible for the corners to get more than 2 rooks without increasing the index size. (You are encouraged to prove me wrong though

)

by **Gerd Isenberg** » 11 May 2008, 11:26

Zach Wegner wrote:Very good idea. I was thinking it would be impossible because you would need extra bits, but when you're putting rooks next to the corner you get a bit or two for free. Unfortunately I think it would only be possible for the corners to get more than 2 rooks without increasing the index size. (You are encouraged to prove me wrong though )

I also have the feeling, that there is more to gain by using Lasse's postmask idea. In general for all rook squares, we consider the postmask intersection of two squares, called affected:

Code: Select all: affected = postMask[rooksq1] & postMask[rooksq2] with popCount(affected) == 2 (distinct file and rank).

In sharing unions of attack-sets, which are re-spitted by postmask afterwards, there is a good chance that

Code: Select all: (attacksSet[rooksq1] ^ attacksSet[rooksq2]) & affected == 0

This is not only true for the adjacent affected sets, where

Code: Select all: kingAttacks[rooksq1] & kingAttacks[rooksq2] & affected == affected

but also for a lot of other squares where the two affected bits are equal due to their individual occupancy.
This property may be considered in re-using or overlapping the sparse arrays even more aggressively. Using Tord's sparse RNG and some outer loops to control best overlapping or merging...

On the performance comparisons of occupancy lookup based sliding attack methods - inside perft frameworks or inside a concrete chess program - it depends a lot on memory and cache-footprint and may vary by architecture and compiler. Pure register based kogge-stone fill-stuff on the other hand, may better execute speculatively while prefetching a cacheline from the transposition table.

by **Pradu** » 11 May 2008, 17:30

Onno Garms wrote:- choose the a1 multiplier to map b1-g1 to bits 58-63 and a2-a7 to bits 52-57.
- choose the a2 multiplier to map b2-g2 bits 58-63 and map a2-a7 to bits 52-57. (a2 contains a zero here).
What I call side effects (e.g. a3->53 also maps a4->61) is the same for both multiplications, so that should not harm. So a2 can use the table of a1.

Disclaimer: This refers to my numbering a1=0, b1=1, ... I have not tried above improvement, so it might be faulty. But I think it will work.

This does not work (it is the same idea/mistake I posted in my edited post that I couldn't delete). The moves of A2 along the file are different than the moves of A1 along the file even if you could hash the occupancy bits identically. For example, lets assume an empty board with just A1 and A2 occupied with rooks. Moves for A2 will go all the way to A8, moves for A1 will stop at A2. In addition, the move-bitboards are different in more than the square of the moves being generated so you couldn't do something like &ing out a particular set of bits like &ing out all the bits except the square that the moves are being generated for.

by **Zach Wegner** » 11 May 2008, 20:06

I think you can, actually. The moves from A2 are the same as the moves from a rook on A1 when A2 is empty and A2 is taken out of the attack set (which is done by the postmask).

by **Pradu** » 11 May 2008, 20:56

Zach Wegner wrote:I think you can, actually. The moves from A2 are the same as the moves from a rook on A1 when A2 is empty and A2 is taken out of the attack set (which is done by the postmask).

Ok, you are right.

by **Matthias Gemuh** » 11 May 2008, 21:49

Pradu wrote:A good set of magic keys for bitboard move generation can be found here:

http://www.prism.gatech.edu/~gtg365v/Bu ... magics.txt

Sune Fischer and I are wiriting a program to produce a comparison in speed between rotated and magic move generation. I guess we shall post the source and results here to see if magic move generation outperforms rotated at modern hardware.

Hi Pradu,
I don't fully understand magic, especially the collision stuff.
Can you generate magics for me for a 10x8 board ?
A1=0...J8=79. No complicated table size reduction necessary.
Thanks,
Matthias.

by **Pradu** » 11 May 2008, 22:00

Matthias Gemuh wrote:
Pradu wrote:A good set of magic keys for bitboard move generation can be found here:

http://www.prism.gatech.edu/~gtg365v/Bu ... magics.txt

Sune Fischer and I are wiriting a program to produce a comparison in speed between rotated and magic move generation. I guess we shall post the source and results here to see if magic move generation outperforms rotated at modern hardware.

Hi Pradu,
I don't fully understand magic, especially the collision stuff.
Can you generate magics for me for a 10x8 board ?
A1=0...J8=79. No complicated table size reduction necessary.
Thanks,
Matthias.

The method used by most people is to try random numbers and it works well for generating the magics. So what you have to do is just try random numbers and one of them will end up working. I don't have a 10x8 bitboard backend yet and I'm not quite sure how it is setup for 10x8 so you could probably do it easier than I could.

by **Matthias Gemuh** » 11 May 2008, 22:42

Pradu wrote:The method used by most people is to try random numbers and it works well for generating the magics. So what you have to do is just try random numbers and one of them will end up working. I don't have a 10x8 bitboard backend yet and I'm not quite sure how it is setup for 10x8 so you could probably do it easier than I could.

The backend you mean is surely this class.
Its objects simply function like unsigned 80-bit integers :

class BITBOARD {
protected:
UINT32 data16;
UINT64 data64;

public:
//
BITBOARD(UINT32 dat16 = 0UL, UINT64 dat64 = 0ui64) {
data16 = (0x0000FFFFUL & dat16); data64 = dat64;
}
BITBOARD(const BITBOARD &BitA) { data16 = BitA.GetData16(); data64 = BitA.GetData64(); }
//
BITBOARD EmptyBitBoard() { return BITBOARD(); }
BITBOARD FullBitBoard() { return BITBOARD(0xFFFFFFFFUL, 0xFFFFFFFFFFFFFFFFui64); }
BITBOARD BitBitBoard( int nBit ) {
if (nBit <= 63) return BITBOARD(0UL, 1ui64 << nBit);
return BITBOARD((UINT32)(1ui64 << (nBit - 64)), 0ui64 );
}
//
inline UINT32 GetData16() const { return(0x0000FFFFUL & data16); }
inline UINT64 GetData64() const { return(data64); }
inline void PutData16(UINT32 dat16) { data16 = (0x0000FFFFUL & dat16); }
inline void PutData64(UINT64 dat64) { data64 = dat64; }
inline void Clear() { data16 = 0UL; data64 = 0ui64; }
//
BITBOARD operator & (const BITBOARD &BitA) const
{ return(BITBOARD((data16 & BitA.GetData16()), (data64 & BitA.GetData64()))); }
BITBOARD operator | (const BITBOARD &BitA) const
{ return(BITBOARD((data16 | BitA.GetData16()), (data64 | BitA.GetData64()))); }
BITBOARD operator ^ (const BITBOARD &BitA) const
{ return(BITBOARD((data16 ^ BitA.GetData16()), (data64 ^ BitA.GetData64()))); }
//
BITBOARD operator << (UINT32 k) const {
if (k == 0) return(*this);
if (k <= 15) return(BITBOARD(((data16 << k) | ((UINT32)(data64 >> (64-k)))), (data64 << k)));
else if (k <= 63) return(BITBOARD((UINT32)(data64 >> (64-k)), (data64 << k)));
else return(BITBOARD((UINT32)(data64 << (k-64)), 0ui64));
}
BITBOARD operator >> (UINT32 k) const {
if (k == 0) return(*this);
if (k <= 15) return(BITBOARD((UINT32)(data16 >> k), (((UINT64)data16 << (64-k)) | (data64 >> k))));
else if (k <= 63) return(BITBOARD(0UL, (((UINT64)data16 << (64-k)) | (data64 >> k))));
else return(BITBOARD(0UL, ((UINT64)data16 >> (k-64))));
}
//
bool operator && (const BITBOARD &B) const { return((data16 || data64) && (B.GetData16() || B.GetData64())); }
bool operator && (const bool A) const { return((data16 || data64) && A); }
bool operator || (const BITBOARD &B) const { return(data16 || B.GetData16() || data64 || B.GetData64()); }
bool operator || (const bool A) const { return(data16 ||data64 || A); }
bool operator != (const BITBOARD &B) const { return((data16 != B.GetData16()) || (data64 != B.GetData64())); }
bool operator != (const UINT64 A) const { return((data16 != 0ui64) || (data64 != A)); }
bool operator == (const BITBOARD &B) const { return((data16 == B.GetData16()) && (data64 == B.GetData64())); }
bool operator == (const UINT64 A) const { return((data16 == 0UL) && (data64 == A)); }
//
friend bool operator && (const bool A, const BITBOARD &B) { return(B && A); }
friend bool operator || (const bool A, const BITBOARD &B) { return(B || A); }
friend bool operator != (const UINT64 A, const BITBOARD &B) { return(B != A); }
friend bool operator == (const UINT64 A, const BITBOARD &B) { return(B == A); }
//
BITBOARD & operator = (const BITBOARD &BitA)
{ data16 = BitA.GetData16(); data64 = BitA.GetData64(); return(*this); }
BITBOARD & operator = (UINT64 A) { data16 = 0UL; data64 = A; return(*this); }
BITBOARD operator &= (const BITBOARD &BitA)
{ data16 &= BitA.GetData16(); data64 &= BitA.GetData64(); return(*this); }
BITBOARD operator |= (const BITBOARD &BitA)
{ data16 |= BitA.GetData16(); data64 |= BitA.GetData64(); return(*this); }
BITBOARD operator ^= (const BITBOARD &BitA)
{ data16 ^= BitA.GetData16(); data64 ^= BitA.GetData64(); return(*this); }
// BITBOARD & operator >>= (UINT32 k) { return(*this); }
// BITBOARD & operator <<= (UINT32 k) { return(*this); }
//
friend bool IsEmpty(const BITBOARD &BitA) { return((BitA.GetData16() | BitA.GetData64()) == 0ui64); }
friend bool IsNotEmpty(const BITBOARD &BitA) { return((BitA.GetData16()) || (BitA.GetData64())); }
bool operator ! () const { return((data64 | (UINT64)data16) == 0ui64); }
operator bool () const { return((data64 | (UINT64)data16) != 0ui64); }
BITBOARD operator ~ () { return(BITBOARD((~data16), (~data64))); }
//
bool BitIsSet(int nBit) const {
if (nBit <= 63) return (data64 & (1ui64 << nBit)) != 0ui64;
return (data16 & (1ui64 << (nBit - 64))) != 0UL;
}
void SetBit(int nBit) {
if (nBit <= 63) data64 = (data64 | (1ui64 << nBit));
else data16 = (data16 | (1ui64 << (nBit - 64)));
}
void ClearBit(int nBit) {
if (nBit <= 63) data64 = (data64 & (0xFFFFFFFFFFFFFFFFui64 ^ (1ui64 << nBit)));
else data16 = (data16 & (0xFFFF ^ (1ui64 << (nBit - 64))));
}
void ToggleBit(int nBit) {
if (nBit <= 63) data64 = (data64 ^ (1ui64 << nBit));
else data16 = (data16 ^ (1ui64 << (nBit - 64)));
}

};

If nobody can help with the magics generation, then I must understand enough of those collision algorithms to progam/modify them

Matthias.

by **Grant Osborne** » 12 May 2008, 11:23

Yes, I used the magics and shifts from Pradu's post 3 in this thread.

Hm, seems like I was a bit sloppy when making the rook indices (I did it manually). It could be a bit better by applying this pattern by repacing with necessary address space:
Code:
Idx[64]={
00,01,02,03,04,05,06,07,
01,00,03,02,05,04,07,06,
08,09,10,11,12,13,14,15,
09,08,11,10,13,12,15,14,
16,17,18,19,20,21,22,23,
17,16,19,18,21,20,23,22,
24,25,26,27,28,29,30,31,
25,24,27,26,29,28,31,30,
};

Regards, Lasse

As you know, 11-bit keys exist for two of the corners for rooks, there are also 10-bit keys for five of the edge squares. By substituting these better magics and using the pattern above brings the size of the table down to 448k.

Grant

Winboard Forum

List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Re: List of magics for bitboard move geneartion

Who is online