Fast(er) bitboard move generator

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: Fast(er) bitboard move generator

Postby bob » 17 Jun 2006, 04:12

I have always had such mobility lookups in Crafty, but I included the endpoints as well when I pre-computed the mobility values. I'm not sure why one would exclude them since capturing an opponent move is certainly a legal move and ought to count in terms of mobility..
User avatar
bob
 
Posts: 156
Joined: 10 May 2006, 17:59

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 17 Jun 2006, 09:59

Lasse Hansen wrote:The RookMobility[64][2048] always return a value in the range [0, 14]=15 elements, that tells how many non-capturing moves it is allowed to do. It does not return an attack set. It is just a replacement for PopCount for all possible rook attack sets that are non-captures.
Lasse

I see, so for attack sets you are still 8 or 6MB? Nice trick for the popCount - i like to exclude opposite pawn attacks. But i guess trying to hash "safe" subsets, will blow up the table.
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 17 Jun 2006, 10:14

bob wrote:I have always had such mobility lookups in Crafty, but I included the endpoints as well when I pre-computed the mobility values.


I intend to make separate tables for captures. I believe the number of captures may say something about the tactical tension in a position.

I'm not sure why one would exclude them since capturing an opponent move is certainly a legal move and ought to count in terms of mobility..

As I see it, almost all captures in a leaf position in a normal search are losing moves, because Q-search should check those things out. Only pure tactics can be used to detect something better. So my first impression is that mobility should not include captures.

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby bob » 18 Jun 2006, 00:33

My "attack" arrays are 64x64 int64, my mobility arrays are 64x64 chars. I included the attacks after some testing, as leaving them out seemed a bit worse when your piece is attacking opponent pieces...
User avatar
bob
 
Posts: 156
Joined: 10 May 2006, 17:59

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 18 Jun 2006, 02:26

bob wrote:My "attack" arrays are 64x64 int64, my mobility arrays are 64x64 chars. I included the attacks after some testing, as leaving them out seemed a bit worse when your piece is attacking opponent pieces...

Testing is gold, talk is cheap :wink:

I looked a bit at the code in crafty, your mobility lookups seem very efficient. But you dont only count capturing of opponent pieces, you also count capturing your own, am I right?

It should be easy to copy your approach by using hashing of non-rotated bitboards more efficiently than what I have done, but I can already do exactly what you do. I just need to OR the attack bitmap for non-captures with the attack bitmap for all captures, then do the mobility lookup. But I do have the option of leaving out the attacks of both my own pieces and/or the opponents.

I guess I have to make a simplified Eval to see how costly my mobility calculations are. BTW, I made mobility lookup tables for king and knight too, though I dont know if these are needed.

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Pradu » 18 Jun 2006, 03:57

Lasse Hansen wrote:Pattern=RookMask[square]&(AllWPieces | AllBPieces);
Address= (Pattern*key[square])>>(64-13);
Attacks = AttacksRook[square][Address];


Hi Lasse (or anyone else who can answer my questions),

I really don't understand how Pattern and Key and some bsr by 51 gives the Address of the AttacksTo:

1) What does RookMask[square] contain? Is it a bitboard of rookmoves from that square? And if so, how is a bitboard of just rookmoves from a square and All Pieces enough information to produce attacks? For example both of the following positions will produce the same Pattern but the available rookmoves would be different.

[diag]k7/4b3/4b3/8/8/4R3/8/7K[/diag]
k7/4b3/4b3/8/8/4R3/8/7K w - - 0 1

[diag]k7/4b3/8/8/8/4R3/8/7K[/diag]
k7/4b3/8/8/8/4R3/8/7K w - - 0 1

2) Also, how did you calculate key[square] and figure out that the multiplication by key and a bitshift will hash the moves correctly?

3) When you profiled perft, approximately what was the ratio between the time spent making moves and the time spent generating them?
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Fast(er) bitboard move generator

Postby bob » 18 Jun 2006, 04:05

Lasse Hansen wrote:
bob wrote:My "attack" arrays are 64x64 int64, my mobility arrays are 64x64 chars. I included the attacks after some testing, as leaving them out seemed a bit worse when your piece is attacking opponent pieces...

Testing is gold, talk is cheap :wink:

I looked a bit at the code in crafty, your mobility lookups seem very efficient. But you dont only count capturing of opponent pieces, you also count capturing your own, am I right?


Correct. I thought I had made that clear.


It should be easy to copy your approach by using hashing of non-rotated bitboards more efficiently than what I have done, but I can already do exactly what you do. I just need to OR the attack bitmap for non-captures with the attack bitmap for all captures, then do the mobility lookup. But I do have the option of leaving out the attacks of both my own pieces and/or the opponents.

I guess I have to make a simplified Eval to see how costly my mobility calculations are. BTW, I made mobility lookup tables for king and knight too, though I dont know if these are needed.

Lasse


I have been playing with mobility for the past 6 months, and we have (a group of us trying various mobility options) not found anything that seems to improve crafty's play one iota. It seems that the other things I am doing in the eval accomplishes the same thing...
User avatar
bob
 
Posts: 156
Joined: 10 May 2006, 17:59

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 18 Jun 2006, 04:33

Pradu wrote:
What does RookMask[square] contain? Is it a bitboard of rookmoves from that square?

RookMask[E3] is a bitboard that look like
Code: Select all
--------
----x---
----x---
----x---
----x---
-xxx-xx-
----x---
--------


The board edges arent relevant for square E3 (as often referred as the 64 trick).
For example both of the following positions will produce the same Pattern but the available rookmoves would be different.

No, they will not produce the same pattern as you can see now that you know the mask.
2) Also, how did you calculate key[square] and figure out that the multiplication by key and a bitshift will hash the moves correctly?

I didnt figure out that a 64 bit multiplication combined with a shift would do this, the idea came from another post in this forum, about deBruijn keys.
The keys arent calculated, but found by a brute-force approach. I use a random number generator to find key candidates. (The longest search for one key till now is 1.4billion tries :) )

3) When you profiled perft, approximately what was the ratio between the time spent making moves and the time spent generating them?


When profiling I dont use compiler optimisation, so I wouldnt trust the numbers.

Move generation 14.5%
Make/unmake 1.4%
Lazymake/unmake 34.5%
Incheck 29%
BitScanForward 7.8%

So, as you see, I check that every move is legal, and this uses ~65% of the time.
...So, I guess it can be optimised :/
The source code is available at http://home.c2i.net/lasha

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 18 Jun 2006, 10:52

I have made a simple example of what I do to find the keys. The following program makes a lookup table that returns the position of the set bit in SetBit[i]. Note that it will only work if just one bit is set, because thats what it is made for.

When TABLE_BITS=7 it uses 1803 iterations to find the 'magic' number. If TABLE_BITS=8 is uses 194 iterations, but then the table size is increased from 128 to 256 elements, and we are only interested in 64. If you want to try TABLE_BITS=6 you are trying to find the deBruijn constant that makes a minimal perfect hash :)

Hope this helps,
Lasse
Code: Select all
#include <stdio.h>
typedef unsigned long long U64;

#define TABLE_BITS 7   // means 128 elements
int FirstOne[1<<TABLE_BITS]; // this is the hash table I want to make
U64 MagicNumber;

int main()
{
   U64 Address, Pattern, N=0, SetBit[64], Random64();
   int i,success,n,best=0;

   SetBit[0]=1; // constructing the set of elements I want to hash
   for (i=1;i<64;i++) SetBit[i]=SetBit[i-1]<<1;
   
   do { // searching for a MagicNumber that fits my needs
      for (i=0;i<(1<<TABLE_BITS);i++) FirstOne[i]=-1;
      MagicNumber=Random64();
      success=1;
      n=0;
      for (i=0;i<64;i++) { // trying all elements
         Pattern = SetBit[i];
         Address = (Pattern*MagicNumber)>>(64-TABLE_BITS);
         if (FirstOne[Address] != -1) {
            success=0;
            if (n>best) {
               best=n;
               printf("best=%d\n",best);fflush(stdout);
            }
            break;
         }
         else {
            FirstOne[Address] = i;
            n++;
         }
      }
      N++;
      if (!(N&((1<<24)-1))) {
         printf("."); fflush(stdout);
      }
   } while (!success);
   printf("MagicNumber found in %u iterations\n",(unsigned int)N);
   for (i=0;i<64;i++) { // testing the result
      Pattern = SetBit[i];
      Address = (Pattern*MagicNumber)>>(64-TABLE_BITS);
      printf("i=%d FirstOne[%d]=%d\n",i,(int)Address,FirstOne[Address]);
   }
   return 0;
}
// The following is taken from crafty
/*
 A 32 bit random number generator. An implementation in C of the algorithm
 given by Knuth, the art of computer programming, vol. 2, pp. 26-27. We use
 e=32, so we have to evaluate y(n) = y(n - 24) + y(n - 55) mod 2^32, which
 is implicitly done by unsigned arithmetic.
 */
unsigned int Random32(void)
{
/*
 random numbers from Mathematica 2.0.
 SeedRandom = 1;
 Table[Random[Integer, {0, 2^32 - 1}]
 */
  static const unsigned long x[55] = {
    1410651636UL, 3012776752UL, 3497475623UL, 2892145026UL, 1571949714UL,
    3253082284UL, 3489895018UL, 387949491UL, 2597396737UL, 1981903553UL,
    3160251843UL, 129444464UL, 1851443344UL, 4156445905UL, 224604922UL,
    1455067070UL, 3953493484UL, 1460937157UL, 2528362617UL, 317430674UL,
    3229354360UL, 117491133UL, 832845075UL, 1961600170UL, 1321557429UL,
    747750121UL, 545747446UL, 810476036UL, 503334515UL, 4088144633UL,
    2824216555UL, 3738252341UL, 3493754131UL, 3672533954UL, 29494241UL,
    1180928407UL, 4213624418UL, 33062851UL, 3221315737UL, 1145213552UL,
    2957984897UL, 4078668503UL, 2262661702UL, 65478801UL, 2527208841UL,
    1960622036UL, 315685891UL, 1196037864UL, 804614524UL, 1421733266UL,
    2017105031UL, 3882325900UL, 810735053UL, 384606609UL, 2393861397UL
  };
  static int init = 1;
  static unsigned long y[55];
  static int j, k;
  unsigned long ul;

  if (init) {
    int i;

    init = 0;
    for (i = 0; i < 55; i++)
      y[i] = x[i];
    j = 24 - 1;
    k = 55 - 1;
  }

  ul = (y[k] += y[j]);
  if (--j < 0)
    j = 55 - 1;
  if (--k < 0)
    k = 55 - 1;
  return ((unsigned int) ul);
}

U64 Random64(void)
{
  U64 result;
  unsigned int r1, r2;

  r1 = Random32();
  r2 = Random32();
  result = r1 | (U64) r2 << 32;
  return (result);
}
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 18 Jun 2006, 13:30

If you get the two six-bit occupied indices of the rank/file or diagonal/antidiagonal by masking the rays independently, there is no need to look for random or DeBruijn factors[sq], as i tried and Tony did with random numbers, before i "discoverd" the fill and rotate multiplication with the help of Steffan Westcott.
So if we compare Lasse's approach ....
Code: Select all
TBitBoard BishopAttacks[64][1<<13]; // 4MByte
TBitBoard MaskBishop[64];
const U64 HashKeyBishop[64]= {...};

Pattern = (AllWPieces | AllBPieces) & MaskBishop[From];
Address = (Pattern*HashKeyBishop[From])>>(64-13);
A = BishopAttacks[From][Address];
... with an approach to get two six-bit occupied indices per distinct ray, ...
Code: Select all
TBitBoard BishopAttacks[64][1<<6][1<<6]; // 2MByte
TBitBoard MaskDiagonal[64];
TBitBoard MaskAntiDiagonal[64];

Pattern1 = (AllWPieces | AllBPieces) & MaskDiagonal[From];
Pattern2 = (AllWPieces | AllBPieces) & MaskAntiDiagonal[From];
Address1 = (Pattern1*0x0202020202020202)>>(64-6);
Address2 = (Pattern2*0x0202020202020202)>>(64-6);
A = BishopAttacks[From][Address1][Address2];
we see how big the savings in Lasse's approach are (despite table size where Lasse is already able to use 12-bit addresses for bishops at least). Only one and/mul/shift->lookup. Otoh doing pairs of instructions has also it merits. The classical rotated bitboard approach provides Address1 and Address2 on the fly of course without multiplication but the probably negligible cost of keeping and updating the rotated bitboards - so even Bob may use a huge table approach - to immediatly "save" one lookup and combining two by "or".
Code: Select all
A = BishopAttacks[From][Address1][Address2];
instead of
Code: Select all
A = Dia1Attacks[From][Address1] | Dia2Attacks[From][Address2];
The question is how much do the huge tables hurt us, individially in our chess program - compared to the very condensed 1.5KByte approach - or the classical appoaches with 128KByte or less?
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 18 Jun 2006, 15:59

I made another perft that counts quasi-legal moves. With the position "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -\n"
the node count depth 5 is 200120067 (accumulated nodes) with a speed of 58.174 MNps. When I profiled this I got the following values:
Code: Select all
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 34.71    101.25   101.25 197876243     0.00     0.00  MovgenNoncapts
 29.27    186.64    85.39 200180423     0.00     0.00  MovgenCapts
 15.29    231.26    44.61 12426301514     0.00     0.00  bitScanForward
  8.64    256.46    25.21 628189928     0.00     0.00  InCheck
  3.50    266.67    10.20                             xtos64
  3.43    276.67    10.00 200180422     0.00     0.00  MakeMove
  2.18    283.03     6.36        1     6.36   278.35  Perft2
  1.89    288.55     5.53 200180422     0.00     0.00  UnmakeMove

At least now the move generation part uses most of the processor time.
Edit:
And I dont know why the heck xtos64 is in this list, It is not called one single time.
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Grant Osborne » 20 Jun 2006, 10:52

Lasse

Have you managed to find any 4096 keys for the rook?

Regards
Grant Osborne
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 20 Jun 2006, 13:05

Grant Osborne wrote:
Have you managed to find any 4096 keys for the rook?

Yes, for square H8 I found 0x608001d3c001e085 after 2.7 hours and 1+ billion tries :) . Then I tried H7 and stopped after 18 hours.

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 20 Jun 2006, 17:29

I know that feeling ;-)

Something like this would work of course, getting file occupied state in the six msb by shifting and masking to a-file and mul with h2-c7-ray (with a1=0 mapping).
Code: Select all
AFile     =  0x0101010101010101 & (AllPieces >>  (From &  7)   );
occ64Rank =  0x000000000000003f & (AllPieces >> ((From & ~7)+1));
occ64File = (0x0004081020408000 *  AFile)    >>  (64-6);
Address   = (occ64File << 6)    +  occ64Rank; // 4096 range

I guess it's faster for you to stay with 8K keys - or to think about your random numbers. May be it is important to have only about 12 bits set, or a few more.
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Jean-Francois Romang » 20 Jun 2006, 18:32

Lasse Hansen wrote:Grant Osborne wrote:
Have you managed to find any 4096 keys for the rook?

Yes, for square H8 I found 0x608001d3c001e085 after 2.7 hours and 1+ billion tries :) . Then I tried H7 and stopped after 18 hours.

Lasse


If you want processor time, just send me the code needed to find the keys ; i will be happy to burn my CPU :D
Jean-Francois Romang
 
Posts: 6
Joined: 11 Jul 2005, 21:01
Location: France/Paris

Re: Fast(er) bitboard move generator

Postby bob » 20 Jun 2006, 19:24

Or if you have a way to distribute the work across several machines, I can crank up our 128 node dual 64bit xeon cluster...
User avatar
bob
 
Posts: 156
Joined: 10 May 2006, 17:59

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 20 Jun 2006, 19:57

bob wrote:Or if you have a way to distribute the work across several machines, I can crank up our 128 node dual 64bit xeon cluster...


:D Thanks, for that, but I'm working on another approach. I make a 'colored' 64 bit random number that look more like 0x202000a4681000 (less bits in the word) and I have already found all RookAttacks 4096 keys. Maybe it can be reduced even more, I'll let you hear.

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 20 Jun 2006, 21:41

Congrats! What squares were the most difficult one to find? Edges or border - near 0,7,56 or 63?

I didn't exactly understand how a collision is exactly defined in your brute-force approach. For the rook on e3 with all neighbored squares occupied, there are six don't care bits (additionally to the redundant outer squares), where you can greatly accept collisions with all 2**6 possible combination of the *squares here. So if you have a collision with the same attack pattern - or the same unique number of all enumerated attack pattern per square, you can ignore those "productive" accidently shared collisions - if you don't allready do that.
Code: Select all
--------
----*---
----*---
----*---
----1---
-**1-1*-
----1---
--------
Depending on your mask and your N upper bits of the product, you can determine the least significat one bit in the mask that needs shifted to the upper N bits and determines a lower bound of the factor. The least significant one bit of the factor takes care not to completely shift out the most significant one bit. So you have some plausible ranges of the most and least significant one-bit inside the factor. It think the lower bound of magic factor's popcount should be four as absolute minimum or probably better the number of groups with consecutive bits in the mask. Even a recombimation of fixed source-target square relationship is possible, with some target(i) >= source(j), to calculate a bitindex of a bit inside the factor by subtracting upper target minus source square inside mask.

I think for the edges there are some more constrains if you like to go below 12 bit addresses. There you have exactly 12-relevant occupied bits you need to consider.

Good luck,
Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Pradu » 20 Jun 2006, 21:58

Thanks for posting some source Lasse! From what I understand, for generating keys that there is a 1 to 1 key to index relation ship the way you do it. Does anyone think it is possible to create a multiple keys to index relationship using the same multiply shift hashing scheme?

We know there are 4900 possible rook move bitboards and 1428 possible bishop move bitboards:
Code: Select all
Possible rook move bitboards per square
 49  42  70  84  84  70  42  49
 42  36  60  72  72  61  36  42
 70  60 100 120 120 100  60  70
 84  72 120 144 144 120  72  84
 84  72 120 144 144 120  72  84
 70  60 100 120 120 100  60  70
 42  36  60  72  72  61  36  42
 49  42  70  84  84  70  42  49

Possible bishop move bitboards per square
  7   6  10  12  12  10   6   7
  6   6  10  12  12  10   6   6
 10  10  40  48  48  40  10  10
 12  12  48 108 108  48  12  12
 12  12  48 108 108  48  12  12
 10  10  40  48  48  40  10  10
  6   6  10  12  12  10   6   6
  7   6  10  12  12  10   6   7


So a minimal move database would have rookMoves[4900] bishopMoves[1428]. This would take less than 25k which is smaller than compressed 2x[64][64] rotated bitboard databases for each piece (2 x 32k total) and they might be faster.

So I was wondering if it was possible to create a cheap hashing function RmovesIncrement in such a way that you can perfectly hash all keys into arrays rookMoves or bishopMoves:

Code: Select all
#define Rmoves(pos,square) *(RmovesPointer[square]+RmovesIncrement(pos,square))
#define Bmoves(pos,square) *(BmovesPointer[square]+BmovesIncrement(pos,square))


Of course RmovesIncrement would have to be a multiple keys to single index type hashing function.

Mabe it isn't possible to find the keys needed in RmovesIncrement in reasonable time using a brute force method considering the trouble you had generating keys for a 1 to 1 minimal perfect hash (the 4906 one)...does anyone know of more efficient ways to find these keys or to find out if such keys even exist?

Edit:
Gerd Isenberg wrote:I didn't exactly understand how a collision is exactly defined in your brute-force approach. For the rook on e3 with all neighbored squares occupied, there are six don't care bits (additionally to the redundant outer squares), where you can greatly accept collisions with all 2**6 possible combination of the *squares here. So if you have a collision with the same attack pattern - or the same unique number of all enumerated attack pattern per square, you can ignore those "productive" accidently shared collisions - if you don't allready do that.


Gerd posted while I was posting :mrgreen:.

Yeah I had the same idea, except wondering if you can have all the relavent keys to collide to a particular index.
Last edited by Pradu on 20 Jun 2006, 23:18, edited 2 times in total.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 20 Jun 2006, 22:37

Pradu wrote:So I was wondering if it was possible to create a cheap hashing function RmovesIncrement in such a way that you can perfectly hash all keys into arrays rookMoves or bishopMoves:
...
Mabe it isn't possible to find the keys needed in RmovesIncrement in reasonable time using a brute force method considering the trouble you had generating keys for a 1 to 1 minimal perfect hash (the 4906 one)...does anyone know of more efficient ways to find these keys or to find out if such keys even exist?

That hashing is possible with an additional indirection. Thus reducing the [64][4096] tables from bitboard to 16-bit short. Thus from 2MB each to 512KB. 16-bit index for all enumerated attack sets. But getting 64- or 128-byte cachelines from 1MB or 4MB to get two or eight byte doesn't matter much as long you rarely profit from sharing cachelines for different attack pattern.
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

PreviousNext

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 9 guests