New bitboard move generator

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: New bitboard move generator

Postby Harald Lüßen » 21 Sep 2006, 00:21

Hi Mike

I hope you feel better soon.

Using a union is ok. But be careful with different compilers and CPUs.
The byte order or inner struct/union alignment may vary. Though
normally with 64 bit and 8 bytes everything should be working as expected.

Do 64-bit compilers and C/C++ have 64-bit bitfields? Would that be ok also?
You need not answer this. It is not important now.

I just wondered if you planned to use my shifting 6-bit style for the bishops.
Rooks do need all 8 rows and 8 files. And may be your byte access is faster
than shifting.

Am I right that the required bit number per square for rooks looks like this?
Code: Select all
12 11 11 11 11 11 11 12    4 x 12 bit =  4 x 4096 entries = 16384
11 10 10 10 10 10 10 11   24 x 11 bit = 24 x 2048 entries = 49152
11 10 10 10 10 10 10 11   36 x 10 bit = 36 x 1024 entries = 36864
11 10 10 10 10 10 10 11                            total = 102400
11 10 10 10 10 10 10 11
11 10 10 10 10 10 10 11
11 10 10 10 10 10 10 11
12 11 11 11 11 11 11 12


The + and | question: Ok, it is possible to use |. I just made a table
for the bishops. But you can not use all sequences for the subtables.
You have to plan the order carefully. That takes away one possibility
of choice and freedom and complicates the initialisation of the table.

Proof: The number of entries in the table is 5248 which is 1010010000000 in binary.
You can not use the 9-bit central bishop index subtable at the end of the table,
because that would require the bits xxxx000000000 to xxxx111111111.
That is a contradiction to the last possible index which is 1010001111111. qed.

However a possible sequence for the | idea is:
Code: Select all
0000999999999 4 times 9 bit
0001999999999
0010999999999
0011999999999
0100007777777 12 times 7 bit
01....7777777
0110117777777
0111000666666 4 times 6 bit
0111001666666
0111010666666
0111011666666
0111100055555 44 times 5 bit
01111...55555
0111111155555
1000000055555
100.....55555
1001111155555
1010000055555
101000..55555
1010001155555

Now apply the 64 squares to this sequence. :-(
That's why I like to use + with the offset-idea in mind and any sequence is ok.

Another news is that I played with bit shifting and chasing the bits
around the board by multiplying them with magic numbers. And doing this
I found the method that Gerd Isenberg uses more or less independently
from his postings. That is, I knew there was this method but I did not
look too close and I did no reverse engineering. But at least I understand
it now. Well, most of it.

Tester needed! Volunteers step forward! Do this test:
For each of the methods
- rotated bitboards from Bob,
- magic bitboards from Gerd,
- exploding bitboards from Harald,
- hashed bitboards from Mike
implement it and measure the time for
- full attack bitboard generating,
- 8 direction attacks,
- mobility function,
- do and undo all moves
of the rooks and bishops
in all positions of a test set with openings, middle game and endgame,
and show us the results.

Harald
User avatar
Harald Lüßen
 
Posts: 29
Joined: 09 Oct 2004, 22:39

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 00:29

Hi Tony,

Thanks for the suggestion. There are a lot of different ways to do this approach. First I am going to try the imperfect hash method with a signature. This has the smallest data and instruction requirements of all. Your suggestion will make the data even smaller. It will be a big success if there is a hit rate of 90% or more!

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 00:43

Hi Gerd,

Thanks. I did not think of the fact that a 64 bit scaler would be kept in registers. In that case would it not be better for 32bit machines to split the scaler into 32 bit halves and work on them seperately? Also it sounds that you are saying that Pradu's method is similar in function to mine, only coded differently. I will look to see if that is the case. I have upgraded my cold to the flu. It seems that it is going to get worse, before it gets better.

Thanks,
mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 01:33

Hi Harald,

I will eventually test both shifting and byte wise accesses to find the fastest way. The storage location 5248 is one beyond the end of the bishop table. The base for h8 should be 5184 which is 1010001000000 leaving room for the 6 bits of h8. The question then becomes, do 7 and 9 bit subarray values interfer with this last storage locations bits. I do not see how, as every subtable entry has enough room for the number of bits for the square in question. The collision that you are pointing out should simply not happen. I will take paper and pencil and figure out all 64 squares by hand and get back to you. This is confusing stuff and hard to keep in ones brain. The bit counts for the rooks is correct. Gerd's method is a fantastic replacement for rotated bitboards, especially for 64bit machines. I am trying to do as well or better for 32bit machines. There is need for lots of experimenters. I am going to test the Zorbrist hash method first as it will be simple to implement and will require the least amount of data.

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 02:36

I think your approach has some nice didactical features to learn how Pradu's approach works.


I have downloaded Pradu's source and have taken a quick look at it. And I do see similarities. It seems that he shifts through the bishop and rook rows like I do. I assume that his system for a perfect non minimal hash is similar to what I had intended. His code is way more complicated than mine. It would be best if he could look at my code and decribe the similarities and differences. As far as I can tell, his is a true perfect hash and does not access a minimal size table. My last developed variant is an exact lookup table and not a hash table at all. Maybe his method can benifit from what I have done or my method can benifit from what Pradu has done. Personally, I would like to have Pradu's viewpoint.

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Pradu » 21 Sep 2006, 04:10

Michael Sherwin wrote:
Gerd wrote:I think your approach has some nice didactical features to learn how Pradu's approach works.


I have downloaded Pradu's source and have taken a quick look at it. And I do see similarities...Personally, I would like to have Pradu's viewpoint.

Mike


Gerd's viewpoint is far more valuable than mine. He knows a lot more about optimization than me, but I'll try to give one.

I haven't been following this topic so lets first lets make sure my interpretation is hopefully correct.

Code: Select all
moveBits = bishopBits[sq];                    //I guess these are bishop moves when the occupancy is 0
blockers.u = moveBits & allPieces & noEdge;   //We use the same trick here - don't care about bits in the edge
                                              //of the occupancy
moveBits &= possibleBits2[sq][blockers.row2]; //For these 6 lines, we take out bits from the moveBits
moveBits &= possibleBits3[sq][blockers.row3]; //depending on the occupancy of each row
moveBits &= possibleBits4[sq][blockers.row4];
moveBits &= possibleBits5[sq][blockers.row5];
moveBits &= possibleBits6[sq][blockers.row6];
moveBits &= possibleBits7[sq][blockers.row7];
moveBits ^= friendlies;


Well one similarity between our move bitboard generators is that it is dependant on only two variables, 1 square and 1 occupancy. In the actual generation of the move bitboard, I don't think there are very many similarities (if my interpretation is correct). To find the move bitboard I do it this way:

Code: Select all
#define Bmagic(square, occupancy) B_database[square][(((occupancy)&Bishop_Mask[square])*Bishop_Magic[square])>>MINIMAL_B_BITS_SHIFT]

Comments
--------
Relevant_Occupancy - ((occupancy)&Bishop_Mask[square])
This makes an occupancy bitboard with only relevant blockers.

Hash_Index         - (Relevant_Occupancy*Bishop_Magic[square])>>MINIMAL_B_BITS_SHIFT
This is the main idea. Shift the blocker bits up using some magic to make a pattern, then shift the pattern back down to make an index

Move_Bitboard      - B_database[square][Hash_Index]
Access a database to get the bishop moves

From my tests, this method becomes atleast 2x faster in 64 bits and faster than rotated as well.  It can aslo be used to find xray-attacks and pins in all directions simultaneously.

Not too complicated right? :mrgreen:

There are some variations to this method but the only difference is how the database is organized and accessed and how the occupancy bitboard is made. Some guys also split it up into 2 multiplys to make it 32 bit friendly and they sometimes do it the rotated way too (do rank/file seperately).

Can you make an implementation of your method and post it online so we can try out it's speed vs other move bitboard generators?

Off-topic: I think there should be a website with a collection of positionally independant move bitboard generators to try out and make your engine work best on any platform. (Gerd's 1.5k for memory starved platforms, Lasse's/mine for modern 64 bit computers, mabe yours for 32 bit computers...who knows). You could probably come up with a function/marco naming convention to make even easier to switch between move bitboard generators.
Last edited by Pradu on 21 Sep 2006, 04:51, edited 9 times in total.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 04:33

The + and | question: Ok, it is possible to use |. I just made a table
for the bishops. But you can not use all sequences for the subtables.
You have to plan the order carefully. That takes away one possibility
of choice and freedom and complicates the initialisation of the table.

Proof: The number of entries in the table is 5248 which is 1010010000000 in binary.
You can not use the 9-bit central bishop index subtable at the end of the table,
because that would require the bits xxxx000000000 to xxxx111111111.
That is a contradiction to the last possible index which is 1010001111111. qed.


Hi Harald,

You are right, 9 bits break the '|' method:

Code: Select all
square num_bits subtable bit_pattern
a1     6        0        000000??????
b1     5        64       0000010?????
c1     5        96       0000011?????
d1     5        128      0000100?????
e1     5        160      0000101?????
f1     5        192      0000110?????
g1     5        224      0000111?????
h1     6        256      000100??????
a2     5        320      0001010?????
b2     5        352      0001011?????
c2     5        384      0001100?????
d2     5        416      0001101?????
e2     5        448      0001110?????
f2     5        480      0001111?????
g2     5        512      0010000?????
h2     5        544      0010001?????
a3     5        576      0010010?????
b3     5        608      0010011?????
c3     7        640      00101???????
d3     7        768      00110???????
e3     7        896      00111???????
f3     7       1024      01000???????
g3     5       1152      0100100?????
h3     5       1184      0100101?????
a4     5       1216      0100110?????
b4     5       1248      0100111?????
c4     7       1280      01010???????
d4     9       1408      01011


'+' will have to be used instead. Thanks! :D That bug would have caused me trouble.

I will go back and edit the post.

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 04:52

Hi Pradu;

That was a fantastic explanation! Now I understand perfectly what you have done. It will take me (and my partners) some time to get my idea up and running. I do not know if my code can be faster than yours on 32bit or not. I guess it depends on how expensive spliting all the 64bit numbers is and especially the expence of that 64bit multiply on 32bit machines.

Thanks,
Mike

EDIT: I just realized that you quoted a secondary approach meant for 64bit machines. My 32bit code is quite different and accesses a 5248 element array for bishops and a 102400 element array for rooks. The biggest difference is how we create the index.
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Zach Wegner » 21 Sep 2006, 06:22

Harald Lüßen wrote:Am I right that the required bit number per square for rooks looks like this?
Code: Select all
12 11 11 11 11 11 11 12    4 x 12 bit =  4 x 4096 entries = 16384
11 10 10 10 10 10 10 11   24 x 11 bit = 24 x 2048 entries = 49152
11 10 10 10 10 10 10 11   36 x 10 bit = 36 x 1024 entries = 36864
11 10 10 10 10 10 10 11                            total = 102400
11 10 10 10 10 10 10 11
11 10 10 10 10 10 10 11
11 10 10 10 10 10 10 11
12 11 11 11 11 11 11 12

This can be compressed further, as exemplified in my initialization code (which doesn't work ;)). Because for the rank the rook is on, you know the full attack pattern, you can reduce the needed bits for that rank to 3 or 4. The table then looks like this:
Code: Select all
 9  9 10 10 10 10  9  9    8 x 10 bit =  8 x 1024 entries = 8192
 8  8  9  9  9  9  8  8   32 x  9 bit = 32 x  512 entries = 16384
 8  8  9  9  9  9  8  8   24 x  8 bit = 24 x  256 entries = 6144
 8  8  9  9  9  9  8  8                             total = 30720
 8  8  9  9  9  9  8  8
 8  8  9  9  9  9  8  8
 8  8  9  9  9  9  8  8
 9  9 10 10 10 10  9  9

So a 30720/102400=.3=70% reduction for free.
However a possible sequence for the | idea is:
Code: Select all
0000999999999 4 times 9 bit
0001999999999
0010999999999
0011999999999
0100007777777 12 times 7 bit
01....7777777
0110117777777
0111000666666 4 times 6 bit
0111001666666
0111010666666
0111011666666
0111100055555 44 times 5 bit
01111...55555
0111111155555
1000000055555
100.....55555
1001111155555
1010000055555
101000..55555
1010001155555


Yes, a kind of Huffman coding. I was thinking about this, but only by classes of squares, which saves a bit-14 bits total. But by doing it square-wise, you seem to be able to squeeze another bit out of it. Nicely done. The other nice thing about this method is that you don't need any other arrays, as the index lookup can have the Huffman code already |ed into it. So it is one less lookup/instruction than the + method.

Also, when applying this method to rooks, since only one class of squares is a non-power of 2, you don't need any strange sequences. It is simply
Code: Select all
Code Number Range
0    16384  00000 00000 00000 -
            01111 11111 11111
10   8192   10000 00000 00000 -
            10111 11111 11111
11   6144   11000 00000 00000 -
            11110 00000 00000


Zach
User avatar
Zach Wegner
 
Posts: 182
Joined: 26 Sep 2004, 22:02
Location: Austin, Texas, USA

Re: New bitboard move generator

Postby Tony van Roon-Werten » 21 Sep 2006, 08:15

Gerd Isenberg wrote:
Michael Sherwin wrote:I am very sick with a fever at the moment, however, I will try to answer your questions.

You stay to the big rookRows[64][8][256] index. Why?


The rooks can move along the edges. Therfore, all 8 rows are needed and all 8 bits in each row are needed.


Can you give us some real code for blockers being a 64-bit integer
and a struct with .row1 to .dow8 members.?


Sorry, my C primer book is vauge about unions. Maybe this is how to do it.

Code: Select all
typedef struct {
  u08 row1;
  u08 row2;
  u08 row3;
  u08 row4;
  u08 row5;
  u08 row6;
  u08 row7;
  u08 row8;
} row;

typedef union {
  u64 u;
  row rows;
} blocks;

index = rookSubTbls[sq]
  | (rRows +     0)[blockers.rows.row1]
  | (rRows +   256)[blockers.rows.row2]
  | (rRows +   512)[blockers.rows.row3]
  ...
  | (rRows +  1792)[blockers.rows.row8];


You use the |-operator here:
index = rookSubTbls[sq] | (rRows + 0)[blockers.row1] ...
I can not see how this could work. the rRows-bits
form a 9bit/7bit/6bit/5bit value with the range 0..511/127/63/31,
at least for the bishops. How can you find an index for rookMoveTbl[index] using | ?
With + this is very easy to understand. Just put one sub-array
after the other ang give the offsets in rookSubTbls[sq].

Code: Select all
bishopSubTbls[0] =  0;    // 0000000(7) | 111111(6) max indici 63
bishopSubTbls[1] = 64;    // 1000000(7) |  11111(5) max indici 95
bishopSubTbls[2] = 96;    // 1100000(7) |  11111(5) max indici 127

rookSubTbls[0] =    0;    // 0000000000000(13) | 111111111111(12) max indici 4095
rookSubTbls[1] = 4096;    // 1000000000000(13) |  11111111111(11) max indici 6143
rookSubTbls[2] = 6144;    // 1100000000000(13) |  11111111111(11) max indici 8191


If I am wrong about any of the above, then I am glad to recieve correction. As far as using '+' instead of '|' that is perfectly okay with me.

Mike


A anonymious struct inside the union would be fine. Or simply a union with u64 and u8[8]. But i wouldn't do that with a union but stay with a u64 scalar like Harald suggested, to keep things in registers and to perform consecutive shifts,and. You allready have nine reads anyway - at least nine cacheliens. Using six or eight additional bytewise reads seems more 32-bit friendly on the first glance, but there is no more chance to read two "blocks" simultaniously. I think your approach has some nice didactical features to learn how Pradu's approach works. Simply count instructions and memory reads and keep L1 footprint in mind...

Become good well soon!

Cheers,
Gerd


It does make one think to revers the indexes squares and blockers. Somehow it feels to me that the blockers are more constant than the squares. Reversing might improve the cache hits.

Tony
Tony van Roon-Werten
 
Posts: 99
Joined: 02 Oct 2004, 15:31
Location: 's Hertogenbosch, Netherlands

Re: New bitboard move generator

Postby Harald Lüßen » 21 Sep 2006, 08:31

Pradu wrote:Gerd's viewpoint is far more valuable than mine. He knows a lot more about optimization than me, but I'll try to give one.
[...]
To find the move bitboard I do it this way:

Code: Select all
#define Bmagic(square, occupancy) B_database[square][(((occupancy)&Bishop_Mask[square])*Bishop_Magic[square])>>MINIMAL_B_BITS_SHIFT]

Comments
--------
Relevant_Occupancy - ((occupancy)&Bishop_Mask[square])
This makes an occupancy bitboard with only relevant blockers.

Hash_Index         - (Relevant_Occupancy*Bishop_Magic[square])>>MINIMAL_B_BITS_SHIFT
This is the main idea. Shift the blocker bits up using some magic to make a pattern, then shift the pattern back down to make an index

Move_Bitboard      - B_database[square][Hash_Index]
Access a database to get the bishop moves

From my tests, this method becomes atleast 2x faster in 64 bits and faster than rotated as well. 
It can aslo be used to find xray-attacks and pins in all directions simultaneously.



I tried this idea with magic multiplications per square to move all blockers/ocupancy bits
to a nice place, say the 9 bits from 63 to 47, that is the 8th row and one bit from the 7th.
This value could be shifted down to bits 0 to 8 und used as an index to a move bitboard
BmoveBitboard[sq][512]. But the problem was that the magic multiplication works on all bits
and when the right ones land on the place I wish them to go there are other bits interfering.
This problem vanished when I divided the occupied/blocker bits in two parts. Two diagonals
for the bishop or horicontal/vertical for rooks. This is done with 2 move masks instead of one.
the bits can then be shifted in parallel up or diagonal with 0101010101010100 or 0202020202020202 (*)
as factors. The vertical rook bits must be shifted to the border of the board first.
That is the essence of Gerd's method. (*) Perhaps some slightly different numbers?

Not too complicated right? :mrgreen:

There are some variations to this method but the only difference is how the database is organized
and accessed and how the occupancy bitboard is made. Some guys also split it up into 2 multiplys
to make it 32 bit friendly and they sometimes do it the rotated way too (do rank/file seperately).

Can you make an implementation of your method and post it online so we can try out it's speed vs
other move bitboard generators?

Off-topic: I think there should be a website with a collection of positionally independant move
bitboard generators to try out and make your engine work best on any platform.
(Gerd's 1.5k for memory starved platforms, Lasse's/mine for modern 64 bit computers,
mabe yours for 32 bit computers...who knows).
You could probably come up with a function/marco naming convention to make even easier to switch
between move bitboard generators.


That's what I thought about when I called for testers and volunteers. Half meant as a joke and
half in earnest. Perhaps we should start with a collection of methods, their names, short descriptions
and their inventors. I don't think that the four methods I numbered are correct here.

Harald
User avatar
Harald Lüßen
 
Posts: 29
Joined: 09 Oct 2004, 22:39

Re: New bitboard move generator

Postby Michael Sherwin » 21 Sep 2006, 10:12

This is getting very interesting. I see that bringing this idea forward was the right thing to do. I may have never come up with some of this stuff on my own! :D

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Gerd Isenberg » 21 Sep 2006, 22:13

Michael Sherwin wrote:
I think your approach has some nice didactical features to learn how Pradu's approach works.


I have downloaded Pradu's source and have taken a quick look at it. And I do see similarities. It seems that he shifts through the bishop and rook rows like I do. I assume that his system for a perfect non minimal hash is similar to what I had intended. His code is way more complicated than mine. It would be best if he could look at my code and decribe the similarities and differences. As far as I can tell, his is a true perfect hash and does not access a minimal size table. My last developed variant is an exact lookup table and not a hash table at all. Maybe his method can benifit from what I have done or my method can benifit from what Pradu has done. Personally, I would like to have Pradu's viewpoint.

Mike


Your approach is interesting and worth to think about - also Zach's compression idea - good brain jogging. But i still fear it is much to slow - even in 32-bit mode.

Pradu's or initially Lasse's approach use one mul/shift for almost the same as you by reading/adding eight values (eight cachelines). Further, to index your rRows-tables by masked ranks, you need either eight byte-wise memory reads with zero extension (ok, only one additional cacheline, anyway...) or seven shifts and seven ands by 0xff.
Code: Select all
index = (blockers * rookMagig[sq]) >> rookShift[sq]; // 9..12 bit address

Code: Select all
index = rookSubTbls[sq] +
       (rRows[   0][blockers.row1]
      + rRows[ 256][blockers.row2]
      + rRows[ 512][blockers.row3]
      + rRows[ 768][blockers.row4]
      + rRows[1024][blockers.row5]
      + rRows[1280][blockers.row6]
      + rRows[1536][blockers.row7]
      + rRows[1792][blockers.row8]);


This is somehow the code of Pradu's rook-attacks with ~800K lookups. Since i find Pradu's source "unreadable" with all the conditional compiles, i disassemled the generated assembly of MINIMIZE_MAGIC ;-)

Code: Select all
u64 rookMask[64];
u64 rookMagig[64];
u32 rookShift[64];
u64*rookAttackBaseptr[64]; // 64 pointer for each square inside the rookAttacks-array
u64 rookAttacks[102400];   // 800KByte

u64 getRookAttacks (u64 occupied, u32 sq) {
  u64 pattern =  occupied & rookMask [sq];
  u32 address =  (pattern * rookMagig[sq]) >> rookShift[sq]; // 9..12 bit address
  u64 attacks = *(rookAttackBaseptr[sq] + address);
  return attacks;
}

One and/mul/shift, four cheap reads, one expensive one - with an additional indirection and memory access shrinking down to ~240KByte (not identical with PERFECT_MAGIC_HASH):
Code: Select all
u64 rookMask[64];
u64 rookMagig[64];
u32 rookShift[64];
u16*rookAttackNumberPtr[64];  // 64 pointer for each square inside the rookAttackNumber-array
u16 rookAttackNumber[102400]; // 200K
u64 rookAttacks[4900];

u64 getRookAttacks (u64 occupied, u32 sq) {
  u64 pattern =  occupied & rookMask [sq];
  u32 address =  (pattern * rookMagig[sq]) >> rookShift[sq]; // 9..12 bit address
  u32 number  = *(rookAttackNumberPtr[sq] + address);
  return rookAttacks[number];
}

Getting rook-attacks with the 9.5KByte (4.5K for rooks) approach takes 14 instructions (including one imul) with parallel gain and two cache friendly reads.

Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: New bitboard move generator

Postby Michael Sherwin » 22 Sep 2006, 02:39

Hi Gerd and everyone,

Well at least the end result of this exercise was to better understand Pradu's code. I can see that my method can not compete with his. I am going to test a couple different permutations anyway just to see how far off I was. However, unless someone has a different opinion, then I imagine that this topic is finished.

EDIT: Unless there is some merit to this idea! :D

Code: Select all
blockers = queenBits[sq] & occupied;
index = hashPtr.low ^ squareHash32[sq] & MASK;
if(moveBitsHash[index].blockers == blockers) {
  moveBits = moveBitsHash[index].moveBits ^ friendlies;
} else {
  generate moveBits as normal
  generate mobility
  store both in hash
}


hashPtr.low is the low 32 bits of the normal TT.

Sorry, couldn't resist! :twisted:

Thanks all,
Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: New bitboard move generator

Postby Michael Sherwin » 29 Sep 2006, 06:23

Michael Sherwin wrote:I am starting a new topic, because, I should not have intruded into the Compact Bitboards thread with this new idea.

The basic idea as I have come to understand it is this:
Code: Select all
blockers.union = queenBits[sq];
signature = row1hash[blockers.row1]
          ^ row2hash[blockers.row2]
          ^ row3hash[blockers.row3]
          ^ row4hash[blockers.row4]
          ^ row5hash[blockers.row5]
          ^ row6hash[blockers.row6]
          ^ row7hash[blockers.row7]
          ^ row8hash[blockers.row8];
key = signature & MASK;
if(hashTbl[key].signature == signature) {
  moveBits = hashTbl[key].bits ^ friendlies;
} else {
  // generate using classic non rotated bitboards
  // or Gerds compact 1.5kb bitboards
  // get mobility
  // and store both in the hash for future use
}


The idea behind this is that it uses mostly 32bit values, has no shifts or multiply instructions, smaller dependency chains, one lookup to get all bits (even for queens), only one dimentional indexing and requires a relativly small hash table.

The downside of not finding the correct bitboard in the hash is minimized by the fact that it should not happen that often. This is because, most blocking combinations are not possible at any given time. As the game progresses different blocking combinations then become possible and old ones become not possible and the changes to the hash is gradual. In the end game the change should become negligible or non existant.

An improvement would be that if a perfect (no collisions) hash of reasonable size can be generated then the above given downside disapears. No signature is needed and the end result would be even faster!

An alternative method would be:

Code: Select all
moveBits = bishopBits[sq];
blockers.u = moveBits & allPieces & noEdge;
moveBits &= possibleBits2[sq][blockers.row2];
moveBits &= possibleBits3[sq][blockers.row3];
moveBits &= possibleBits4[sq][blockers.row4];
moveBits &= possibleBits5[sq][blockers.row5];
moveBits &= possibleBits6[sq][blockers.row6];
moveBits &= possibleBits7[sq][blockers.row7];
moveBits ^= friendlies;


No hash table needed. Some advantages gone. May be quite all right for 64bit machines. Can be modified to just get the queen bits and then modify the result for the rook or bishop (Zach Wegner).

Well thats it as the idea stands now. I am going to make some test and then report back.

Mike

Edit: I added the row numbers at the end of each possibleBits array (also per Zach Wegner). Also Zach said that the hash method could just return the bits for the queen and then modified for rook or bishop as well.


Okay, I have done a very small first test using my original suggestion. I created a 20 bit hash in which to store queen bit sets into for latter lookup. Also stored as a signature is the blockers set, as that is more accurate than the 32 bit hash code. OTOH storing the 32 bit code would be faster and use less memory if it does not cause errors. I will try this latter. I call it a very small first test because, it was only done in the QSearch and only for the white queen. So any time difference good or bad should be minimal.

The results for the initial position are as follows:
Time before modification for a 14 ply search 36.922 seconds.
Time after modification for a 14 ply search 36.422 seconds.
hash misses 323,411
hash hits 9,159,899

I would have posted the results for a more complete test, however, I am out of time untill next week.
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Previous

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 36 guests