Winboard Forum

Posted: **14 Jun 2006, 15:05**

Hi!

I read some of the posts here about deBruijn key hashing in bitboards, and decided to make a new bitboard engine without rotated boards.
First, my results using perft and comparing to crafty 20.1

My platform is Fedora FC5 x86_64 on an AMD64 3000+ 512k L2, 1Gbyte ram.

crafty: Starting position perft 6 uses 11.16s, mine uses 7.43s, 50%faster

Int the well known position "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -\n"
crafty uses 17.57s on perft 5. Mine uses 11.12s, 58% faster.

For fun i tested this "q3k3/6q1/4q3/2q5/5Q2/3Q4/1Q6/3K3Q w - -\n".
Crafty perft 5, 150.36s, mine 91.27s, 64.7% faster.
(yes, I have tested the number of moves, the move generator should be error free).

I made two tables: Attacksrook[64][8192] and AttacksBishop[64][8192] and
found a hash key (deBruijn key ?) for each square of the tables, so that
Pattern=RookMask[square]&(AllWPieces | AllBPieces);
Address= (Pattern*key[square])>>(64-13);
Attacks = AttacksRook[square][Address];

This reduses the number of instructions to make the combined rank+file attacks, by cost of 8Mbyte and cache pollution. Its faster for me, I used rotated bitboards till now

I dont expect my code to be optimal, I use no assembly inlined (actually I find deBruijn bitscanforward to be faster than the bsfq assembly instruction, even as I use H8 as bit 0 to avoid subtraction as used in crafty inlineamd.h).

If someone wants the source code give a message.(It was much easier to make than the rotated bitboards code).

Regards,
Lasse Hansen

Posted: **14 Jun 2006, 17:50**

Hi, Lasse.

I'm interested in bitboard move generator.
Thanks.

Posted: **14 Jun 2006, 18:38**

by all means post it. Or if you prefer, send it to me via email. I've been intending to look at this approach myself. Although move generation is not the only thing the rotated bitboards provide. For example, I get mobility numbers via a table look up s well (rather than a popcnt(x) where x is the bitmap of squares a piece attacks).

But it would be interesting to see if completely eliminating rotated bitboards would be cost effective for me...

Posted: **14 Jun 2006, 19:23**

Hi Lasse,

great news - despite 8MByte is quite huge ;-)

Did you use a brute force approach to find the magics? Enumerating all possible 64-bit De Bruijn numbers until you found a hit - unambiguous result for all relevant enumareted occupied sets on the two rays? Did you eventually tried Tony's approach with "random" numbers?

This is really the absolutly contrary approach to my 1.5KByte lookup approach to make rotated occupied bitboards obsolete, see the huge "compact bitboard attack"-thread here or as a conclusion in CCC:

http://www.talkchess.com/forum/viewtopi ... ght=#17161

For move-generation you still have to serialize the bitboards in the "classical" bitscan loop manner - i guess. This is where i intend to use de Bruijn hashing:

http://www.stmintz.com/ccc/index.php?id=487844
http://www.stmintz.com/ccc/index.php?id=488687

Good luck and fun with your approach!

Cheers,
Gerd

Posted: **14 Jun 2006, 19:25**

(for source, see later post)

Posted: **14 Jun 2006, 19:33**

(for source, see later post)

Posted: **14 Jun 2006, 19:46**

Gerd Isenberg wrote:Hi Lasse,

great news - despite 8MByte is quite huge
Did you use a brute force approach to find the magics? Enumerating all possible 64-bit De Bruijn numbers until you found a hit - unambiguous result for all relevant enumareted occupied sets on the two rays?

Hi, yes, I did a brute force seach to find the number. I used Random64() numbers as in crafty. I wont post this code, its just an optimised version of the code I use for initialisation (see init.c). I have already found keys for a 4096 bishop table (these are in the code). I was surprised to see that this was a bit slower! (maybe it have something to do with AMD's memory paging, I've heard it can read two 64bit words at once, if they are paged differently).

Good luck and fun with your approach!
Cheers,
Gerd

Thanks

Maybe I will have an engine running at ICC in a while.

Correct my if I'm wrong, but as I can see, a rook does max have 12 different attack patterns for a rank or file, giving 144 max totally for both for each square. So I believe I have mapped those 144 pattern around 8192 entries. I therefore believe this can be optimised pretty much.

Lasse[/quote]

Posted: **14 Jun 2006, 20:10**

Correct my if I'm wrong, but as I can see, a rook does max have 12 different attack patterns for a rank or file, giving 144 max totally for both for each square. So I believe I have mapped those 144 pattern around 8192 entries. I therefore believe this can be optimised pretty much.

Yes, up to 12 is max for the inner squares. Otherwise 7 + 6 + 10 + 12 + 12 + 10 + 6 + 7 = 70 different attack sets for all eight squares on a ray.

Posted: **14 Jun 2006, 20:17**

Hi Mateusz,
good luck in Dortmund today!
I live only 20km away, but i'm not a huge soccer fan.

Cheers,
Gerd

Posted: **14 Jun 2006, 20:49**

Great experiment.
But one should run other benchmarks than perft. The cache polution you mentioned will have a much higher impact during real search.
During perft you will have very view memory access besides your lookup tables.
During real search one will have hashtable access, history table updates, killer updates and so on.

Whether the new movegen will be faster under this conditions yet has to be proved.

Regards,

Klaus

Posted: **14 Jun 2006, 23:37**

can you either stick this on an ftp machine somewhere or zip it up and email it? cutting/pasting introduces a number of errors...

Posted: **15 Jun 2006, 00:31**

bob wrote:can you either stick this on an ftp machine somewhere or zip it up and email it? cutting/pasting introduces a number of errors...

Shure, just give me a place or an email address (I dont have a place myself).

[Edit] The following is wrong.

Btw, I made some other tables that takes care of PopCount(), but I made them only to work for the already made attack bitmaps. Those tables are at least smaller (char [64][256]) It is tested by comparing to PopCount() of the attacks in the move generator for all sliding pieces, when using perft. I'ts uncertain, but I got around 0.2% reduction in speed when always getting sliding piece mobility (of course without comparing to PopCount at the same time).

Here is the essential code for getting the rook mobility hash keys.

Regards, Lasse

Code: Select all: //-------------------------------------------------------------------------- // Find keys to get rook mobility //-------------------------------------------------------------------------- fprintf(HashCodeFile, "#Hash codes for rook mobility 256 elements\n"); for (sqr=H8; sqr<=A1; sqr++) { Print(3,"Searching hash code for rook mobility sqr %c%c\n",'H'-(sqr>>3),'8'-(sqr&7)); N=0; Best=0; do { for (i=0;i<256;i++) RookMobility[sqr][i]=0; MagicNr=Random64(); success=true; n=0; for (i=0;i<64*64;i++) { Pattern=AttacksFile[sqr][i&63] | AttacksRank[sqr][i>>6]; Address=(MagicNr*Pattern)>>(64-8); if (RookMobility[sqr][Address]) { if (RookMobility[sqr][Address]!=PopCount(Pattern)){ success=false; if (n>Best) { Best=n; Print(3,"Best=%3d N=%d\n",Best,N); } break; } } else { n++; RookMobility[sqr][Address] = PopCount(Pattern); } N++; if (!(N&((1<<24)-1))) Print(3,"."); } } while (!success); Print(3,"n=%d Magic number found: 0x%x%x \t:", n,HIDWORD(MagicNr),LODWORD(MagicNr)); Print(3,"Iterations = %x%x\n",HIDWORD(N),LODWORD(N)); HashKeyRookMobility[sqr]=MagicNr; xtos64(htxt,HashKeyRookMobility[sqr]); fprintf(HashCodeFile, "%c%c = %s\n",'H'-(sqr>>3),'8'-(sqr&7),htxt); fflush(HashCodeFile); }

Posted: **15 Jun 2006, 01:54**

I tested the pasted code, it compiled ok (get movgen.c from the second post, something went wrong in the first one). I use this command line:

gcc chess.c -Wall -O6 -march=k8 -mtune=k8 \
-fomit-frame-pointer -pipe \
-finline-limit=1200

Lasse

Posted: **15 Jun 2006, 09:58**

One possible improvement might be to get rid of the redundant outer square occupied state while calculating your occupied pattern. Simply mask out the outer squares of each rank/file or diagonal/antidiagonal in your precalculated RookMask or BishopMask arrays. With this reduced cardinality of "Pattern" it might be possible to find magics for more dense lookup arrays, may be 1024 for bishops and/or 2048 for rooks per square.

The advantage is, if you have a rook on d4, occupied changes on a4,h4,d1 and d8 don't translate to a different address with the chance to get some more cache hits here and there. Otoh i think the absolute sizeof the array if already exceeded some threshold is probably not that important.

I suggest no longer to post that huge source code here. Some routines, to get the idea is fine - imho, for instance the attack getters. But a whole program with a complete bitboard infrastructure is somehow problematic - even if it is perft only.
There are possible clone issues. Do you intend to make open source under some public license?

Gerd

Posted: **15 Jun 2006, 10:43**

Gerd Iseberg wrote:

One possible improvement might be to get rid of the redundant outer square occupied state while calculating your occupied pattern. Simply mask out the outer squares of each rank/file or diagonal/antidiagonal in your precalculated RookMask or BishopMask arrays. With this reduced cardinality of "Pattern" it might be possible to find magics for more dense lookup arrays, may be 1024 for bishops and/or 2048 for rooks per square.

I have already masked out the border squares, that why I search 64x64 patterns, not 256x256.

I suggest no longer to post that huge source code here. Some routines, to get the idea is fine - imho, for instance the attack getters. But a whole program with a complete bitboard infrastructure is somehow problematic - even if it is perft only.

I agree. I'm waiting for my ISP to give me a home page. When it works, I can replace the code posts with a link.

There are possible clone issues. Do you intend to make open source under some public license?

I want the source code to be under GPL. I guess I have to find and add some headers.

About mobility calculation, my thinking was very wrong ( think 8-)

. I might get back to that later.

Lasse

Posted: **16 Jun 2006, 16:03**

The source code can now be downloaded zipped at http://home.c2i.net/lasha

In my source code I have used some of the code i have found on this site, some I have taken from crafty. I hope this is ok. I want my part of the code to be as free as possible (I dont know if that apply to GPL?).

Lasse

Posted: **16 Jun 2006, 17:18**

It is certainly OK from my perspective...

Posted: **16 Jun 2006, 20:46**

Hi again!

Now I got the slider mobility tables working. I dont know how you define mobility, but what I got working is to lookup the number of non-capturing moves for rooks and bishops. Queens are calculated as the sum of the previous.

One 'problem' was that one can not use the 256->64 reduction trick, because it is now relevant if it is a piece on the board edge. But, one can still reduce to 128 by ignoring the focused piece.

Here a program snip:

Code: Select all: Pcs=WQueens; // WQueens while (Pcs) { From=FirstOne(Pcs); Pattern = (AllWPieces | AllBPieces) & MaskRook[From]; Address = (Pattern*HashKeyRook[From])>>(64-RAT_BITS); A = RookAttacks[From][Address]; Pattern = (AllWPieces | AllBPieces) & MaskBishop[From]; Address = (Pattern*HashKeyBishop[From])>>(64-BAT_BITS); A |= BishopAttacks[From][Address]; A ^= A & (AllWPieces | AllBPieces); // Mobility Pattern = A & MaskFreeRook[From]; Address = (Pattern*HashKeyRookMobility[From])>>(64-RMT_BITS); N2+=RookMobility[From][Address]; Pattern = A & MaskFreeBishop[From]; Address = (Pattern*HashKeyBishopMobility[From])>>(64-BMT_BITS); N2+=BishopMobility[From][Address]; while (A) { To=FirstOne(A); *M++=From+(To<<6)+(WQueen<<12); A &= ClrBit(To); } Pcs &= ClrBit(From); }

I have tested against PopCount() as I found on this site (a Reinhard Scharnagl post). I just made an aggregated sum of all sliding piece non-capturing moves. I used this position 5 plies deep
"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -\n"
Compared to no mobility calculation the reduction in speed was 1.63% for Reinhard Scharnagl's version, and 0.145% for mine. So it may not be worth it by means of memory usage and such.

For now I use two tables RookMobility[64][2048] and BishopMobility[64][2048] (256k). The good thing is that it has byte size, and it can be compressed to half-byte, if wanted.

One maybe important thing I have'nt done yet, is to make an optimiser for finding the keys. I believe this can be done. If one have a 'good' but not good enough key, it should'nt be necessary to use a totally different new key (the brute force approach). I believe a method calles 'Simulated Annealing' could be of help here, by randomly changing only parts of the 64-bit key by xor-ing with a randomly shifted 32 bit/16 bit/8 bit etc random number.

The new code can be downloaded at http://home.c2i.net/lasha
Cheers,
Lasse

Posted: **16 Jun 2006, 22:48**

For now I use two tables RookMobility[64][2048] and BishopMobility[64][2048] (256k). The good thing is that it has byte size, and it can be compressed to half-byte, if wanted.

Hi Lasse,
great compression. I wonder whether the bishops have no lower range as in most of my experiments with rook and bishop moves or target sets. I guess you do a byte lookup to get an occupied pattern-index in the 12*12 range - and you need a second cascaded lookup[64][144] or so, to finally get the attack bitboards? How can you pack 12*12 into one nibble if you like to compress to half bytes?

Thanks,
Gerd

Posted: **17 Jun 2006, 02:08**

The RookMobility[64][2048] always return a value in the range [0, 14]=15 elements, that tells how many non-capturing moves it is allowed to do. It does not return an attack set. It is just a replacement for PopCount for all possible rook attack sets that are non-captures.

If I think correct the attack set for non-captures for a rook at D4 will not be 12x12=144, but (3+1)*(4+1)**2 = 400, because of the option of no attack at all.

Lasse

Winboard Forum

Fast(er) bitboard move generator

Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator

Re: Fast(er) bitboard move generator