A Faster Magic Move Bitboard Generator?

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 15 Dec 2006, 01:11

Hi all

For those of you that are using Pradu's magic move bitboard generator (or understand how it works) and have a 32-bit computer, you'll be using code that looks something like this to retrieve your attackboards:-

Code: Select all
U64 Bmagic(unsigned int square) {
      return *(bIndecies[square] + (((Occupied & bishopMask[square]) * bishopMagic[square]) >> bishopShift[square]));
   }

   U64 Rmagic(unsigned int square) {   
      return *(rIndecies[square] + (((Occupied & rookMask[square]) * rookMagic[square]) >> rookShift[square]));
   }


I picked up on a recent post by H.G Muller and substituted the code above with:-

Code: Select all
U64 Bmagic(unsigned int square) {
      unsigned int lo = bishopMagicLo[square] * (int) Occupied;
      unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
      return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
   }

   U64 Rmagic(unsigned int square) {
      unsigned int lo = rookMagicLo[square] * (int) Occupied;
      unsigned int hi = rookMagicHi[square] * (int) (Occupied>>32);
      return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square]));
   }


This modification requires a new set of two 32-bit keys per square (or like me you could join them into one 64-bit key) and only uses two 32-bit multiplies instead of three, and a 32-bit shift, and is probably more suited to a 32-bit machine.
I have implemented this into my program and, for me, I got an increased perft speed of 1.5 million nps.

My thanks to H.G Muller

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby smcracraft » 15 Dec 2006, 03:43

H.G., Grant, or Pradu - for this apparent breakthru, please
publish a magicmoves.c, magicmoves.h so that others
can gain as well.
smcracraft
 
Posts: 65
Joined: 15 Jan 2006, 05:38

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 15 Dec 2006, 11:14

Hi

I would stress that I have merely made a modification to Pradu's excellent code and following up on an idea by H.G., and rather than re-publish, I'll just show what I changed to make it faster for me.

Changes to magicmoves.h

Code: Select all
U64 Bdb[4911];
U64 Rdb[101964];
const U64 bishopMagic[64] = {
   0x080600D0080E5F02, 0x4732008400AF0FD2, 0x82CFF71F62045123, 0x0000000000082481,
   0x0000000000021210, 0xF7D118CCA75DCB7F, 0xB18910070003FF48, 0xC45B7FA7670A6AE7,
   0x42661048BE92AF33, 0xA080928714954FAC, 0x95044F804D4D84C7, 0xDE983BDD629E420B,
   0x99D216A3B099B245, 0xC2E8076ABDFB746B, 0x8C0388A586C0F533, 0xC6224289FCBCD2A1,
   0xF4EBD610AE88F586, 0xA539F530EF057C82, 0xB634161080018408, 0x2502300048022104,
   0x2028218080022001, 0x41204160D4921201, 0x0151E44209184002, 0xA0D4EA5220102112,
   0x29422204008B4210, 0x1091B90048016862, 0x0890684020024050, 0x8504020280100C80,
   0x0308104208210040, 0x48818096F4022418, 0x2129F42C80108098, 0x009509F100D0B027,
   0x80220B0212022504, 0x2062E50F00408215, 0x0010402F20402498, 0x2014404088148808,
   0x0020200E0084008A, 0x2CAA207000416121, 0x0044A4EE00E600A1, 0x0031189A1010814A,
   0x890604183F7A33CF, 0x4443220486820301, 0x020004431CF084C4, 0x422060404E200061,
   0x010020B200488821, 0x102001008B1405D8, 0xA4159629BD0CE70E, 0xF240118110240084,
   0xB433122800008421, 0x091A2C4000AB0041, 0x440C22240C082446, 0x0605C08000410501,
   0x056102702101005E, 0x5BD5C00F4202B046, 0x840BB49112E20421, 0x0B2081001902024A,
   0x21D6F2C040020121, 0x07A333113508420D, 0x006C0B6048220315, 0x06104C0800000000,
   0x3097434700000000, 0x3F34F24A15A98871, 0x8EA38EC007404872, 0x1FFF966AC3E87030
};
const U64 rookMagic[64] = {
   0x05FACEB5A2001083, 0x4145280000081802, 0x200220C52A042010, 0x7080300C11011010,
   0x0041260080412802, 0x287001888040020C, 0x43400E0044080111, 0x87200F80100800F1,
   0x82C00024A0020102, 0x0CB000210800804A, 0x0010402000210120, 0x0084180040008808,
   0xAB00400480004205, 0x03D008222061004C, 0x8484350080010103, 0x60018DA801008101,
   0xE08100F590204040, 0x2F48483800202040, 0x844E0028A0808401, 0x0804403208220204,
   0x00700B4001A40101, 0x2954018820200202, 0x404B000B20008041, 0x900F408040008101,
   0x0080005054902D41, 0x48841520020040B1, 0x620A40F260072082, 0x0F4A2006C2001602,
   0x0A160004A4300A22, 0x80840412104C0B01, 0x0012828310100B04, 0x010160940100006E,
   0x1010204B4412A008, 0x120100802B080852, 0x8C010040C210A022, 0x8808008882B12038,
   0x11000B0405391128, 0x0100080282110085, 0x8100190060840502, 0x10200840C0316601,
   0x91B0040001800248, 0x12D0808840002012, 0x08C10010010B0030, 0x1180404011290886,
   0x014404000060208F, 0x400A000518C84292, 0x403A000C0002B512, 0x040040406A08E081,
   0x80084040404C0030, 0x0AA40010084016A0, 0x0018082039002250, 0x0643880200803088,
   0x0202404004189006, 0x4101802000002C2B, 0x0810501058503081, 0xE41108040808A007,
   0x001B1041000028C0, 0x91004182C4042813, 0x000F416A04001022, 0x400616325240200A,
   0xD00EC5A27E001002, 0x1008010E00101802, 0x01402401001800E3, 0x090026AF00200061
};
U64* bIndecies[64] = {
   Bdb+4783, Bdb+2505, Bdb+236, Bdb+857,  Bdb+1234, Bdb+1612, Bdb+4639, Bdb+4847,
   Bdb+2449, Bdb+2521, Bdb+264, Bdb+889,  Bdb+1266, Bdb+1628, Bdb+4655, Bdb+4727,
   Bdb+0,      Bdb+116,  Bdb+292, Bdb+917,  Bdb+1296, Bdb+1656, Bdb+2208, Bdb+2328,
   Bdb+28,   Bdb+144,  Bdb+418, Bdb+2593, Bdb+3616, Bdb+1783, Bdb+2238, Bdb+2358,
   Bdb+58,   Bdb+174,  Bdb+544, Bdb+3104, Bdb+4127, Bdb+1909, Bdb+2266, Bdb+2386,
   Bdb+88,   Bdb+204,  Bdb+671, Bdb+1045, Bdb+1424, Bdb+2036, Bdb+2296, Bdb+2417,
   Bdb+2477, Bdb+2549, Bdb+799, Bdb+1173, Bdb+1552, Bdb+2164, Bdb+4683, Bdb+4755,
   Bdb+4815, Bdb+2577, Bdb+828, Bdb+1202, Bdb+1581, Bdb+2192, Bdb+4711, Bdb+4879
};
U64* rIndecies[64] = {
   Rdb+85598, Rdb+73458, Rdb+36725, Rdb+42852, Rdb+46940, Rdb+51031, Rdb+77418, Rdb+93782,
   Rdb+69366, Rdb+32652, Rdb+38772, Rdb+10231, Rdb+14306, Rdb+53071, Rdb+57097, Rdb+81506,
   Rdb+24462, Rdb+33672, Rdb+6140,  Rdb+11254, Rdb+15266, Rdb+18320, Rdb+58119, Rdb+61174,
   Rdb+26510, Rdb+4094,  Rdb+7163,  Rdb+0,     Rdb+2047,  Rdb+19344, Rdb+22415, Rdb+63222,
   Rdb+28558, Rdb+5117,  Rdb+8187,  Rdb+1023,  Rdb+3070,  Rdb+20367, Rdb+23438, Rdb+65270,
   Rdb+30606, Rdb+34695, Rdb+9211,  Rdb+12277, Rdb+16290, Rdb+21391, Rdb+59142, Rdb+67318,
   Rdb+71412, Rdb+35717, Rdb+39792, Rdb+13284, Rdb+17312, Rdb+54093, Rdb+60166, Rdb+83554,
   Rdb+89694, Rdb+75378, Rdb+40812, Rdb+44896, Rdb+48987, Rdb+55053, Rdb+79462, Rdb+97869
};
const unsigned int bishopShift[64] = {
   27, 28, 27, 27, 27, 28, 28, 27,
   27, 27, 27, 27, 27, 27, 27, 27,
   27, 27, 25, 25, 25, 25, 27, 27,
   27, 27, 25, 23, 23, 25, 27, 27,
   27, 27, 25, 23, 23, 25, 27, 27,
   27, 27, 25, 25, 25, 25, 27, 27,
   27, 27, 27, 27, 27, 27, 27, 27,
   27, 28, 27, 27, 27, 28, 28, 27
};
const unsigned int rookShift[64] = {
   20, 21, 21, 21, 21, 21, 21, 20,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   20, 21, 21, 21, 21, 21, 21, 20
};



Changes to magicmoves.c

Code: Select all
U64 Bmagic(unsigned int square) {
   U64 occ = Occupied & bishopMask[square];
   unsigned int lo = (int)(occ) * (int)(bishopMagic[square]);
   unsigned int hi = (int)(occ >> 32) * (int)(bishopMagic[square] >> 32);
   return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
}

U64 Rmagic(unsigned int square) {
   U64 occ = Occupied & rookMask[square];
   unsigned int lo = (int)(occ) * (int)(rookMagic[square]);
   unsigned int hi = (int)(occ >> 32) * (int)(rookMagic[square] >> 32);
   return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square]));
}


Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby Tord Romstad » 15 Dec 2006, 15:47

Grant Osborne wrote:This modification requires a new set of two 32-bit keys per square (or like me you could join them into one 64-bit key) and only uses two 32-bit multiplies instead of three, and a 32-bit shift, and is probably more suited to a 32-bit machine.
I have implemented this into my program and, for me, I got an increased perft speed of 1.5 million nps.


Hello Grant,

What CPU were you using? With my program, I get a speedup of about 20% with 32-bit multiplications on an AMD64 X2 (running in 32-bit mode), but on a 32-bit Intel Core Duo, both variants run at exactly the same speed (which is surprising to me).

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: A Faster Magic Move Bitboard Generator?

Postby smcracraft » 15 Dec 2006, 16:33

Grant Osborne wrote:Hi

I would stress that I have merely made a modification to Pradu's excellent code and following up on an idea by H.G., and rather than re-publish, I'll just show what I changed to make it faster for me.

Changes to magicmoves.h

Code: Select all
U64 Bdb[4911];
U64 Rdb[101964];
const U64 bishopMagic[64] = {
   0x080600D0080E5F02, 0x4732008400AF0FD2, 0x82CFF71F62045123, 0x0000000000082481,
   0x0000000000021210, 0xF7D118CCA75DCB7F, 0xB18910070003FF48, 0xC45B7FA7670A6AE7,
   0x42661048BE92AF33, 0xA080928714954FAC, 0x95044F804D4D84C7, 0xDE983BDD629E420B,
   0x99D216A3B099B245, 0xC2E8076ABDFB746B, 0x8C0388A586C0F533, 0xC6224289FCBCD2A1,
   0xF4EBD610AE88F586, 0xA539F530EF057C82, 0xB634161080018408, 0x2502300048022104,
   0x2028218080022001, 0x41204160D4921201, 0x0151E44209184002, 0xA0D4EA5220102112,
   0x29422204008B4210, 0x1091B90048016862, 0x0890684020024050, 0x8504020280100C80,
   0x0308104208210040, 0x48818096F4022418, 0x2129F42C80108098, 0x009509F100D0B027,
   0x80220B0212022504, 0x2062E50F00408215, 0x0010402F20402498, 0x2014404088148808,
   0x0020200E0084008A, 0x2CAA207000416121, 0x0044A4EE00E600A1, 0x0031189A1010814A,
   0x890604183F7A33CF, 0x4443220486820301, 0x020004431CF084C4, 0x422060404E200061,
   0x010020B200488821, 0x102001008B1405D8, 0xA4159629BD0CE70E, 0xF240118110240084,
   0xB433122800008421, 0x091A2C4000AB0041, 0x440C22240C082446, 0x0605C08000410501,
   0x056102702101005E, 0x5BD5C00F4202B046, 0x840BB49112E20421, 0x0B2081001902024A,
   0x21D6F2C040020121, 0x07A333113508420D, 0x006C0B6048220315, 0x06104C0800000000,
   0x3097434700000000, 0x3F34F24A15A98871, 0x8EA38EC007404872, 0x1FFF966AC3E87030
};
const U64 rookMagic[64] = {
   0x05FACEB5A2001083, 0x4145280000081802, 0x200220C52A042010, 0x7080300C11011010,
   0x0041260080412802, 0x287001888040020C, 0x43400E0044080111, 0x87200F80100800F1,
   0x82C00024A0020102, 0x0CB000210800804A, 0x0010402000210120, 0x0084180040008808,
   0xAB00400480004205, 0x03D008222061004C, 0x8484350080010103, 0x60018DA801008101,
   0xE08100F590204040, 0x2F48483800202040, 0x844E0028A0808401, 0x0804403208220204,
   0x00700B4001A40101, 0x2954018820200202, 0x404B000B20008041, 0x900F408040008101,
   0x0080005054902D41, 0x48841520020040B1, 0x620A40F260072082, 0x0F4A2006C2001602,
   0x0A160004A4300A22, 0x80840412104C0B01, 0x0012828310100B04, 0x010160940100006E,
   0x1010204B4412A008, 0x120100802B080852, 0x8C010040C210A022, 0x8808008882B12038,
   0x11000B0405391128, 0x0100080282110085, 0x8100190060840502, 0x10200840C0316601,
   0x91B0040001800248, 0x12D0808840002012, 0x08C10010010B0030, 0x1180404011290886,
   0x014404000060208F, 0x400A000518C84292, 0x403A000C0002B512, 0x040040406A08E081,
   0x80084040404C0030, 0x0AA40010084016A0, 0x0018082039002250, 0x0643880200803088,
   0x0202404004189006, 0x4101802000002C2B, 0x0810501058503081, 0xE41108040808A007,
   0x001B1041000028C0, 0x91004182C4042813, 0x000F416A04001022, 0x400616325240200A,
   0xD00EC5A27E001002, 0x1008010E00101802, 0x01402401001800E3, 0x090026AF00200061
};
U64* bIndecies[64] = {
   Bdb+4783, Bdb+2505, Bdb+236, Bdb+857,  Bdb+1234, Bdb+1612, Bdb+4639, Bdb+4847,
   Bdb+2449, Bdb+2521, Bdb+264, Bdb+889,  Bdb+1266, Bdb+1628, Bdb+4655, Bdb+4727,
   Bdb+0,      Bdb+116,  Bdb+292, Bdb+917,  Bdb+1296, Bdb+1656, Bdb+2208, Bdb+2328,
   Bdb+28,   Bdb+144,  Bdb+418, Bdb+2593, Bdb+3616, Bdb+1783, Bdb+2238, Bdb+2358,
   Bdb+58,   Bdb+174,  Bdb+544, Bdb+3104, Bdb+4127, Bdb+1909, Bdb+2266, Bdb+2386,
   Bdb+88,   Bdb+204,  Bdb+671, Bdb+1045, Bdb+1424, Bdb+2036, Bdb+2296, Bdb+2417,
   Bdb+2477, Bdb+2549, Bdb+799, Bdb+1173, Bdb+1552, Bdb+2164, Bdb+4683, Bdb+4755,
   Bdb+4815, Bdb+2577, Bdb+828, Bdb+1202, Bdb+1581, Bdb+2192, Bdb+4711, Bdb+4879
};
U64* rIndecies[64] = {
   Rdb+85598, Rdb+73458, Rdb+36725, Rdb+42852, Rdb+46940, Rdb+51031, Rdb+77418, Rdb+93782,
   Rdb+69366, Rdb+32652, Rdb+38772, Rdb+10231, Rdb+14306, Rdb+53071, Rdb+57097, Rdb+81506,
   Rdb+24462, Rdb+33672, Rdb+6140,  Rdb+11254, Rdb+15266, Rdb+18320, Rdb+58119, Rdb+61174,
   Rdb+26510, Rdb+4094,  Rdb+7163,  Rdb+0,     Rdb+2047,  Rdb+19344, Rdb+22415, Rdb+63222,
   Rdb+28558, Rdb+5117,  Rdb+8187,  Rdb+1023,  Rdb+3070,  Rdb+20367, Rdb+23438, Rdb+65270,
   Rdb+30606, Rdb+34695, Rdb+9211,  Rdb+12277, Rdb+16290, Rdb+21391, Rdb+59142, Rdb+67318,
   Rdb+71412, Rdb+35717, Rdb+39792, Rdb+13284, Rdb+17312, Rdb+54093, Rdb+60166, Rdb+83554,
   Rdb+89694, Rdb+75378, Rdb+40812, Rdb+44896, Rdb+48987, Rdb+55053, Rdb+79462, Rdb+97869
};
const unsigned int bishopShift[64] = {
   27, 28, 27, 27, 27, 28, 28, 27,
   27, 27, 27, 27, 27, 27, 27, 27,
   27, 27, 25, 25, 25, 25, 27, 27,
   27, 27, 25, 23, 23, 25, 27, 27,
   27, 27, 25, 23, 23, 25, 27, 27,
   27, 27, 25, 25, 25, 25, 27, 27,
   27, 27, 27, 27, 27, 27, 27, 27,
   27, 28, 27, 27, 27, 28, 28, 27
};
const unsigned int rookShift[64] = {
   20, 21, 21, 21, 21, 21, 21, 20,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   21, 22, 22, 22, 22, 22, 22, 21,
   20, 21, 21, 21, 21, 21, 21, 20
};



Changes to magicmoves.c

Code: Select all
U64 Bmagic(unsigned int square) {
   U64 occ = Occupied & bishopMask[square];
   unsigned int lo = (int)(occ) * (int)(bishopMagic[square]);
   unsigned int hi = (int)(occ >> 32) * (int)(bishopMagic[square] >> 32);
   return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
}

U64 Rmagic(unsigned int square) {
   U64 occ = Occupied & rookMask[square];
   unsigned int lo = (int)(occ) * (int)(rookMagic[square]);
   unsigned int hi = (int)(occ >> 32) * (int)(rookMagic[square] >> 32);
   return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square]));
}


Grant


The code you sent doesn't appear at all like the magicmoves
I use so I don't know where to make the changes:

Code: Select all
/**
 *magicmoves.h
 *
 *Header file for magic move bitboard generation.  Include this in any files
 *need this functionality.
 *
 *Usage:
 *You must first initialize the generator with a call to initmagicmoves().
 *Then you can use the following macros for generating move bitboards by
 *giving them a square and an occupancy.  The macro will then "return"
 *the correct move bitboard for that particular square and occupancy. It
 *has been named Rmagic and Bmagic so that it will not conflict with
 *any functions/macros in your chess program called Rmoves/Bmoves. You
 *can macro Bmagic/Rmagic to Bmoves/Rmoves if you wish.  If you want to
 *minimize the size of the bitboards, make MINIMIZE_MAGIC uncommented in this
 *header (more info on this later).  Where you typedef your unsigned 64-bit
 *integer declare __64_BIT_INTEGER_DEFINED__.  If USE_INLINING is uncommented,
 *the macros will be expressed as inlined functions.  If PERFECT_MAGIC_HASH is
 *uncomment, the move generator will use an additional indrection to make the
 *table sizes smaller : (~50kb+((original size)/sizeof(PERFECT_MAGIC_HASH)).
 *The size listed from here on out are the sizes without PERFECT_MAGIC_HASH.
 *
 *Bmagic(square, occupancy)
 *Rmagic(square, occupancy)
 *
 *Square is an integer that is greater than or equal to zero and less than 64.
 *Occupancy is any unsigned 64-bit integer that describes which squares on
 *the board are occupied.
 *
 *The following macros are identical to Rmagic and Bmagic except that the
 *occupancy is assumed to already have been "masked".  Look at the following
 *source or read up on the internet about magic bitboard move generation to
 *understand the usage of these macros and what it means by "an occupancy that
 *has already been masked".  Using the following macros when possible might be
 *a tiny bit faster than using Rmagic and Bmagic because it avoids an array
 *access and a 64-bit & operation.
 *
 *BmagicNOMASK(square, occupancy)
 *RmagicNOMASK(square, occupancy)
 *
 *Unsigned 64 bit integers are referenced by this generator as U64.
 *Edit the beginning lines of this header for the defenition of a 64 bit
 *integer if necessary.
 *
 *If MINIMIZE_MAGIC is defined before including this file:
 *The move bitboard generator will use up 841kb of memory.
 *41kb of memory is used for the bishop database and 800kb is used for the rook
 *database.  If you feel the 800kb rook database is too big, then comment it out
 *and use a more traditional move bitboard generator in conjunction with the
 *magic move bitboard generator for bishops.
 *
 *If MINIMIAZE_MAGIC is not defined before including this file:
 *The move bitboard generator will use up 2304kb of memory but might perform a bit
 *faster.
 *
 *Copyright (C) 2006 Pradyumna Kannan.
 *
 *This code is provided 'as-is', without any express or implied warranty.
 *In no event will the authors be held liable for any damages arising from
 *the use of this code. Permission is granted to anyone to use this
 *code for any purpose, including commercial applications, and to alter
 *it and redistribute it freely, subject to the following restrictions:
 *
 *1. The origin of this code must not be misrepresented; you must not
 *claim that you wrote the original code. If you use this code in a
 *product, an acknowledgment in the product documentation would be
 *appreciated but is not required.
 *
 *2. Altered source versions must be plainly marked as such, and must not be
 *misrepresented as being the original code.
 *
 *3. This notice may not be removed or altered from any source distribution.
 */

#ifndef _magicmovesh
#define _magicmovesh

/*********MODIFY THE FOLLOWING IF NECESSARY********/

//Uncommont either one of the following or both
//#define MINIMIZE_MAGIC
//#define PERFECT_MAGIC_HASH unsigned short

#ifdef _MSC_VER
   #define USE_INLINING
#endif

#ifndef __64_BIT_INTEGER_DEFINED__
#define __64_BIT_INTEGER_DEFINED__
#if defined(_MSC_VER) && _MSC_VER<1300
   typedef unsigned __int64 U64; //For the old microsoft compilers
#else
   typedef unsigned long long  U64; //Supported by MSC 13.00+ and GCC
#endif //defined(_MSC_VER) && _MSC_VER<1300
#endif //__64_BIT_INTEGER_DEFINED__
/***********MODIFY THE ABOVE IF NECESSARY**********/

#ifdef USE_INLINING
   #ifdef _MSC_VER
      #define MAGICMOVES_INLINE_KEYWORD __forceinline
   #endif
   #ifdef __GNUC__
      #define MAGICMOVES_INLINE_KEYWORD __inline__
   #endif
   #ifndef MAGICMOVES_INLINE_KEYWORD
      #define MAGICMOVES_INLINE_KEYWORD inline
   #endif
#endif

extern const U64 magicmoves_r_magics[64];
extern const U64 magicmoves_r_mask[64];
extern const U64 magicmoves_b_magics[64];
extern const U64 magicmoves_b_mask[64];

#ifndef MINIMIZE_MAGIC
   #define MINIMAL_B_BITS_SHIFT 55
   #define MINIMAL_R_BITS_SHIFT 52
#endif

#ifndef PERFECT_MAGIC_HASH
   #ifdef MINIMIZE_MAGIC

      #ifndef USE_INLINING
         #define Bmagic(square, occupancy) *(magicmoves_b_indecies[square]+((((occupancy)&magicmoves_b_mask[square])*magicmoves_b_magics[square])>>magicmoves_b_shift[square]))
         #define Rmagic(square, occupancy) *(magicmoves_r_indecies[square]+((((occupancy)&magicmoves_r_mask[square])*magicmoves_r_magics[square])>>magicmoves_r_shift[square]))
         #define BmagicNOMASK(square, occupancy) *(magicmoves_b_indecies[square]+(((occupancy)*magicmoves_b_magics[square])>>magicmoves_b_shift[square]))
         #define RmagicNOMASK(square, occupancy) *(magicmoves_r_indecies[square]+(((occupancy)*magicmoves_r_magics[square])>>magicmoves_r_shift[square]))
      #endif //USE_INLINING

      //extern U64 magicmovesbdb[5248];
      extern const unsigned int magicmoves_b_shift[64];
      extern const U64* magicmoves_b_indecies[64];

      //extern U64 magicmovesrdb[102400];
      extern const unsigned int magicmoves_r_shift[64];
      extern const U64* magicmoves_r_indecies[64];

   #else //Don't Minimize database size

      #ifndef USE_INLINING
         #define Bmagic(square, occupancy) magicmovesbdb[square][(((occupancy)&magicmoves_b_mask[square])*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]
         #define Rmagic(square, occupancy) magicmovesrdb[square][(((occupancy)&magicmoves_r_mask[square])*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]
         #define BmagicNOMASK(square, occupancy) magicmovesbdb[square][((occupancy)*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]
         #define RmagicNOMASK(square, occupancy) magicmovesrdb[square][((occupancy)*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]
      #endif //USE_INLINING

      extern U64 magicmovesbdb[64][1<<9];
      extern U64 magicmovesrdb[64][1<<12];

   #endif //MINIMIAZE_MAGICMOVES
#else //PERFCT_MAGIC_HASH defined
   #ifndef MINIMIZE_MAGIC

      #ifndef USE_INLINING
         #define Bmagic(square, occupancy) magicmovesbdb[magicmoves_b_indecies[square][(((occupancy)&magicmoves_b_mask[square])*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]]
         #define Rmagic(square, occupancy) magicmovesrdb[magicmoves_r_indecies[square][(((occupancy)&magicmoves_r_mask[square])*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]]
         #define BmagicNOMASK(square, occupancy) magicmovesbdb[magicmoves_b_indecies[square][((occupancy)*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]]
         #define RmagicNOMASK(square, occupancy) magicmovesrdb[magicmoves_r_indecies[square][((occupancy)*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]]
      #endif //USE_INLINING

      extern U64 magicmovesbdb[1428];
      extern U64 magicmovesrdb[4900];
      extern PERFECT_MAGIC_HASH magicmoves_b_indecies[64][1<<9];
      extern PERFECT_MAGIC_HASH magicmoves_r_indecies[64][1<<12];
   #else
      #error magicmoves - MINIMIZED_MAGIC and PERFECT_MAGIC_HASH cannot be used together
   #endif
#endif //PERFCT_MAGIC_HASH

#ifdef USE_INLINING
   MAGICMOVES_INLINE_KEYWORD U64 Bmagic(const unsigned int square,const U64 occupancy)
   {
      #ifndef PERFECT_MAGIC_HASH
         #ifdef MINIMIZE_MAGIC
            return *(magicmoves_b_indecies[square]+(((occupancy&magicmoves_b_mask[square])*magicmoves_b_magics[square])>>magicmoves_b_shift[square]));
         #else
            return magicmovesbdb[square][(((occupancy)&magicmoves_b_mask[square])*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT];
         #endif
      #else
         return magicmovesbdb[magicmoves_b_indecies[square][(((occupancy)&magicmoves_b_mask[square])*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]];
      #endif
   }
   MAGICMOVES_INLINE_KEYWORD U64 Rmagic(const unsigned int square,const U64 occupancy)
   {
      #ifndef PERFECT_MAGIC_HASH
         #ifdef MINIMIZE_MAGIC
            return *(magicmoves_r_indecies[square]+(((occupancy&magicmoves_r_mask[square])*magicmoves_r_magics[square])>>magicmoves_r_shift[square]));
         #else
            return magicmovesrdb[square][(((occupancy)&magicmoves_r_mask[square])*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT];
         #endif
      #else
         return magicmovesrdb[magicmoves_r_indecies[square][(((occupancy)&magicmoves_r_mask[square])*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]];
      #endif
   }
   MAGICMOVES_INLINE_KEYWORD U64 BmagicNOMASK(const unsigned int square,const U64 occupancy)
   {
      #ifndef PERFECT_MAGIC_HASH
         #ifdef MINIMIZE_MAGIC
            return *(magicmoves_b_indecies[square]+(((occupancy)*magicmoves_b_magics[square])>>magicmoves_b_shift[square]));
         #else
            return magicmovesbdb[square][((occupancy)*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT];
         #endif
      #else
         return magicmovesbdb[magicmoves_b_indecies[square][((occupancy)*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]];
      #endif
   }
   MAGICMOVES_INLINE_KEYWORD U64 RmagicNOMASK(const unsigned int square, const U64 occupancy)
   {
      #ifndef PERFECT_MAGIC_HASH
         #ifdef MINIMIZE_MAGIC
            return *(magicmoves_r_indecies[square]+(((occupancy)*magicmoves_r_magics[square])>>magicmoves_r_shift[square]));
         #else
            return magicmovesrdb[square][((occupancy)*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT];
         #endif
      #else
         return magicmovesrdb[magicmoves_r_indecies[square][((occupancy)*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]];
      #endif
   }

   MAGICMOVES_INLINE_KEYWORD U64 Qmagic(const unsigned int square,const U64 occupancy)
   {
      return Bmagic(square,occupancy)|Rmagic(square,occupancy);
   }
   MAGICMOVES_INLINE_KEYWORD U64 QmagicNOMASK(const unsigned int square, const U64 occupancy)
   {
      return BmagicNOMASK(square,occupancy)|RmagicNOMASK(square,occupancy);
   }
#else //!USE_INLINING

#define Qmagic(square, occupancy) (Bmagic(square,occupancy)|Rmagic(square,occupancy))
#define QmagicNOMASK(square, occupancy) (BmagicNOMASK(square,occupancy)|RmagicNOMASK(square,occupancy))

#endif //USE_INLINING

void initmagicmoves(void);

#endif //_magicmoveshvesh

and magicmoves.c

/**
 *magicmoves.h
 *
 *Source file for magic move bitboard generation.
 *
 *See header file for usage.
 *
 *The magic keys are not optimal for all squares but they are very close
 *to optimal.
 *
 *Copyright (C) 2006 Pradyumna Kannan.
 *
 *This code is provided 'as-is', without any express or implied warranty.
 *In no event will the authors be held liable for any damages arising from
 *the use of this code. Permission is granted to anyone to use this
 *code for any purpose, including commercial applications, and to alter
 *it and redistribute it freely, subject to the following restrictions:
 *
 *1. The origin of this code must not be misrepresented; you must not
 *claim that you wrote the original code. If you use this code in a
 *product, an acknowledgment in the product documentation would be
 *appreciated but is not required.
 *
 *2. Altered source versions must be plainly marked as such, and must not be
 *misrepresented as being the original code.
 *
 *3. This notice may not be removed or altered from any source distribution.
 */

#include "magicmoves.h"

#ifdef _MSC_VER
#pragma message("MSC compiler detected -- turning off warning 4312,4146")
#pragma warning( disable : 4312)
#pragma warning( disable : 4146)
#endif

#ifdef MINIMIZE_MAGIC
U64 magicmovesbdb[5248];
const unsigned int magicmoves_b_shift[64] = {
  58, 59, 59, 59, 59, 59, 59, 58,
  59, 59, 59, 59, 59, 59, 59, 59,
  59, 59, 57, 57, 57, 57, 59, 59,
  59, 59, 57, 55, 55, 57, 59, 59,
  59, 59, 57, 55, 55, 57, 59, 59,
  59, 59, 57, 57, 57, 57, 59, 59,
  59, 59, 59, 59, 59, 59, 59, 59,
  58, 59, 59, 59, 59, 59, 59, 58
};
const U64 *magicmoves_b_indecies[64] = {
  magicmovesbdb + 4992, magicmovesbdb + 2624, magicmovesbdb + 256,
  magicmovesbdb + 896,
  magicmovesbdb + 1280, magicmovesbdb + 1664, magicmovesbdb + 4800,
  magicmovesbdb + 5120,
  magicmovesbdb + 2560, magicmovesbdb + 2656, magicmovesbdb + 288,
  magicmovesbdb + 928,
  magicmovesbdb + 1312, magicmovesbdb + 1696, magicmovesbdb + 4832,
  magicmovesbdb + 4928,
  magicmovesbdb + 0, magicmovesbdb + 128, magicmovesbdb + 320,
  magicmovesbdb + 960,
  magicmovesbdb + 1344, magicmovesbdb + 1728, magicmovesbdb + 2304,
  magicmovesbdb + 2432,
  magicmovesbdb + 32, magicmovesbdb + 160, magicmovesbdb + 448,
  magicmovesbdb + 2752,
  magicmovesbdb + 3776, magicmovesbdb + 1856, magicmovesbdb + 2336,
  magicmovesbdb + 2464,
  magicmovesbdb + 64, magicmovesbdb + 192, magicmovesbdb + 576,
  magicmovesbdb + 3264,
  magicmovesbdb + 4288, magicmovesbdb + 1984, magicmovesbdb + 2368,
  magicmovesbdb + 2496,
  magicmovesbdb + 96, magicmovesbdb + 224, magicmovesbdb + 704,
  magicmovesbdb + 1088,
  magicmovesbdb + 1472, magicmovesbdb + 2112, magicmovesbdb + 2400,
  magicmovesbdb + 2528,
  magicmovesbdb + 2592, magicmovesbdb + 2688, magicmovesbdb + 832,
  magicmovesbdb + 1216,
  magicmovesbdb + 1600, magicmovesbdb + 2240, magicmovesbdb + 4864,
  magicmovesbdb + 4960,
  magicmovesbdb + 5056, magicmovesbdb + 2720, magicmovesbdb + 864,
  magicmovesbdb + 1248,
  magicmovesbdb + 1632, magicmovesbdb + 2272, magicmovesbdb + 4896,
  magicmovesbdb + 5184
};
#else
#ifndef PERFECT_MAGIC_HASH
U64 magicmovesbdb[64][1 << 9];
#else
U64 magicmovesbdb[1428];
PERFECT_MAGIC_HASH magicmoves_b_indecies[64][1 << 9];
#endif
#endif

const U64 magicmoves_b_magics[64] = {
  0x0002020202020200ULL, 0x0002020202020000ULL, 0x0004010202000000ULL,
  0x0004040080000000ULL,
  0x0001104000000000ULL, 0x0000821040000000ULL, 0x0000410410400000ULL,
  0x0000104104104000ULL,
  0x0000040404040400ULL, 0x0000020202020200ULL, 0x0000040102020000ULL,
  0x0000040400800000ULL,
  0x0000011040000000ULL, 0x0000008210400000ULL, 0x0000004104104000ULL,
  0x0000002082082000ULL,
  0x0004000808080800ULL, 0x0002000404040400ULL, 0x0001000202020200ULL,
  0x0000800802004000ULL,
  0x0000800400A00000ULL, 0x0000200100884000ULL, 0x0000400082082000ULL,
  0x0000200041041000ULL,
  0x0002080010101000ULL, 0x0001040008080800ULL, 0x0000208004010400ULL,
  0x0000404004010200ULL,
  0x0000840000802000ULL, 0x0000404002011000ULL, 0x0000808001041000ULL,
  0x0000404000820800ULL,
  0x0001041000202000ULL, 0x0000820800101000ULL, 0x0000104400080800ULL,
  0x0000020080080080ULL,
  0x0000404040040100ULL, 0x0000808100020100ULL, 0x0001010100020800ULL,
  0x0000808080010400ULL,
  0x0000820820004000ULL, 0x0000410410002000ULL, 0x0000082088001000ULL,
  0x0000002011000800ULL,
  0x0000080100400400ULL, 0x0001010101000200ULL, 0x0002020202000400ULL,
  0x0001010101000200ULL,
  0x0000410410400000ULL, 0x0000208208200000ULL, 0x0000002084100000ULL,
  0x0000000020880000ULL,
  0x0000001002020000ULL, 0x0000040408020000ULL, 0x0004040404040000ULL,
  0x0002020202020000ULL,
  0x0000104104104000ULL, 0x0000002082082000ULL, 0x0000000020841000ULL,
  0x0000000000208800ULL,
  0x0000000010020200ULL, 0x0000000404080200ULL, 0x0000040404040400ULL,
  0x0002020202020200ULL
};
const U64 magicmoves_b_mask[64] = {
  0x0040201008040200ULL, 0x0000402010080400ULL, 0x0000004020100A00ULL,
  0x0000000040221400ULL,
  0x0000000002442800ULL, 0x0000000204085000ULL, 0x0000020408102000ULL,
  0x0002040810204000ULL,
  0x0020100804020000ULL, 0x0040201008040000ULL, 0x00004020100A0000ULL,
  0x0000004022140000ULL,
  0x0000000244280000ULL, 0x0000020408500000ULL, 0x0002040810200000ULL,
  0x0004081020400000ULL,
  0x0010080402000200ULL, 0x0020100804000400ULL, 0x004020100A000A00ULL,
  0x0000402214001400ULL,
  0x0000024428002800ULL, 0x0002040850005000ULL, 0x0004081020002000ULL,
  0x0008102040004000ULL,
  0x0008040200020400ULL, 0x0010080400040800ULL, 0x0020100A000A1000ULL,
  0x0040221400142200ULL,
  0x0002442800284400ULL, 0x0004085000500800ULL, 0x0008102000201000ULL,
  0x0010204000402000ULL,
  0x0004020002040800ULL, 0x0008040004081000ULL, 0x00100A000A102000ULL,
  0x0022140014224000ULL,
  0x0044280028440200ULL, 0x0008500050080400ULL, 0x0010200020100800ULL,
  0x0020400040201000ULL,
  0x0002000204081000ULL, 0x0004000408102000ULL, 0x000A000A10204000ULL,
  0x0014001422400000ULL,
  0x0028002844020000ULL, 0x0050005008040200ULL, 0x0020002010080400ULL,
  0x0040004020100800ULL,
  0x0000020408102000ULL, 0x0000040810204000ULL, 0x00000A1020400000ULL,
  0x0000142240000000ULL,
  0x0000284402000000ULL, 0x0000500804020000ULL, 0x0000201008040200ULL,
  0x0000402010080400ULL,
  0x0002040810204000ULL, 0x0004081020400000ULL, 0x000A102040000000ULL,
  0x0014224000000000ULL,
  0x0028440200000000ULL, 0x0050080402000000ULL, 0x0020100804020000ULL,
  0x0040201008040200ULL
};

#ifdef MINIMIZE_MAGIC
U64 magicmovesrdb[102400];
const unsigned int magicmoves_r_shift[64] = {
  52, 53, 53, 53, 53, 53, 53, 52,
  53, 54, 54, 54, 54, 54, 54, 53,
  53, 54, 54, 54, 54, 54, 54, 53,
  53, 54, 54, 54, 54, 54, 54, 53,
  53, 54, 54, 54, 54, 54, 54, 53,
  53, 54, 54, 54, 54, 54, 54, 53,
  53, 54, 54, 54, 54, 54, 54, 53,
  52, 53, 53, 53, 53, 53, 53, 52
};
const U64 *magicmoves_r_indecies[64] = {
  magicmovesrdb + 86016, magicmovesrdb + 73728, magicmovesrdb + 36864,
  magicmovesrdb + 43008,
  magicmovesrdb + 47104, magicmovesrdb + 51200, magicmovesrdb + 77824,
  magicmovesrdb + 94208,
  magicmovesrdb + 69632, magicmovesrdb + 32768, magicmovesrdb + 38912,
  magicmovesrdb + 10240,
  magicmovesrdb + 14336, magicmovesrdb + 53248, magicmovesrdb + 57344,
  magicmovesrdb + 81920,
  magicmovesrdb + 24576, magicmovesrdb + 33792, magicmovesrdb + 6144,
  magicmovesrdb + 11264,
  magicmovesrdb + 15360, magicmovesrdb + 18432, magicmovesrdb + 58368,
  magicmovesrdb + 61440,
  magicmovesrdb + 26624, magicmovesrdb + 4096, magicmovesrdb + 7168,
  magicmovesrdb + 0,
  magicmovesrdb + 2048, magicmovesrdb + 19456, magicmovesrdb + 22528,
  magicmovesrdb + 63488,
  magicmovesrdb + 28672, magicmovesrdb + 5120, magicmovesrdb + 8192,
  magicmovesrdb + 1024,
  magicmovesrdb + 3072, magicmovesrdb + 20480, magicmovesrdb + 23552,
  magicmovesrdb + 65536,
  magicmovesrdb + 30720, magicmovesrdb + 34816, magicmovesrdb + 9216,
  magicmovesrdb + 12288,
  magicmovesrdb + 16384, magicmovesrdb + 21504, magicmovesrdb + 59392,
  magicmovesrdb + 67584,
  magicmovesrdb + 71680, magicmovesrdb + 35840, magicmovesrdb + 39936,
  magicmovesrdb + 13312,
  magicmovesrdb + 17408, magicmovesrdb + 54272, magicmovesrdb + 60416,
  magicmovesrdb + 83968,
  magicmovesrdb + 90112, magicmovesrdb + 75776, magicmovesrdb + 40960,
  magicmovesrdb + 45056,
  magicmovesrdb + 49152, magicmovesrdb + 55296, magicmovesrdb + 79872,
  magicmovesrdb + 98304
};
#else
#ifndef PERFECT_MAGIC_HASH
U64 magicmovesrdb[64][1 << 12];
#else
U64 magicmovesrdb[4900];
PERFECT_MAGIC_HASH magicmoves_r_indecies[64][1 << 12];
#endif
#endif

//0xEA87FFFECBFEA5AE -- H8 11 bit key for Rooks

const U64 magicmoves_r_magics[64] = {
  0x0080001020400080ULL, 0x0040001000200040ULL, 0x0080081000200080ULL,
  0x0080040800100080ULL,
  0x0080020400080080ULL, 0x0080010200040080ULL, 0x0080008001000200ULL,
  0x0080002040800100ULL,
  0x0000800020400080ULL, 0x0000400020005000ULL, 0x0000801000200080ULL,
  0x0000800800100080ULL,
  0x0000800400080080ULL, 0x0000800200040080ULL, 0x0000800100020080ULL,
  0x0000800040800100ULL,
  0x0000208000400080ULL, 0x0000404000201000ULL, 0x0000808010002000ULL,
  0x0000808008001000ULL,
  0x0000808004000800ULL, 0x0000808002000400ULL, 0x0000010100020004ULL,
  0x0000020000408104ULL,
  0x0000208080004000ULL, 0x0000200040005000ULL, 0x0000100080200080ULL,
  0x0000080080100080ULL,
  0x0000040080080080ULL, 0x0000020080040080ULL, 0x0000010080800200ULL,
  0x0000800080004100ULL,
  0x0000204000800080ULL, 0x0000200040401000ULL, 0x0000100080802000ULL,
  0x0000080080801000ULL,
  0x0000040080800800ULL, 0x0000020080800400ULL, 0x0000020001010004ULL,
  0x0000800040800100ULL,
  0x0000204000808000ULL, 0x0000200040008080ULL, 0x0000100020008080ULL,
  0x0000080010008080ULL,
  0x0000040008008080ULL, 0x0000020004008080ULL, 0x0000010002008080ULL,
  0x0000004081020004ULL,
  0x0000204000800080ULL, 0x0000200040008080ULL, 0x0000100020008080ULL,
  0x0000080010008080ULL,
  0x0000040008008080ULL, 0x0000020004008080ULL, 0x0000800100020080ULL,
  0x0000800041000080ULL,
  0x0000102040800101ULL, 0x0000102040008101ULL, 0x0000081020004101ULL,
  0x0000040810002101ULL,
  0x0001000204080011ULL, 0x0001000204000801ULL, 0x0001000082000401ULL,
  0x0000002040810402ULL
};
const U64 magicmoves_r_mask[64] = {
  0x000101010101017EULL, 0x000202020202027CULL, 0x000404040404047AULL,
  0x0008080808080876ULL,
  0x001010101010106EULL, 0x002020202020205EULL, 0x004040404040403EULL,
  0x008080808080807EULL,
  0x0001010101017E00ULL, 0x0002020202027C00ULL, 0x0004040404047A00ULL,
  0x0008080808087600ULL,
  0x0010101010106E00ULL, 0x0020202020205E00ULL, 0x0040404040403E00ULL,
  0x0080808080807E00ULL,
  0x00010101017E0100ULL, 0x00020202027C0200ULL, 0x00040404047A0400ULL,
  0x0008080808760800ULL,
  0x00101010106E1000ULL, 0x00202020205E2000ULL, 0x00404040403E4000ULL,
  0x00808080807E8000ULL,
  0x000101017E010100ULL, 0x000202027C020200ULL, 0x000404047A040400ULL,
  0x0008080876080800ULL,
  0x001010106E101000ULL, 0x002020205E202000ULL, 0x004040403E404000ULL,
  0x008080807E808000ULL,
  0x0001017E01010100ULL, 0x0002027C02020200ULL, 0x0004047A04040400ULL,
  0x0008087608080800ULL,
  0x0010106E10101000ULL, 0x0020205E20202000ULL, 0x0040403E40404000ULL,
  0x0080807E80808000ULL,
  0x00017E0101010100ULL, 0x00027C0202020200ULL, 0x00047A0404040400ULL,
  0x0008760808080800ULL,
  0x00106E1010101000ULL, 0x00205E2020202000ULL, 0x00403E4040404000ULL,
  0x00807E8080808000ULL,
  0x007E010101010100ULL, 0x007C020202020200ULL, 0x007A040404040400ULL,
  0x0076080808080800ULL,
  0x006E101010101000ULL, 0x005E202020202000ULL, 0x003E404040404000ULL,
  0x007E808080808000ULL,
  0x7E01010101010100ULL, 0x7C02020202020200ULL, 0x7A04040404040400ULL,
  0x7608080808080800ULL,
  0x6E10101010101000ULL, 0x5E20202020202000ULL, 0x3E40404040404000ULL,
  0x7E80808080808000ULL
};


U64
initmagicmoves_occ (const int *squares, const int numSquares,
          const U64 linocc)
{
  int i;
  U64 ret = 0;
  for (i = 0; i < numSquares; i++)
    if (linocc & (((U64) (1)) << i))
      ret |= (((U64) (1)) << squares[i]);
  return ret;
}

U64
initmagicmoves_Rmoves (const int square, const U64 occ)
{
  U64 ret = 0;
  U64 bit;
  U64 rowbits = (((U64) 0xFF) << (8 * (square / 8)));

  bit = (((U64) (1)) << square);
  do
    {
      bit <<= 8;
      ret |= bit;
    }
  while (bit && !(bit & occ));
  bit = (((U64) (1)) << square);
  do
    {
      bit >>= 8;
      ret |= bit;
    }
  while (bit && !(bit & occ));
  bit = (((U64) (1)) << square);
  do
    {
      bit <<= 1;
      if (bit & rowbits)
   ret |= bit;
      else
   break;
    }
  while (!(bit & occ));
  bit = (((U64) (1)) << square);
  do
    {
      bit >>= 1;
      if (bit & rowbits)
   ret |= bit;
      else
   break;
    }
  while (!(bit & occ));
  return ret;
}

U64
initmagicmoves_Bmoves (const int square, const U64 occ)
{
  U64 ret = 0;
  U64 bit;
  U64 bit2;
  U64 rowbits = (((U64) 0xFF) << (8 * (square / 8)));

  bit = (((U64) (1)) << square);
  bit2 = bit;
  do
    {
      bit <<= 8 - 1;
      bit2 >>= 1;
      if (bit2 & rowbits)
   ret |= bit;
      else
   break;
    }
  while (bit && !(bit & occ));
  bit = (((U64) (1)) << square);
  bit2 = bit;
  do
    {
      bit <<= 8 + 1;
      bit2 <<= 1;
      if (bit2 & rowbits)
   ret |= bit;
      else
   break;
    }
  while (bit && !(bit & occ));
  bit = (((U64) (1)) << square);
  bit2 = bit;
  do
    {
      bit >>= 8 - 1;
      bit2 <<= 1;
      if (bit2 & rowbits)
   ret |= bit;
      else
   break;
    }
  while (bit && !(bit & occ));
  bit = (((U64) (1)) << square);
  bit2 = bit;
  do
    {
      bit >>= 8 + 1;
      bit2 >>= 1;
      if (bit2 & rowbits)
   ret |= bit;
      else
   break;
    }
  while (bit && !(bit & occ));
  return ret;
}

//used so that the original indecies can be left as const so that the compiler can optimize better

#ifndef PERFECT_MAGIC_HASH
#ifdef MINIMIZE_MAGIC
#define BmagicNOMASK2(square, occupancy) *(magicmoves_b_indecies2[square]+(((occupancy)*magicmoves_b_magics[square])>>magicmoves_b_shift[square]))
#define RmagicNOMASK2(square, occupancy) *(magicmoves_r_indecies2[square]+(((occupancy)*magicmoves_r_magics[square])>>magicmoves_r_shift[square]))
#else
#define BmagicNOMASK2(square, occupancy) magicmovesbdb[square][((occupancy)*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]
#define RmagicNOMASK2(square, occupancy) magicmovesrdb[square][((occupancy)*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]
#endif
/*#else
   #define BmagicNOMASK2(square, occupancy) magicmovesbdb[magicmoves_b_indecies[square][((occupancy)*magicmoves_b_magics[square])>>MINIMAL_B_BITS_SHIFT]]
   #define RmagicNOMASK2(square, occupancy) magicmovesrdb[magicmoves_r_indecies[square][((occupancy)*magicmoves_r_magics[square])>>MINIMAL_R_BITS_SHIFT]]
*/
#endif

void
initmagicmoves (void)
{
  int i;

  //for bitscans :
  //initmagicmoves_bitpos64_database[(x*0x07EDD5E59A4E28C2)>>58]
  int initmagicmoves_bitpos64_database[64] = {
    63, 0, 58, 1, 59, 47, 53, 2,
    60, 39, 48, 27, 54, 33, 42, 3,
    61, 51, 37, 40, 49, 18, 28, 20,
    55, 30, 34, 11, 43, 14, 22, 4,
    62, 57, 46, 52, 38, 26, 32, 41,
    50, 36, 17, 19, 29, 10, 13, 21,
    56, 45, 25, 31, 35, 16, 9, 12,
    44, 24, 15, 8, 23, 7, 6, 5
  };

#ifdef MINIMIZE_MAGIC
  //identical to magicmove_x_indecies except without the const modifer
  U64 *magicmoves_b_indecies2[64] = {
    magicmovesbdb + 4992, magicmovesbdb + 2624, magicmovesbdb + 256,
    magicmovesbdb + 896,
    magicmovesbdb + 1280, magicmovesbdb + 1664, magicmovesbdb + 4800,
    magicmovesbdb + 5120,
    magicmovesbdb + 2560, magicmovesbdb + 2656, magicmovesbdb + 288,
    magicmovesbdb + 928,
    magicmovesbdb + 1312, magicmovesbdb + 1696, magicmovesbdb + 4832,
    magicmovesbdb + 4928,
    magicmovesbdb + 0, magicmovesbdb + 128, magicmovesbdb + 320,
    magicmovesbdb + 960,
    magicmovesbdb + 1344, magicmovesbdb + 1728, magicmovesbdb + 2304,
    magicmovesbdb + 2432,
    magicmovesbdb + 32, magicmovesbdb + 160, magicmovesbdb + 448,
    magicmovesbdb + 2752,
    magicmovesbdb + 3776, magicmovesbdb + 1856, magicmovesbdb + 2336,
    magicmovesbdb + 2464,
    magicmovesbdb + 64, magicmovesbdb + 192, magicmovesbdb + 576,
    magicmovesbdb + 3264,
    magicmovesbdb + 4288, magicmovesbdb + 1984, magicmovesbdb + 2368,
    magicmovesbdb + 2496,
    magicmovesbdb + 96, magicmovesbdb + 224, magicmovesbdb + 704,
    magicmovesbdb + 1088,
    magicmovesbdb + 1472, magicmovesbdb + 2112, magicmovesbdb + 2400,
    magicmovesbdb + 2528,
    magicmovesbdb + 2592, magicmovesbdb + 2688, magicmovesbdb + 832,
    magicmovesbdb + 1216,
    magicmovesbdb + 1600, magicmovesbdb + 2240, magicmovesbdb + 4864,
    magicmovesbdb + 4960,
    magicmovesbdb + 5056, magicmovesbdb + 2720, magicmovesbdb + 864,
    magicmovesbdb + 1248,
    magicmovesbdb + 1632, magicmovesbdb + 2272, magicmovesbdb + 4896,
    magicmovesbdb + 5184
  };
  U64 *magicmoves_r_indecies2[64] = {
    magicmovesrdb + 86016, magicmovesrdb + 73728, magicmovesrdb + 36864,
    magicmovesrdb + 43008,
    magicmovesrdb + 47104, magicmovesrdb + 51200, magicmovesrdb + 77824,
    magicmovesrdb + 94208,
    magicmovesrdb + 69632, magicmovesrdb + 32768, magicmovesrdb + 38912,
    magicmovesrdb + 10240,
    magicmovesrdb + 14336, magicmovesrdb + 53248, magicmovesrdb + 57344,
    magicmovesrdb + 81920,
    magicmovesrdb + 24576, magicmovesrdb + 33792, magicmovesrdb + 6144,
    magicmovesrdb + 11264,
    magicmovesrdb + 15360, magicmovesrdb + 18432, magicmovesrdb + 58368,
    magicmovesrdb + 61440,
    magicmovesrdb + 26624, magicmovesrdb + 4096, magicmovesrdb + 7168,
    magicmovesrdb + 0,
    magicmovesrdb + 2048, magicmovesrdb + 19456, magicmovesrdb + 22528,
    magicmovesrdb + 63488,
    magicmovesrdb + 28672, magicmovesrdb + 5120, magicmovesrdb + 8192,
    magicmovesrdb + 1024,
    magicmovesrdb + 3072, magicmovesrdb + 20480, magicmovesrdb + 23552,
    magicmovesrdb + 65536,
    magicmovesrdb + 30720, magicmovesrdb + 34816, magicmovesrdb + 9216,
    magicmovesrdb + 12288,
    magicmovesrdb + 16384, magicmovesrdb + 21504, magicmovesrdb + 59392,
    magicmovesrdb + 67584,
    magicmovesrdb + 71680, magicmovesrdb + 35840, magicmovesrdb + 39936,
    magicmovesrdb + 13312,
    magicmovesrdb + 17408, magicmovesrdb + 54272, magicmovesrdb + 60416,
    magicmovesrdb + 83968,
    magicmovesrdb + 90112, magicmovesrdb + 75776, magicmovesrdb + 40960,
    magicmovesrdb + 45056,
    magicmovesrdb + 49152, magicmovesrdb + 55296, magicmovesrdb + 79872,
    magicmovesrdb + 98304
  };
#endif // MINIMIZE_MAGIC


#ifdef PERFECT_MAGIC_HASH
  for (i = 0; i < 1428; i++)
    magicmovesbdb[i] = 0;
  for (i = 0; i < 4900; i++)
    magicmovesrdb[i] = 0;
#endif

  for (i = 0; i < 64; i++)
    {
      int squares[64];
      int numsquares = 0;
      U64 temp = magicmoves_b_mask[i];
      while (temp)
   {
     U64 bit = temp & -temp;
     squares[numsquares++] =
       initmagicmoves_bitpos64_database[(bit *
                     0x07EDD5E59A4E28C2ULL) >> 58];
     temp ^= bit;
   }
      for (temp = 0; temp < (((U64) (1)) << numsquares); temp++)
   {
     U64 tempocc = initmagicmoves_occ (squares, numsquares, temp);
#ifndef PERFECT_MAGIC_HASH
     BmagicNOMASK2 (i, tempocc) = initmagicmoves_Bmoves (i, tempocc);
#else
     U64 moves = initmagicmoves_Bmoves (i, tempocc);
     U64 index =
       (((tempocc) * magicmoves_b_magics[i]) >> MINIMAL_B_BITS_SHIFT);
     int j;
     for (j = 0; j < 1428; j++)
       {
         if (!magicmovesbdb[j])
      {
        magicmovesbdb[j] = moves;
        magicmoves_b_indecies[i][index] = j;
        break;
      }
         else if (magicmovesbdb[j] == moves)
      {
        magicmoves_b_indecies[i][index] = j;
        break;
      }
       }
#endif
   }
    }
  for (i = 0; i < 64; i++)
    {
      int squares[64];
      int numsquares = 0;
      U64 temp = magicmoves_r_mask[i];
      while (temp)
   {
     U64 bit = temp & -temp;
     squares[numsquares++] =
       initmagicmoves_bitpos64_database[(bit *
                     0x07EDD5E59A4E28C2ULL) >> 58];
     temp ^= bit;
   }
      for (temp = 0; temp < (((U64) (1)) << numsquares); temp++)
   {
     U64 tempocc = initmagicmoves_occ (squares, numsquares, temp);
#ifndef PERFECT_MAGIC_HASH
     RmagicNOMASK2 (i, tempocc) = initmagicmoves_Rmoves (i, tempocc);
#else
     U64 moves = initmagicmoves_Rmoves (i, tempocc);
     U64 index =
       (((tempocc) * magicmoves_r_magics[i]) >> MINIMAL_R_BITS_SHIFT);
     int j;
     for (j = 0; j < 4900; j++)
       {
         if (!magicmovesrdb[j])
      {
        magicmovesrdb[j] = moves;
        magicmoves_r_indecies[i][index] = j;
        break;
      }
         else if (magicmovesrdb[j] == moves)
      {
        magicmoves_r_indecies[i][index] = j;
        break;
      }
       }
#endif
   }
    }
}


[/code]
smcracraft
 
Posts: 65
Joined: 15 Jan 2006, 05:38

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 15 Dec 2006, 17:11

Hello Tord

I would guess that you are very pleased with a 20% speedup on your AMD64 X2.

I have a water-cooled 32-bit 3.6GHz P4 processor and I get about a 10% speedup. I too am surprised that you did not see a speedup on your 32-bit machine.

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby Tord Romstad » 15 Dec 2006, 17:39

Grant Osborne wrote:I would guess that you are very pleased with a 20% speedup on your AMD64 X2.

I guess would have been, if the AMD had been mine. :)

The AMD is the computer here in my office, the Core Duo is the iMac at my home.

I have a water-cooled 32-bit 3.6GHz P4 processor and I get about a 10% speedup. I too am surprised that you did not see a speedup on your 32-bit machine.

Yes, it makes no sense, and I was rather disappointed.

I hope to have a version ready to be sent to testers some time before Christmas. It'll be interesting to see how it performs in 64-bit mode.

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 15 Dec 2006, 21:33

Hello Stuart

I changed the names of the arrays to make it easier to read in my program. Sorry to confuse you.

Here are the name changes:-

const U64 magicmoves_b_magics[64] is what I have called const U64 bishopMagic[64]
const U64 magicmoves_r_magics[64] is what I have called const U64 rookMagic[64]
const U64 *magicmoves_b_indecies[64] is what I have called U64* bIndecies[64]
const U64 *magicmoves_r_indecies[64] is what I have called U64* rIndecies[64]
const unsigned int magicmoves_b_shift[64] is what I have called const unsigned int bishopShift[64]
const unsigned int magicmoves_r_shift[64] is what I have called const unsigned int rookShift[64]
U64 magicmovesbdb[5248] is what I have called U64 Bdb[4911]
U64 magicmovesrdb[102400] is what I have called U64 Rdb[101964]

hope this helps.

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby Michael Sherwin » 17 Dec 2006, 01:39

Grant Osborne wrote:Hi all

For those of you that are using Pradu's magic move bitboard generator (or understand how it works) and have a 32-bit computer, you'll be using code that looks something like this to retrieve your attackboards:-

Code: Select all
U64 Bmagic(unsigned int square) {
      return *(bIndecies[square] + (((Occupied & bishopMask[square]) * bishopMagic[square]) >> bishopShift[square]));
   }

   U64 Rmagic(unsigned int square) {   
      return *(rIndecies[square] + (((Occupied & rookMask[square]) * rookMagic[square]) >> rookShift[square]));
   }


I picked up on a recent post by H.G Muller and substituted the code above with:-

Code: Select all
U64 Bmagic(unsigned int square) {
      unsigned int lo = bishopMagicLo[square] * (int) Occupied;
      unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
      return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
   }

   U64 Rmagic(unsigned int square) {
      unsigned int lo = rookMagicLo[square] * (int) Occupied;
      unsigned int hi = rookMagicHi[square] * (int) (Occupied>>32);
      return *(rIndecies[square] + ((lo ^ hi) >> rookShift[square]));
   }


This modification requires a new set of two 32-bit keys per square (or like me you could join them into one 64-bit key) and only uses two 32-bit multiplies instead of three, and a 32-bit shift, and is probably more suited to a 32-bit machine.
I have implemented this into my program and, for me, I got an increased perft speed of 1.5 million nps.

My thanks to H.G Muller

Grant


Hi Grant,

Nice demonstration and very good news for 32 bit hardware owners, especially for AMD.
However, and this is not a negative reply in any way, I just want to bring to light
that there was more happening in the other thread than just wether two 32 bit
multiplications was faster than three. There was also two points of contention as well.

The first point of contention was if shifting to create an index could be faster than
multiplication. All optimization text that I have read say that it is. It is better to
do multiple shifts and adds rather than one multiplication.

There is a problem with the the shifting method though, and that is for the bishops
(not the rooks) the index is not compact and uses more memory than the Lasse/Pradu
magic bitboards. This is solved by the second point of contension.

The second point of contention is if it is worth splitting the index so the total table
sizes will only be a very small fraction of what is needed by the Lasse/Pradu method and
then combining the results of two lookups.

My contention is that shifting is faster than multipication and that the smaller tables
will be very easy on the cash. Both methods combined IMHO will gain in speed over
Lasse/Pradu much more than the 10 to 20 percent that your modification has. Even for 64
bit I expect a substantial speedup.

Okay, "prove it" some may say! Well I am working to do just that. Just do not expect the
results anytime soon. I am single and have to do all the house work and yard work myself,
as well as earn a living and take care of aged parents and do all their yard work and some
of their house work. And I am not a very fast programmer to begin with. So, if anyone is
interested in having these answers before six months from now then I hope that someone
else could make some of these test.

Mike

Edit: I can present some subjective evidence.

Harald Luben made a test comparison between several bitboard methods at talkchess.com. Here are the results.
Code: Select all
+----------------------------+-----------------------+--------------+-----------+
| Name                       | Inventor or Supporter | Table Size   | Test-Time |
+----------------------------+-----------------------+--------------+-----------+
| Rotated Bitboards (naive)  | Robert Hyatt          |   77.2 KByte |  87.78 s  |
| Rotated Bitboards (aligned)| ?                     |   35.0 KByte |  86.65 s  |
| Rotated Bitboards (switch) | (Harald Lüßen?)       |   14.9 KByte |  88.91 s  |
| Rotated Indices            | Alessandro Damiani    |  128.3 KByte |  78.93 s  |
| Exploding Bitboards        | Harald Lüßen          |    3.5 KByte | 101.79 s  |
| Magic Multiplication       | Gerd Isenberg         |    9.7 KByte |  91.87 s  |
| Magic Multiplication 32 bit| Gerd Isenberg         |    9.7 KByte |  81.37 s  |
| Sherwin Index              | Michael Sherwin       | 1402.4 KByte |  90.03 s  |
| Pradu Minimize Magic       | Pradyumna Kannan      |  844.0 KByte |  81.42 s  |
| Pradu Perfect Magic Hash   |  and                  |  627.4 KByte |  82.09 s  |
| Pradu Big                  | Lasse Hansen          | 2306.0 KByte |  82.33 s  |
| Kogge-Stone                | Steffan Westcott      |    0.0 KByte |  99.32 s  |
| Simple Shift               | --                    |    0.0 KByte | 110.25 s  |
| Naive Shift                | --                    |    0.0 KByte | 111.69 s  |
+----------------------------+-----------------------+--------------+-----------+


The 32 bit unfriendly Sherwin Index tested faster than the the 32 bit unfriendly Magic Multiplication of Gerd Isenberg. Gerd's Magic Multiplication 32 bit (friendly) version did as well or better than Lasse/Pradu. I claim that the Sherwin Index was even more 32 bit unfriendly than Gerd's 32 bit unfriendly version, which would make the 32 bit friendly version of the Sherwin index (had it been tested)the fastest of the three on 32 bit machines.

The Sherwin Index above was done using the Sherwin Index 1 in wich 6 lookups were used to form the index for the bishop and 8 lookups for the rook (sounds strange, dosn't it). Sherwin Index 2 shifts the bits down to two lookups for the bishop and two for the rook.

So, I just think that the Sherwin Index 1 & 2 deserve a closer look before passing jugement. BTW I did not name it the "Sherwin Index'--Harald did.
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: A Faster Magic Move Bitboard Generator?

Postby Pradu » 17 Dec 2006, 08:38

Grant Osborne wrote:This modification requires a new set of two 32-bit keys per square (or like me you could join them into one 64-bit key) and only uses two 32-bit multiplies instead of three, and a 32-bit shift, and is probably more suited to a 32-bit machine.
I have implemented this into my program and, for me, I got an increased perft speed of 1.5 million nps.
My thanks to H.G Muller
Grant


1.5 mil nps gain in 32-bits is quite good. I don't know what we would do without Gerd giving us that hint about a 64-bit multiply being 3 32-bit multiplys 8-). We could proabably get rid of the extra >>& but the compiler might be smart enough to precompute the >>& and split the 64-element 64-bit magic array into a 128 element 32-bit array. I guess you or Gerd could check the assembler to make sure.

May I have permission to add your magics to magicmoves if at a later time I (gasp) add some more #ifdefs for a 32-bit optimized magicmoves? I will give due credit where required.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: A Faster Magic Move Bitboard Generator?

Postby Tord Romstad » 17 Dec 2006, 10:34

Pradu wrote:May I have permission to add your magics to magicmoves if at a later time I (gasp) add some more #ifdefs for a 32-bit optimized magicmoves?

Hello Pradu,

Generating the keys for 32-bit multiplications is even easier than for 64-bit multiplications. I modified my simple magic number generator to be able to generate both (just change the #define at the top). With 64-bit multiplications, generating all rook (10, 11, 12-bit) and bishop (5, 6, 7, 9-bit) numbers for the whole board takes about 10 seconds on my computer. With 32-bit multiplications, it takes only 2 seconds. The code is totally unoptimised, and it wouldn't suprise me at all if it is possible to improve the speed by a factor of 10. With just a little effort, it should be possible to calculate the magic numbers during program initialisation instead of including them as a constant array in the source code.

If somebody would like to use my code to hunt for better magics (like 11-bit rook magics for the corners, or 10-bit rook magics for the edges), you are of course more than welcome to do so, but I think the chances of success would be better by using a more intelligent approach than mine (my program just tries random numbers until it finds one which works).

BTW: Would anyone be interested in setting up a distributed computing project (like seti@home) for finding better magic numbers? I lack the technical skills to set up such a project, but I would be happy to help by improving the algorithms for finding magic numbers and providing CPU time.

Tord
User avatar
Tord Romstad
 
Posts: 639
Joined: 09 Oct 2004, 12:49
Location: Oslo, Norway

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 17 Dec 2006, 11:15

Tord

Better magics most certainly do exist for this 32-bit generator. I am even at this moment systematically scanning each square and my aim is a table size of 800Kb (I am down to 825Kb so far).

My magic number generator is like yours in that it just tries random numbers, the trick is to know which range of numbers works best for the particular square. 11-bit rook magics for the corners are I think sadly out of the question - I can't get anywhere near them this time.

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 17 Dec 2006, 11:36

Pradu

Of course you have my permission to add the magics I've found to your magicmoves. You may want to wait a few more days until I have finished looking for better rook magics for the smallest table. I will publish them shortly.

Code: Select all
U64 Bmagic(unsigned int square) {
      unsigned int lo = bishopMagicLo[square] * (int) Occupied;
      unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
      return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
   }


I wrote this in asm in just 15 instructions and used 2 imul's and 1 shr

Code: Select all
      mov   edx, dword ptr [occupied]
   mov   ecx, dword ptr [occupied+4]
   and   edx, dword ptr bishopMask [esi*8]
   and   ecx, dword ptr bishopMask+4 [esi*8]
   imul   edx, dword ptr bishopMagic [esi*8]
   mov   eax, dword ptr bIndecies [esi*4]
   imul   ecx, dword ptr bishopMagic+4 [esi*8]
   mov   edi, dword ptr [temp]
   xor   edx, ecx
   mov   ecx, dword ptr bishopShift [esi*4]
   shr   edx, cl
   mov   ebx, dword ptr [temp+4]
   lea   eax, [eax+edx*8]
   and   edi, dword ptr [eax]
   and   ebx, dword ptr [eax+4]


Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby mjlef » 17 Dec 2006, 12:27

Tord,

I have a couple of unused machines (Pentium M 1.6 and 1.7 GHz). If you can compile a couple of different magic searchers, I would be glad to run them here and report the results to you.

Mark
mjlef
 
Posts: 64
Joined: 29 Mar 2006, 22:04

Re: A Faster Magic Move Bitboard Generator?

Postby Gerd Isenberg » 17 Dec 2006, 12:45

Grant Osborne wrote:
Code: Select all
U64 Bmagic(unsigned int square) {
      unsigned int lo = bishopMagicLo[square] * (int) Occupied;
      unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
      return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
   }


I wrote this in asm in just 15 instructions and used 2 imul's and 1 shr


Indeed the fastest 32-bit code one can imagine!
Moderate register usage plus the gain to do two almost independent chains until xor in parallel. 32-bit kindergarten for instance suffers from too less registers to really gain from parallel speedup.

The same is true btw., if i compare the bswap approach to Pradu's approach in 64-bit mode. The bswap comes close on K8 due to much smaller tables (2*512Byte) and maximum ipc by using eight architectural registers (which implies the save/restore of at least one callee safe register with vc2005) and up to four independent chains. But when it is about further parallel speedup eg. in queen-attacks or determining pinners or other xray stuff, bswap sucks because alu- and register ressources are already saturated.

Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 17 Dec 2006, 13:29

Hello Mike

Michael Sherwin wrote:
Nice demonstration and very good news for 32 bit hardware owners...


I think a 32-bit version of Pradu's/Lasse minimize magic was inevitable, it just happened to be me that published it. It sounds like the Sherwin Index has potential to become faster or even the fastest and I hope to be the first to congratulate you if it is. Sometimes, to get to the ultimate speed, you have to creep up on it.

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: A Faster Magic Move Bitboard Generator?

Postby Gerd Isenberg » 17 Dec 2006, 16:01

Grant Osborne wrote:
Code: Select all
U64 Bmagic(unsigned int square) {
      unsigned int lo = bishopMagicLo[square] * (int) Occupied;
      unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
      return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
   }


I wrote this in asm in just 15 instructions and used 2 imul's and 1 shr


Grant, have you tried to pack all elements addressed by square inside one struct? This might have the advantage of shorter opcode and fetching all four elements from one cacheline per square instead of four at the cost of one additional shift of square, since sizeof struct becomes > 8.

Code: Select all
struct SMagic {
  unsigned int bishopMagicLo;
  unsigned int bishopMagicHi;
  unsigned int bishopShift;
  unsigned int bIndecies;
} sm[64];

U64 Bmagic(unsigned int square) {
      unsigned int lo = sm[square].bishopMagicLo * (int) Occupied;
      unsigned int hi = sm[square].bishopMagicHi * (int) (Occupied>>32);
      return *(sm[square].bIndecies + ((lo ^ hi) >> sm[square].bishopShift));
}

Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: A Faster Magic Move Bitboard Generator?

Postby Michael Sherwin » 17 Dec 2006, 17:12

Grant Osborne wrote:Hello Mike

Michael Sherwin wrote:
Nice demonstration and very good news for 32 bit hardware owners...


I think a 32-bit version of Pradu's/Lasse minimize magic was inevitable, it just happened to be me that published it. It sounds like the Sherwin Index has potential to become faster or even the fastest and I hope to be the first to congratulate you if it is. Sometimes, to get to the ultimate speed, you have to creep up on it.

Grant


Hi Grant,

Thanks for the encouragement! :D

I have an idea for a very small program to only test bishop and rook getters that I may be able to write in a short amount of time. The constant set part of the program will be in a file called testGetters.c and the code to be tested will be in getters.c. The only permanent part of getters.c will be four stubs, two for the rook and two for the bishop. The program will call the two 'empty' stubs to find the time it takes to run the test with out any getters operating. The other two stubs will be changed to call the bishop and rook getters, so they may be tested. In testGetters.c a simulated hash table of various sizes will have simulated stores and lookups that will be included in the test to determine what effect combining hash table memory usage and the memory needed for the getters has on the speed of the system. The main part of the test will just be calling the getters with all possible blocking patterns for every square for both the bishop and the rook a number of times and measureing the time it takes minus the time for just calling the empty stubs.

Any thoughts or suggestions?

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: A Faster Magic Move Bitboard Generator?

Postby Michael Sherwin » 17 Dec 2006, 17:19

Gerd Isenberg wrote:
Grant Osborne wrote:
Code: Select all
U64 Bmagic(unsigned int square) {
      unsigned int lo = bishopMagicLo[square] * (int) Occupied;
      unsigned int hi = bishopMagicHi[square] * (int) (Occupied>>32);
      return *(bIndecies[square] + ((lo ^ hi) >> bishopShift[square]));
   }


I wrote this in asm in just 15 instructions and used 2 imul's and 1 shr


Grant, have you tried to pack all elements addressed by square inside one struct? This might have the advantage of shorter opcode and fetching all four elements from one cacheline per square instead of four at the cost of one additional shift of square, since sizeof struct becomes > 8.

Code: Select all
struct SMagic {
  unsigned int bishopMagicLo;
  unsigned int bishopMagicHi;
  unsigned int bishopShift;
  unsigned int bIndecies;
} sm[64];

U64 Bmagic(unsigned int square) {
      unsigned int lo = sm[square].bishopMagicLo * (int) Occupied;
      unsigned int hi = sm[square].bishopMagicHi * (int) (Occupied>>32);
      return *(sm[square].bIndecies + ((lo ^ hi) >> sm[square].bishopShift));
}

Gerd


Hi Gerd,

I do not know if you saw my last posted code for the Sherwin Index 2 thread or not. I also went to a square structure that has all the information available with the use of a single square pointer. I was wondering if it was the right thing to do. According to your post for Grant it seems that I was right. Thanks for the conformation--even if it was indirect.

Mike
User avatar
Michael Sherwin
 
Posts: 168
Joined: 21 Jul 2005, 06:10
Location: USA

Re: A Faster Magic Move Bitboard Generator?

Postby Grant Osborne » 17 Dec 2006, 18:08

Hello Gerd

Thanks for the suggestion. I will try it.

Something like this maybe?
Code: Select all
mov   ebx, esi
mov   edx, dword ptr [occupied]
mov   ecx, dword ptr [occupied+4]
shr   ebx, 4
and   edx, dword ptr bishopMask [esi*8]
mov   eax, dword ptr sm [ebx]
and   ecx, dword ptr bishopMask+4 [esi*8]
imul   edx, dword ptr sm [eax]
mov   edi, dword ptr [temp]
imul   ecx, dword ptr sm [eax+4]
mov   ebx, dword ptr [temp+4]
xor   edx, ecx
mov   ecx, dword ptr [eax+8]
mov   eax, dword ptr [eax+0x0C]
shr   edx, cl
lea   eax, [eax+edx*8]
and   edi, dword ptr [eax]
and   ebx, dword ptr [eax+4]


It would be, as you say, at the cost of one extra shift.

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Next

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 8 guests