Winboard Forum

by **Tord Romstad** » 06 Nov 2004, 13:40

Hi all,

I'm currently experimenting with bitboards in Glaurung 0.2. My initial results are not very promising (a slowdown of about 30% compared to Glaurung 0.1.4), but I will keep trying a bit longer before I give up. At the moment, I have two questions:

What is the best way to compute a bitboard containing 1s at all squares between two given squares, and 0s everywhere else? For instance, given the squares a2 and f7, I want a bitboard with 1s at b3, c4, d5 and e6. I currently do this with a huge lookup table indexed by all pairs of squares, which is of course a very ugly and unsatisfactory solution.

It seems that one of the main reasons for the slowdown in the bitboard version of Glaurung is that looping through the nonzero bits of a bitboard is very slow compared to looping through a piece list. When I introduced bitboards, I ripped out the piece lists in order to simplify and speed up the make_move and unmake_move functions. I am now wondering whether this was really a good idea. Do bitboard engines usually have some kind of piece lists in addition to bitboards?

Tord

by **Sune Fischer** » 06 Nov 2004, 14:27

Hi Tord,

I think you have the fastest way for rays. It's not easy to beat a table lookup using on-the-fly computations.

I don't see what's so ugly about using precomputed lookup tables though

I thought about adding a piece list, like you say it's faster to "get them out", but I decided against it for several reasons. For one thing it's more code, more maintenance and therefore more errorprone and "ugly", IMO.

Secondly, how do you work on a piece list? I want to know the number of rooks on openfiles, the bishops in the long diagonal the pawns in front of the king etc..
Almost everytime you need something you need it in reverse!

To find this information without bitboards you have to scan through the list again and again which is very much uglier and slower than extracting a bit from a bitboard, IMO. In some cases you don't have to extract any bits because a simple & will tell you there are no pieces there, and in other cases just knowing if there is a piece or not will be sufficient.

-S.

by **Tord Romstad** » 06 Nov 2004, 16:31

Hi Sune,

Thanks for the comments.

Sune Fischer wrote:I think you have the fastest way for rays. It's not easy to beat a table lookup using on-the-fly computations.

I don't see what's so ugly about using precomputed lookup tables though

Simple: In this case, the table is far too big for my taste. This table alone would consume 32KB. As a comparison, the total size of *all* tables and global data except hash tables in Glaurung 0.1.4 is about 58 KB. This number could easily be reduced further by changing a few 32-bit integer arrays to 16-bit or 8-bit arrays.

I have nothing against big tables in general, but I prefer to use them only when the information contained in them seems relatively complex, and can be used to answer a wide variety of different questions. Using 32 KB of memory just to answer questions of the type "is square x between square y and square z?" seems very uneconomical to me. Whenever I find myself using a big lookup table for such simple tasks, I instinctively ask myself whether this is really necessary. In most cases, it turns out to be possible to do the same calculations just as fast with only a tiny fraction of the memory usage.

The biggest tables in Glaurung (again, except hash tables, which can theoretically be reduced to 0) are the history table (16 KB), the move stack (16 KB), the search stack (8 KB) and the Zobrist keys (7 KB). Everything else is 2 KB or less (unless I have missed something).

I thought about adding a piece list, like you say it's faster to "get them out", but I decided against it for several reasons. For one thing it's more code, more maintenance and therefore more errorprone and "ugly", IMO.

That's what I thought, too. My piece lists by themselves are not very error-prone or difficult to maintain. In fact, updating the piece list when moves are made and unmade is less work (in terms of lines of code as well as execution time) than updating bitboards. But having bitboards *and* piece lists seems very redundant. Essentially you end up maintaining exactly the same information in two different data structures, which doesn't look like the right thing to do.

Secondly, how do you work on a piece list? I want to know the number of rooks on openfiles, the bishops in the long diagonal the pawns in front of the king etc..
Almost everytime you need something you need it in reverse!

I'm not sure I understand what you mean here. I always loop through all pieces on the board at least once in the evaluation function, and I don't see how I can avoid this even when using bitboards. During this loop, the kind of information you describe is easily obtained.

To find this information without bitboards you have to scan through the list again and again which is very much uglier and slower than extracting a bit from a bitboard, IMO.

I still don't understand. Of course it still happens that I have to loop through parts of the piece list again after the main loop through all pieces, but I hardly ever need to loop through more than a tiny section of the piece list. If I want to know whether a passed pawn is supported by a rook, I only need to loop through the rook section of the piece list (at most two entries, except in very rare situations). When I want to know whether h7 is attacked by a white bishop, I only need to loop through the white-squared white bishops. The loops are always very short.

In some cases you don't have to extract any bits because a simple & will tell you there are no pieces there, and in other cases just knowing if there is a piece or not will be sufficient.

This is, as far as I can see, the main advantage with bitboards compared to piece lists in this respect. I hoped that this would be sufficient to compensate for the expensive operation of looping through nonzero bits. My experience so far is that it isn't even close to sufficient. Of course, the main reason for this is probably that I still have a lot to learn about bitboards, and that my current implementation is nowhere near optimal. I'll keep trying.

Tord

by **Sune Fischer** » 06 Nov 2004, 18:34

Hi Tord,

Look at Crafty, it's got 2 MB tables and is still crunching happily along at 1 MNps, it doesn't even seem to profit from larger caches!
So, you want to see ragingly high nps you have to drop all the board scans and start using some tables!
There, I said it.

Seriously, in 10 years we will all be using 20 MB caches, it's practicly a non-issue in the long run.

Okay, alternatives.
Well what about a switch in the x-y direction to generate that attack line from y. Then see if it hits x and z. Or start by checking that x,y and y,z are on a line. You can convert to 0x88 first so you only need that tiny 256 table

In fact, updating the piece list when moves are made and unmade is less work (in terms of lines of code as well as execution time) than updating bitboards. But having bitboards *and* piece lists seems very redundant. Essentially you end up maintaining exactly the same information in two different data structures, which doesn't look like the right thing to do.

Huh less work?
Bitboards only needs an xor to lift up and an xor to set back down.
Ok the rotated needs tables..

Double book keeping doesn't appeal to me either, I wouldn't do it for just 5% extra speed.
I have actually tried running with the 16x16 board for faster move generation, but it ran marginally slower so I took it out.

I'm not sure I understand what you mean here. I always loop through all pieces on the board at least once in the evaluation function, and I don't see how I can avoid this even when using bitboards. During this loop, the kind of information you describe is easily obtained.

I don't loop through all the pieces. I check for the things I find interesting and the rest, well, I have no opinion on those pieces so they can just use their default value.

It means I have to check for each pattern only once, instead of each pattern for every piece. I believe this is simply faster whenever there is more than one type of that piece.

In some cases you don't have to extract any bits because a simple & will tell you there are no pieces there, and in other cases just knowing if there is a piece or not will be sufficient.

This is, as far as I can see, the main advantage with bitboards compared to piece lists in this respect. I hoped that this would be sufficient to compensate for the expensive operation of looping through nonzero bits. My experience so far is that it isn't even close to sufficient.

I think the way piece lists and board information is merged is their biggest strength. You can do operations directly on the piece list because the piece list _is_ the board.

Operations like: give me a list of all the black pieces attacked by white pawns, is one line of code. No looping.

Sometimes you also need to work on two types of pieces, e.g. queens and rooks. No problem, just OR the two "piece lists".

-S.

by **Brian Richardson** » 06 Nov 2004, 22:16

There is no reason not to use a combination of various board array and bitboard representations, and piece lists too, if you prefer.

Bitboards are most beneficial when you do NOT loop over them,
but just use and/or operations and narrow things down to
a complex case, and then look for just a few bits set in
the result.

For example, suppose you wanted to find all of the pieces which
could be attacked by a pawn in one move. The non-bitboard way
might be to loop over each pawn or piece and see if a pawn can
attack. With many pawns and pieces this might be slow, but faster
in the endgame without many of either.

Of course, you could do this by looping over bitboards too,
but I think the preferred bitboard method is to shift a bitmap
of all opponent pawns left and right (masking off the edge files of course) and anding that with the piece bitmap. The idea is to work on bitmaps
as sets of information, not each bit at a time. Thus it is pretty easy
to handle the up 2 ranks initial pawn move in the above example
by anding all pawns with a rank 2 mask, shifting up 8, and anding with empty squares, and then oring that with the original pawn bitmap before doing the "attack" left and right shifts.

Also, I think pre-computed tables (such as squares which must be
empty between any two squares so you can and that with a piece
bitmap to see if a sliding piece can move there) are pretty much
manditory, IMHO.

by **Zach Wegner** » 07 Nov 2004, 07:53

Hello Tord,

Tord Romstad wrote:What is the best way to compute a bitboard containing 1s at all squares between two given squares, and 0s everywhere else? For instance, given the squares a2 and f7, I want a bitboard with 1s at b3, c4, d5 and e6. I currently do this with a huge lookup table indexed by all pairs of squares, which is of course a very ugly and unsatisfactory solution.

Well that is not necessarily the only way. I use it, but I am not concerned with data size but rather simplicity and to a lesser extent speed. But as a couple of other ways to compute the information:

1) Use more complicated addressing. You could have a table that compresses the distance between the two squares into 0..26 (the most squares a piece could move to). Or if you indexed by the greater square, you could condense it to 0..20. Code example:

Code: Select all: bitboard getray(int sq1, int sq2) { if (sq1 > sq2) return raytable[sq1][sqdelta[sq1 - sq2]]; else return raytable[sq2][sqdelta[sq2 - sq1]]; }

2) Use the mythical "reversed bitboards." You can get all of the squares with indeces between the two squares by subtracting the smaller from the larger, and use a table to look up the squares the form a ray. You need a somewhat complex or big table, but this would still be less data than the big lookup. Some code:

Code: Select all: bitboard getray(int sq1, int sq2) { if (sq1 > sq2) return (sq1 - sq2) & rays[sq1][direction[sq1 - sq2]]; else return (sq2 - sq1) & rays[sq2][direction[sq2 - sq1]]; }

Here the rays[64][4] array has rays from a square the go to the edge of the board, but are only in a certain direction. This can be 0..3, or 0..4 if you are not sure the squares are on a ray. direction is 64 elements.

It seems that one of the main reasons for the slowdown in the bitboard version of Glaurung is that looping through the nonzero bits of a bitboard is very slow compared to looping through a piece list. When I introduced bitboards, I ripped out the piece lists in order to simplify and speed up the make_move and unmake_move functions. I am now wondering whether this was really a good idea. Do bitboard engines usually have some kind of piece lists in addition to bitboards?

I do not have one, and I do not think it would be very productive to have one. It is ugly, and anywhere you could need to loop through pieces you could get an advantage (though I'm not sure how it compares to the disadvantage you speak of) by already knowing the piece type. Like where you would have for(pieces){if(piece==pawn)...} you could have specific loops for whatever piece you need. This seems like it would be faster than a loop and a switch for every piece, but you need the bitscan.

Regards,
Zach

by **Russell Reagan** » 07 Nov 2004, 10:28

Hi Tord,

Would you be willing to give an example of how you will use the "ray between squares" bitboard information? Sometimes when a person is used to offset based tricks, bitboard tricks are not always obvious. Maybe there is a better approach than the "ray between squares" bitboard idea.

Anyway, you can use the 0x88 trick that you are no doubt familiar with (unique square differences, 256-element lookup table). Converting the bitboard coordinates (0...63) to 0x88 coordinates (0...128) is simple.

Code: Select all: // I think this is right... (untested) unsigned bb_to_x88 (unsigned sq) { return (sq & 56) + sq; }

Code: Select all: // Again, untested... table[sq1 - sq2 + 128] << min(sq1, sq2); // Example: // table[g7 - b2 + 128] contains this bitboard: // 00000000 // 00000000 // 00000000 // 00001000 // 00010000 // 00100000 // 01000000 // 00000000 // min(g7,b2) == b2, so shift by the value of b2 (0...63 format)

You might even be able to do it based on the (0...63) square difference and replace the 256-element table with a 64-element table if you mask the final value correctly. I'll have to think about that after I get some sleep. It is late here. I hope I didn't make too many errors

. I'd be happy to write some code to demonstrate the idea if you'd like (after I get some sleep, of course

). Let me know.

I also need to nitpick Sune

. You don't have to use lookup tables to update rotated bitboards. You can rotate the square using some math and then get the bitmask of the rotated square. For instance, to rotate a square 90 degrees you can convert the square to its rank and file, then swap them and recompute the square. I don't know if it would be faster though

by **Sune Fischer** » 07 Nov 2004, 12:46

Hi Russell,

I think the 256 table needs a special cases for File(sq1)<File(sq2), ie. what if the squares are d2 and a5 then the ray needs to point in the other way.
With some branches and shifts it's no doubt possible to compress the table.

I guess I don't have to say this is both ugly, complicated and positively slower.

I also need to nitpick Sune . You don't have to use lookup tables to update rotated bitboards. You can rotate the square using some math and then get the bitmask of the rotated square. For instance, to rotate a square 90 degrees you can convert the square to its rank and file, then swap them and recompute the square. I don't know if it would be faster though

Certainly. I experimented for years trying to beat Crafty's rotated.
Too many tables I thought, must be a faster way.
But you've got to hand it to him, when it comes to the efficiency
of tables vs. computations the guy knows what he is doing.

-S.

by **Brian Richardson** » 07 Nov 2004, 17:32

Very nice idea...I wish I had thought of this 6 years ago, but Tinker has way too much now built around A1 as bit 0.

As it happens, Tinker spends less than 10% of its time in the attack functions and capture move generation, so while elegant, the approach you mention would not likely speed things up enough for a major rewrite.

Nonetheless, this is the sort of thing that one should keep in mind when embarking on the "second" or "third" fresh start engines in the future.

Thanks,
Brian

by **Russell Reagan** » 07 Nov 2004, 20:59

Sune Fischer wrote:Hi Russell,

I think the 256 table needs a special cases for File(sq1)<File(sq2), ie. what if the squares are d2 and a5 then the ray needs to point in the other way.
With some branches and shifts it's no doubt possible to compress the table.

I guess I don't have to say this is both ugly, complicated and positively slower.

Hmmm. I think it still works because table[sq1-sq2+128] contains the same thing as table[sq2-sq1+128]. In other words, sq1 and sq2 are interchangeable, so you could always say that File(sq1) < File(sq2). You could assign sq1=d2, sq2=a5; or you could assign sq1=a5, sq2=d2; and I think it will still work.

Code: Select all: table[d2-a5+128] contains: 00000000 00000000 00000000 00000000 00000000 00000100 00000010 00000001 Then left shift that by min(d2, a5).

What do you think?

by **Sune Fischer** » 07 Nov 2004, 21:46

Russell Reagan wrote:
Code: Select all
table[d2-a5+128] contains: 00000000 00000000 00000000 00000000 00000000 00000100 00000010 00000001 Then left shift that by min(d2, a5).

What do you think?

I think in this case you need to shift by d2-7.

There are two diagonals and so different shifts will be required pending the direction, unless I'm overlooking something.

-S.

by **Russell Reagan** » 08 Nov 2004, 09:13

I got it to work.

Code: Select all: bitboard_t in_between (int sq1, int sq2) { return table[bb_to_x88(sq1) - bb_to_x88(sq2) + 128] << min(sq1, sq2); } print_bitboard( in_between(d2, a5) ); print_bitboard( in_between(a5, d2) ); -------- -------- -------- -------- -#------ --#----- -------- -------- -------- -------- -------- -------- -#------ --#----- -------- --------

by **Tord Romstad** » 09 Nov 2004, 10:54

Sune Fischer wrote:
Look at Crafty, it's got 2 MB tables and is still crunching happily along at 1 MNps, it doesn't even seem to profit from larger caches!

Thanks for bringing up Crafty. It is a good example of the type of program I *don't* want to write. This is not (of course) a matter of speed or strength, but of aesthetics. I don't like Crafty's source code at all. The big tables are not the only problem, I also dislike the huge amounts of "cut and paste code", where a code segment handling white pieces is followed by an almost identical code segment for the black pieces.

So, you want to see ragingly high nps you have to drop all the board scans and start using some tables!
There, I said it.

I am fairly sure you can find examples to prove that ragingly high nps is possible with any sensible board representation.

Okay, alternatives.
Well what about a switch in the x-y direction to generate that attack line from y. Then see if it hits x and z. Or start by checking that x,y and y,z are on a line.

I've considered something like this. It doesn't seem like a typical bitboard approach, but I suppose it would work well.

You can convert to 0x88 first so you only need that tiny 256 table

No need to convert -- I already use a 16-file board array.

In fact, updating the piece list when moves are made and unmade is less work (in terms of lines of code as well as execution time) than updating bitboards. But having bitboards *and* piece lists seems very redundant. Essentially you end up maintaining exactly the same information in two different data structures, which doesn't look like the right thing to do.

Huh less work?
Bitboards only needs an xor to lift up and an xor to set back down.
Ok the rotated needs tables..

Yes, exactly. I have a bigger number of rotated bitboards than most other engines, because I want to be able to generate stuff like x-ray attacks of sliding pieces through knights. As far as I can see, this forces me to have separate rotated bitboards for each piece type.

Once again, thanks for your ideas!

Tord

by **Tord Romstad** » 09 Nov 2004, 11:12

Russell Reagan wrote:Hi Tord,

Hi Russell,

Welcome on board! :-)

Would you be willing to give an example of how you will use the "ray between squares" bitboard information?

Sure. At the moment, I need it for generating check evasions (blocking checks from sliding pieces). Later, I may want to use it for other purposes related to pins, discovered attacks and blocking sliding attacks in my evaluation function. My program is still in an embryonic stage, and it is still too early to say exactly what I will want to do.

Sometimes when a person is used to offset based tricks, bitboard tricks are not always obvious.

Definitely. The data structures we have experience with shape our way of thinking to a great extent. I found it amusing to read one of Sune's replies in this thread, where he wrote (about non-bitboard board representations) that "almost every time you need something you need it in reverse". I could have said exactly the same thing about bitboards.

Maybe there is a better approach than the "ray between squares" bitboard idea.

Anyway, you can use the 0x88 trick that you are no doubt familiar with (unique square differences, 256-element lookup table). Converting the bitboard coordinates (0...63) to 0x88 coordinates (0...128) is simple.

It is, but it is even simpler to just use 0..128 as bitboard coordinates, without doing any conversion. This is what I do. My first_1() function returns a square index in the 0..128 range.

Thanks a lot for your code! I haven't tried the idea yet (no time for programming these days), but it looks like the most interesting I have seen so far.

Thanks also to the numerous other contributors to this thread! I highly appreciate your help.

Tord

by **Russell Reagan** » 09 Nov 2004, 12:55

Tord Romstad wrote:Welcome on board!

Thanks! It is a nice new board.

Definitely. The data structures we have experience with shape our way of thinking to a great extent. I found it amusing to read one of Sune's replies in this thread, where he wrote (about non-bitboard board representations) that "almost every time you need something you need it in reverse". I could have said exactly the same thing about bitboards.

Hehe. That's very true, even about life in general. Recently during the presidential elections I often heard people saying, "I am voting for Bush because I want our country to be safe!" Another person says, "That is the exact reason I am voting for Kerry!"

It is, but it is even simpler to just use 0..128 as bitboard coordinates, without doing any conversion. This is what I do. My first_1() function returns a square index in the 0..128 range.

Interesting idea. Do you bitscan "normally" (any of the well known ways) and then convert, or do you compute the 0..127 coordinate directly? Now that I think about it, it should be simple to change this bitscan to return 0..127 coordinates by altering the "result =" and "result +=" lines (and the 8-bit lookup table values).

Code: Select all: // Originally by Eugene Nalimov int Bitscan (Bitboard arg) { int result = 0; if (arg > 0xFFFFFFFF) { arg >>= 32; result = 32; } if (arg > 0xFFFF) { arg >>= 16; result += 16; } if (arg > 0xFF) { arg >>= 8; result += 8; } return result + table8[arg]; }

Thanks a lot for your code! I haven't tried the idea yet (no time for programming these days), but it looks like the most interesting I have seen so far.

I sent you the code I wrote. I didn't test it thoroughly, but it worked on all of the cases I tried.

by **Alessandro Scotti** » 21 Nov 2004, 01:14

Tord Romstad wrote:Hi all,

It seems that one of the main reasons for the slowdown in the bitboard version of Glaurung is that looping through the nonzero bits of a bitboard is very slow compared to looping through a piece list. When I introduced bitboards, I ripped out the piece lists in order to simplify and speed up the make_move and unmake_move functions. I am now wondering whether this was really a good idea. Do bitboard engines usually have some kind of piece lists in addition to bitboards?

Tord

Hi Tord,
finding non-zero bits in a bitboard is the only piece of assembly code I have in my program (well ok, I added a bit count function recently). These few lines of inline code can easily speed up your program by 3x or more. You can get samples from Crafty code or I can send you my own if you wish.
Hey... wait a minute! Is that Glaurung the same program that is playing with Kiwi in some tournaments and well ahead of it in the standings? If so, forget what I said, and this message will self-destruct in 5 seconds...

Winboard Forum

Two questions for bitboard experts

Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Re: Two questions for bitboard experts

Who is online