Tord:
typedef struct {
uint8 board_[256];
uint8 *board;
} position_t;
>However, I almost never use the board_[] array directly. Instead, I do
> something similar to this when initialising the position data structure:
void initialise_position(position_t *pos) {
pos->board = pos->board_ + 64; /* 64 = index of first square on the "real board" */
}
>In the rest of my program, I always use the board pointer instead of the board_[] array.
This sounds a bit less efficient (on x86 hardware) than the functionally equivalent
#define A1_OFFSET 64
#define BOARD(sq) board_[(sq) + A1_OFFSET]
Using the pointer needs one more indirection to access the value. Constant offsets cost (almost) no time. On x86 they just change the opcode that is used. For offsets > 127 (or so, I don't remember the details), the opcode gets a bit longer, however - so the program may use a bit more code space, and might be less cache friendly. Also, accessing data through a pointer can make things harder for the optimizer.
uint8 *bp;
bp[sq] = something;
some_global_value = something_else;
bp[sq+1] = bp[sq];
The optimizer now cannot produce
bp[sq+1] = something; // which still might be cached in one register
Of course, this small example is not really realistic, and one would code it a bit differently from the beginning. But it may illustrate the point.
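For what it is worth, the two access styles compared above can be put side by side in a small sketch. This is only an illustration under some assumptions, not Tord's actual code: uint8_t from <stdint.h> stands in for his uint8, the BOARD macro gets an extra pos parameter so that it works on the struct, and the names get_via_pointer, get_via_offset and check_variants_agree are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define A1_OFFSET 64
/* Adapted from the macro above: takes the position explicitly (assumption). */
#define BOARD(pos, sq) ((pos)->board_[(sq) + A1_OFFSET])

typedef struct {
    uint8_t board_[256];
    uint8_t *board;          /* points at board_ + A1_OFFSET */
} position_t;

void initialise_position(position_t *pos) {
    pos->board = pos->board_ + A1_OFFSET;
}

/* Variant 1: load the pointer from the struct first, then index through it. */
uint8_t get_via_pointer(const position_t *pos, int sq) {
    return pos->board[sq];
}

/* Variant 2: index the array directly; the constant offset can be
   folded into the address computation. */
uint8_t get_via_offset(const position_t *pos, int sq) {
    return BOARD(pos, sq);
}

/* Sanity check: both variants must read the same byte for every index. */
void check_variants_agree(const position_t *pos) {
    for (int sq = -A1_OFFSET; sq < 256 - A1_OFFSET; sq++)
        assert(get_via_pointer(pos, sq) == get_via_offset(pos, sq));
}
```

Compiling this with something like gcc -O2 -S and comparing the assembly of the two get_ functions should show whether the extra pointer load is really there on a given compiler and target.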
>There are two equivalent ways to test whether a square is outside the real board:
if(pos->board[sq] == OUTSIDE) { ... }
or
if(sq & 0x88) { ... }
>In practice, I have found that the first method is almost always a tiny bit faster
A surprise to me. The 0x88 method is just one typical test opcode (which any machine will have, although some RISC-type machines may need to put 0x88 into a register first - on x86 the constant is inside the opcode).
The mailbox method needs a memory access (most probably from cache), which just sounds slower.
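As a small sketch of why the two tests are interchangeable, the following assumes the layout from Tord's struct (256 bytes, real board starting at index 64) and fills every off-board entry with a sentinel. The values chosen for OUTSIDE and EMPTY and the function names are hypothetical, uint8_t stands in for uint8, and the mask test on negative sq assumes two's complement integers.

```c
#include <assert.h>
#include <stdint.h>

#define A1_OFFSET 64
#define OUTSIDE 0xFF   /* hypothetical sentinel value */
#define EMPTY   0      /* hypothetical "empty square" value */

typedef struct {
    uint8_t board_[256];
    uint8_t *board;          /* points at board_ + A1_OFFSET */
} position_t;

/* Mark every index whose 0x88 bits are set as OUTSIDE, the rest as EMPTY.
   sq runs over -64..191, i.e. over the whole board_ array. */
void initialise_position(position_t *pos) {
    pos->board = pos->board_ + A1_OFFSET;
    for (int sq = -A1_OFFSET; sq < 256 - A1_OFFSET; sq++)
        pos->board[sq] = (sq & 0x88) ? OUTSIDE : EMPTY;
}

/* Mailbox-style test: one memory access plus a compare. */
int outside_by_lookup(const position_t *pos, int sq) {
    return pos->board[sq] == OUTSIDE;
}

/* 0x88 test: a single AND with a constant, no memory access. */
int outside_by_mask(int sq) {
    return (sq & 0x88) != 0;
}

/* Both tests must agree for every index that fits into board_. */
void check_tests_agree(const position_t *pos) {
    for (int sq = -A1_OFFSET; sq < 256 - A1_OFFSET; sq++)
        assert(outside_by_lookup(pos, sq) == outside_by_mask(sq));
}
```

Which of the two is faster in a real engine is exactly the open question here; the sketch only shows that they give the same answer.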
Not that I think the points I mentioned really matter. As I mentioned before, I had once added debug code somewhere in the inner loop. Something like
if (hashkey == 123456)
printf("got you\n"); /* to conveniently set a debugging breakpoint */
and the code was reproducibly faster.
Another time, in a speed discussion in CCC, we wanted to benchmark several abs() macros. It turned out that using Omid's abs() produced faster code than using an empty abs() ... I checked the generated assembly, and all looked correct - the empty abs() version was just missing a few assembly instructions compared to the real one.
Cheers,
Dieter