BitSet wrote: Is Classical Approach (NOT in one run) faster than Shifted Bitboards?
It depends on the 64-bit hardware. On recent CPUs (Core 2 or later, K10) with a fast bitscan instruction, yes. On old AMD K8 Athlons, where bsf/bsr is a dead slow vector-path instruction taking 10/11 cycles, probably not. In 32-bit mode, so many 64-bit shifts are a horror, of course.
Per ray, the shifted approach takes more or less 11 instructions (with some gain from parallelism, which on the other hand costs some registers), where D is a compile-time constant:
Code:
(x << D) | (x << 2*D) | (x << 3*D) | (x << 4*D) | (x << 5*D) | (x << 6*D)
versus one OR, one bitscan, and one lookup that hits the L1 cache. OR-ing an end bit into the blockers is needed for a branchless approach, since the bitscan argument must never be empty:
Code:
raydirLookup[bitscan(X | ENDBIT)]
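To make the lookup above concrete, here is a minimal sketch in C of the classical approach for a single positive direction (north). The names northRay, initNorthRays, bitscanForward and northAttacks are hypothetical, the loop-based bitscan is only a portable stand-in for bsf, and the a1 = bit 0, h8 = bit 63 square mapping is an assumption:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Hypothetical ray table: all squares strictly north of sq (a1 = bit 0). */
static U64 northRay[64];

/* Portable stand-in for the bsf instruction; the argument must be non-zero. */
static int bitscanForward(U64 b) {
    int i = 0;
    while (!(b & 1)) { b >>= 1; ++i; }
    return i;
}

/* Fill the ray table once at startup. */
static void initNorthRays(void) {
    for (int sq = 0; sq < 64; ++sq) {
        U64 ray = 0;
        for (int s = sq + 8; s < 64; s += 8)
            ray |= (U64)1 << s;
        northRay[sq] = ray;
    }
}

/* Classical ray attack, branchless thanks to the end bit. */
static U64 northAttacks(int sq, U64 occupied) {
    const U64 ENDBIT = (U64)1 << 63;          /* keeps the bitscan argument non-empty */
    U64 blockers = northRay[sq] & occupied;   /* pieces standing on the ray */
    int first = bitscanForward(blockers | ENDBIT);
    /* Cut off everything behind the first blocker; northRay[63] is empty,
       so the end bit alone removes nothing. */
    return northRay[sq] ^ northRay[first];
}
```

After calling initNorthRays(), a rook on e1 (square 4) with a blocker on e4 (square 28) would attack e2, e3 and e4. Negative directions would need a reverse bitscan and a low end bit instead.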
Since all non-rotated attack getters share the same interface, one may play with different implementations and approaches like Kindergarten BBs, Obstruction Difference, Hyperbola Quintessence, or finally Magic BBs later, if one has nothing else to do.
One possible improvement of Shifted BBs would be to use parallel prefix shifts, with 6 instead of 11 instructions (but more dependencies):
Code:
x |= (x << D);
x |= (x << 2*D);
x |= (x << 4*D);
but still ...
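For comparison, a small sketch in C of the two shift variants side by side, assuming D = 8 (north, a1 = bit 0), where no file-wrap masking is needed. The function names fillUnrolled and fillPrefix are hypothetical. Note that the two are not drop-in replacements: the prefix version also keeps x itself and covers a seventh shift step, so a caller would have to account for that, for instance when masking with the ray.

```c
#include <stdint.h>

typedef uint64_t U64;

enum { D = 8 };  /* assumed direction: north, compile-time constant */

/* Six shifts plus five ORs = 11 instructions, all shifts independent. */
static U64 fillUnrolled(U64 x) {
    return (x << D)   | (x << 2*D) | (x << 3*D)
         | (x << 4*D) | (x << 5*D) | (x << 6*D);
}

/* Three shifts plus three ORs = 6 instructions, but each step depends
   on the previous one. Covers shift distances 0 through 7. */
static U64 fillPrefix(U64 x) {
    x |= x << D;      /* distances 0..1 */
    x |= x << 2*D;    /* distances 0..3 */
    x |= x << 4*D;    /* distances 0..7 */
    return x;
}
```

For any x, fillPrefix(x) should equal x | fillUnrolled(x) | (x << 7*D), which is one way to check the two against each other.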