Winboard Forum

Posted: **23 Aug 2006, 16:24**

On a 32-bit processor, the right shift operator (>>) on a 64-bit integer (u64) is very slow.
My code below seems faster.

I tried it with Visual C++ on a Pentium 4 processor.

shift 7 bits...

    u64 shr7(const u64 bits){ // bits >> 7
        unsigned s2 = ((unsigned*)&bits)[1];          // high dword (little-endian layout)
        u64 x = (((unsigned)bits) >> 7) | (s2 << 25); // 25 = 32 - 7
        ((unsigned*)&x)[1] = s2 >> 7;                 // store the shifted high dword
        return x;
    }

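shift N bits...

    u64 shrN(const u64 bits, const int N){ // bits >> N
        assert(N > 0 && N < 64); // N == 0 would shift s2 left by 32 below
        if (N < 32){
            unsigned s2 = ((unsigned*)&bits)[1];                // high dword (little-endian layout)
            u64 x = (((unsigned)bits) >> N) | (s2 << (32 - N));
            ((unsigned*)&x)[1] = s2 >> N;                       // store the shifted high dword
            return x;
        }
        u64 x = shift32(bits) >> (N - 32); // shift32 is not shown in this post
        return x;
    }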

bye giuseppe

Posted: **23 Aug 2006, 19:55**

Hi Giuseppe,

yes, your code is faster on the P4, and it could be inlined. The _aullshr is a call with 3 cases. I would prefer a 64-bit shift with the shift amount taken modulo 64 and only two cases, as an inlined intrinsic, rather than the call/ret overhead and possibly "random" shift amounts called from different contexts, which eventually makes the >= 32 branch harder to predict correctly. OTOH, inlining a lot of conditional branches may pollute the branch target buffer, so everything has two sides...
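
A minimal sketch of what such a two-case, modulo-64 shift could look like in plain C. The name `shr64_mod` is hypothetical and not from this thread, and whether a given compiler turns it into better code than the `_aullshr` call would have to be measured. Note that, unlike `_aullshr`, it does not return zero for counts of 64 or more, since the count wraps around.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the thread):
   logical right shift of a 64-bit value using 32-bit shifts only.
   The count is reduced modulo 64, so only two cases remain. */
static inline uint64_t shr64_mod(uint64_t bits, unsigned n)
{
    uint32_t lo = (uint32_t)bits;         /* low dword  */
    uint32_t hi = (uint32_t)(bits >> 32); /* high dword */

    n &= 63;                              /* shift amount modulo 64 */
    if (n < 32) {
        /* (hi << 1) << (31 - n) equals hi << (32 - n) for n = 1..31
           and is 0 for n = 0, so no 32-bit shift by 32 ever occurs. */
        uint32_t carry = (hi << 1) << (31 - n);
        return ((uint64_t)(hi >> n) << 32) | ((lo >> n) | carry);
    }
    /* n in 32..63: the result comes entirely from the high dword. */
    return (uint64_t)(hi >> (n - 32));
}
```

The double shift in the first case is one way to avoid a separate N == 0 case. The second case is still a branch, so the prediction concern above applies to this version too.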

Code:

    _aullshr:
    0040E390 80 F9 40    cmp  cl,40h
    0040E393 73 15       jae  RETZERO
    0040E395 80 F9 20    cmp  cl,20h
    0040E398 73 06       jae  MORE32
    0040E39A 0F AD D0    shrd eax,edx,cl
    0040E39D D3 EA       shr  edx,cl
    0040E39F C3          ret
    MORE32:
    0040E3A0 8B C2       mov  eax,edx
    0040E3A2 33 D2       xor  edx,edx
    0040E3A4 80 E1 1F    and  cl,1Fh
    0040E3A7 D3 E8       shr  eax,cl
    0040E3A9 C3          ret
    RETZERO:
    0040E3AA 33 C0       xor  eax,eax
    0040E3AC 33 D2       xor  edx,edx
    0040E3AE C3          ret

The shrd is very slow on the P4, so I guess an inlined shrd replacement like the one you suggest might be faster. But shifts on the P4 are "dead" slow anyway (the shift ALU is located in the MMX unit).

For better readability I prefer anonymous 64/32[2]-bit unions rather than pointer casts. Neither method is portable with respect to endianness, though. So a simple >>/<< 32 is the preferred way to get and set the high dwords, and it should not produce any overhead in 32-bit mode.
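
To illustrate the two options, here is a small sketch. All names (`low_dword`, `high_dword`, `make_u64`, `U64Parts`, `shr7_portable`) are made up for this example, and the union is a named type rather than an anonymous one to keep it plain C. Whether the constant shifts by 32 really compile to simple register moves depends on the compiler and its settings.

```c
#include <stdint.h>

/* Portable: constant shifts by 32, independent of byte order. */
static inline uint32_t low_dword(uint64_t x)  { return (uint32_t)x; }
static inline uint32_t high_dword(uint64_t x) { return (uint32_t)(x >> 32); }
static inline uint64_t make_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Union variant: readable, but which index holds the high dword
   depends on endianness (dw[1] only on little-endian targets). */
typedef union {
    uint64_t q;
    uint32_t dw[2];
} U64Parts;

/* Little-endian only: reads the high dword through the union. */
static inline uint32_t high_dword_union_le(uint64_t x)
{
    U64Parts p;
    p.q = x;
    return p.dw[1];
}

/* bits >> 7 written with the portable helpers. */
static inline uint64_t shr7_portable(uint64_t bits)
{
    uint32_t lo = low_dword(bits);
    uint32_t hi = high_dword(bits);
    return make_u64(hi >> 7, (lo >> 7) | (hi << 25)); /* 25 = 32 - 7 */
}
```

The portable version gives the same result on any byte order, while the union version would need its index swapped on a big-endian machine.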

Why not look forward to a better 32/64-bit speedup? ;-)

Cheers,
Gerd
