combining a ripple adder with the snoob ("same number of one bits") algorithm might be usefull to find magics faster ...
How it works:
- Code: Select all
set:
... 1110 0110 0x..e6
eg. all subsets with two bits sets:
... 0000 0110 0x..06
... 0010 0010 0x..22
... 0010 0100 0x..24
... 0100 0010 0x..42
... 0100 0100 0x..44
... 0110 0000 0x..60
... 1000 0010 0x..82
... 1000 0100 0x..84
... 1010 0000 0x..a0
... 1100 0000 0x..c0
If, after a ripple add of the least significat one-bit, the result has only two changed bits - it has already the same population count, and we are ready (01->10). Otherwise with > 2 changed bits (011->100, 0111->1000,...), we need to set changed-2 lower bits from the original set into the result (rip).
In the original snoob with all bits one in the set, this works loopless by:
- Code: Select all
rip = sub + (sub & -sub);
rip |= ((sub^rip) >> 2) >> bitScan(sub)
With arbitrary and even rare populated sets, i see no other way to implement a loop similar to Kerningham's population count, based on reset least significant one-bit by set&=set-1. If somebody is aware of a loopless version, let me know!
- Code: Select all
// get next greater subset of set with same number of one bits
u64 snoob (u64 sub, u64 set) {
u64 rip = (sub + (sub & -sub) - set - 1) & set;
for (sub^=rip, sub&=sub-1; sub&=sub-1; set&=set-1)
rip += set & -set;
return rip;
}
x64 assembly by vc2005:
- Code: Select all
_TEXT SEGMENT
sub$ = 8
set$ = 16
?snoob@@YA_K_K0@Z PROC
00000 48 8b c1 mov rax, rcx
00003 4c 8b c1 mov r8, rcx
00006 48 f7 d8 neg rax
00009 48 23 c1 and rax, rcx
0000c 48 2b c2 sub rax, rdx
0000f 4c 8d 4c 08 ff lea r9, QWORD PTR [rax+rcx-1]
00014 4c 23 ca and r9, rdx
00017 4d 33 c1 xor r8, r9
0001a 49 8d 40 ff lea rax, QWORD PTR [r8-1]
0001e 4c 23 c0 and r8, rax
00021 49 8d 40 ff lea rax, QWORD PTR [r8-1]
00025 4c 23 c0 and r8, rax
00028 74 22 je SHORT $LN8@snoob
0002a 66 66 90 66 66
90 npad 6
$LL3@snoob:
00030 48 8b ca mov rcx, rdx
00033 48 f7 d9 neg rcx
00036 48 23 ca and rcx, rdx
00039 4c 03 c9 add r9, rcx
0003c 48 8d 4a ff lea rcx, QWORD PTR [rdx-1]
00040 48 23 d1 and rdx, rcx
00043 49 8d 48 ff lea rcx, QWORD PTR [r8-1]
00047 4c 23 c1 and r8, rcx
0004a 75 e4 jne SHORT $LL3@snoob
$LN8@snoob:
0004c 49 8b c1 mov rax, r9
0004f c3 ret 0
?snoob@@YA_K_K0@Z ENDP
_TEXT ENDS
[edit]
Ok, this one looks much better ipc-wise, taking advantage of expressions already calculated to reset ls1b in rip. Same for the loop body - shorter and less dependent code.
- Code: Select all
// get next greater subset of set with same number of one bits
u64 snoob (u64 sub, u64 set) {
u64 rip =(sub+(sub&-sub)-set-1)&set, lsb;
for (sub^=rip^(sub&-sub); sub&=sub-1; rip+=lsb, set^=lsb)
lsb = set&-set;
return rip;
}
- Code: Select all
_TEXT SEGMENT
sub$ = 8
set$ = 16
?snoob@@YA_K_K0@Z PROC
00000 48 8b c1 mov rax, rcx
00003 4c 8d 41 ff lea r8, QWORD PTR [rcx-1]
00007 48 f7 d8 neg rax
0000a 4c 23 c1 and r8, rcx
0000d 48 23 c1 and rax, rcx
00010 48 2b c2 sub rax, rdx
00013 4c 8d 4c 08 ff lea r9, QWORD PTR [rax+rcx-1]
00018 4c 23 ca and r9, rdx
0001b 4d 33 c1 xor r8, r9
0001e 49 8d 40 ff lea rax, QWORD PTR [r8-1]
00022 4c 23 c0 and r8, rax
00025 74 21 je SHORT $LN8@snoob
00027 npad 9
$LL3@snoob:
00030 48 8b ca mov rcx, rdx
00033 48 f7 d9 neg rcx
00036 48 23 ca and rcx, rdx
00039 4c 03 c9 add r9, rcx
0003c 48 33 d1 xor rdx, rcx
0003f 49 8d 48 ff lea rcx, QWORD PTR [r8-1]
00043 4c 23 c1 and r8, rcx
00046 75 e8 jne SHORT $LL3@snoob
$LN8@snoob:
00048 49 8b c1 mov rax, r9
0004b c3 ret 0
?snoob@@YA_K_K0@Z ENDP
_TEXT ENDS
[/edit]
Cheers,
Gerd