GCC Inline asm problem

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

GCC Inline asm problem

Postby Eduardo Waghabi » 18 Jul 2005, 22:28

Hi there,

I was messing around with the TSCP inline code for the first bit set:
Code: Select all
__inline int first_one(u64 bits)
{
    __asm
    {
        mov edx, 0
        bsf eax, dword ptr bits
        jnz done
        mov edx, 32
        bsf eax, dword ptr bits+4
        jnz done
        mov eax, 32
done:   add eax, edx
    }
}


and trying to port it into GCC inline asm code so I could compile it under Cygwin's GCC. This is what I've got:

Code: Select all
int first_one(TBitBoard bitboard) {
  int idxBit = -1;

  asm("        bsfl  %1, %0       \n\t" // bsf eax, dword ptr b
      "        jnz   done         \n\t" // jnz done
      "        movl  $0x20, %2    \n\t" // mov edx, 32
      "        bsfl  4(%1), %0    \n\t" // bsf eax, dword ptr [b+4]
      "        jnz   done         \n\t" // jnz done
      "        movl  $0x20, %0    \n\t" // mov eax, 32
      "done:   add   %2   , %0    \n\t" // add edx, eax
      : "=r" (idxBit)                   // out reg: %0=idxBit
      : "r" (bitboard), "r" (0)         // in regs: %1=bitboard, %2=0
      : "cc"                            // I mess with conditions codes
      );
  return idxBit;
}

Quite obviously it didn't work. Now can anyone please tell me why?

Thanks!
Eduardo Waghabi
 
Posts: 13
Joined: 15 Jul 2005, 19:43
Location: Rio de Janeiro, Brazil

Re: GCC Inline asm problem

Postby Alessandro Scotti » 18 Jul 2005, 22:55

This is what I use in Kiwi:

Code: Select all
inline int bitScanForward( Uint64 data )
{
    int bit;

    asm(
        "   testl   %%eax, %%eax    \n\t"
        "    jz     1f              \n\t"
        "   bsf     %%eax, %0       \n\t"
        "    jmp    2f              \n\t"
        "1:                         \n\t"
        "   movl    $-1, %0         \n\t"
        "   testl   %%edx, %%edx    \n\t"
        "    jz     2f              \n\t"
        "   bsf     %%edx, %0       \n\t"
        "   addl    $32, %0         \n\t"
        "2:                         \n\t"
        : "=&r" (bit)
        : "A" (data)
    );

    return bit;
}


This function returns -1 if no bit is set. (Eventually, I decided to compile without inline assembly though.)
Recently, Dann Corbit showed me a great "asm-free" implementation:

Code: Select all
/* >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Matt Taylor's implementation of the de Bruijn bitscan:

>From a post on CCC:
http://chessprogramming.org/cccsearch/ccc.php?art_id=306789

Subject : Re: There is huge potential to improve further
Posted by : Matt Taylor on July 16, 2003 at 22:44:43

Which is indirectly a reference to this work:

"Using de Bruijn Sequences to Index a 1 in a Computer Word"

Charles E.Leiserso
Harald Prokop
Keith H, Randall

MIT Laboratory for Computer Science, Cambridge, MA 02139, USA
July 7, 1998

http://supertech.csail.mit.edu/papers/debruijn.ps

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< */
const int       lsz64_tbl[64] =
{
    0, 31,  4, 33, 60, 15, 12, 34,
   61, 25, 51, 10, 56, 20, 22, 35,
   62, 30,  3, 54, 52, 24, 42, 19,
   57, 29,  2, 44, 47, 28,  1, 36,
   63, 32, 59,  5,  6, 50, 55,  7,
   16, 53, 13, 41,  8, 43, 46, 17,
   26, 58, 49, 14, 11, 40,  9, 45,
   21, 48, 39, 23, 18, 38, 37, 27,
};

int             BitScan_MT_deBruijn (Bitboard bb)
{
   const Bitboard  lsb = (bb & -(long long) bb) - 1;
   const unsigned int foldedLSB = ((int) lsb) ^ ((int) (lsb >> 32));
   return lsz64_tbl[foldedLSB * 0x78291ACF >> 26];
}


Note that this version won't work with empty (zero) bitboards, and I have a really hard time trying to understand *how* it works in all the remaining cases! :)
User avatar
Alessandro Scotti
 
Posts: 306
Joined: 20 Nov 2004, 00:10
Location: Rome, Italy

Re: GCC Inline asm problem

Postby Gian-Carlo Pascutto » 19 Jul 2005, 00:25

It's basically a perfect hash.

You transform the input to the same input but with only 1 bit set, then you map the 64 possible values to 1..64.

The construction of the hash with multiply+shift is some neat mathematical theory.
Gian-Carlo Pascutto
 
Posts: 42
Joined: 04 Jul 2005, 13:24
Location: Belgium

Re: GCC Inline asm problem

Postby Dann Corbit » 19 Jul 2005, 00:30

The postscript paper (upthread) explains it.

It is easy to construct a perfect hash.

It is not easy to create a perfect hash that is so cheap to compute.
Dann Corbit
 

not ASM popCnt

Postby Pradu » 19 Jul 2005, 00:51

You probably want a fast C implementation of a population count function as well:

Code: Select all
/*iterativePopCount()
 *an iterative population count of 1 bits in a quadword
 *this population count is better than the regular population count if the
 *number of 1 bits is sparce (usually less than 5 bits)
 *
 *@param b - the quadword to be counted
 *@returns the number of 1 bits in b
 */
int iterativePopCount(U64 b)
{
    int n;
    for (n = 0; b != 0; n++, b &= (b - 1));
    return n;
}


An explanation of the iterative population count is can be found here.

Basically b &= (b - 1) clears the first bit and it is trivial from there.

And here is a non-interative population count for densely populated quadwords from Hacker's Memory collection of programming tips:

Code: Select all
/*popCount()
 *a noniterative population count of 1 bits in a quadword
 *
 *@param b - the quadword to be counted
 *@returns the number of 1 bits in b
 */
#define m1 0x5555555555555555ULL
#define m2 0x3333333333333333ULL
int popCount(const U64 b)
{
    int n;
    const U64 a = b - ((b >> 1) & m1);
    const U64 c = (a & m2) + ((a >> 2) & m2);
    n = ((int) c) + ((int) (c >> 32));
    n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
    n = (n & 0xFFFF) + (n >> 16);
    n = (n & 0xFF) + (n >> 8);
    return n;
}


Both of these work fine for me.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Eduardo Waghabi » 19 Jul 2005, 03:29

Many thanks for the feedback. The perfect hash trick is really neat.

As for my code, I eventually figured out I should pass &bitboard as opcode %1, as I was using the ptr operand (). I've changed the functions, to make it a little bit mine, so for who may care:

FirstBit:
Code: Select all
int BitMenosSignificativo(TBitBoard bitboard) {
  int idxBit = -1;

  asm("       bsfl  (%1), %0          \n\t"
      "       jnz   ebsf              \n\t"
      "       movl  $-0x01, %0        \n\t"
      "       orl   $0,4(%1)          \n\t"
      "       jz    ebsf              \n\t"
      "       bsfl  4(%1), %0         \n\t"
      "       addl  $0x20, %0         \n\t"
      "ebsf:  nop                     \n\t"
      :"=r" (idxBit):"r" (&bitboard));
  return idxBit;
}


LastBit:
Code: Select all
int BitMaisSignificativo(TBitBoard bitboard) {
  int idxBit = -1;

  asm("       bsrl  4(%1), %0     \n\t"
      "       jnz   ebsr1         \n\t"
      "       movl  $-0x01, %0    \n\t"
      "       orl   $0,(%1)       \n\t"
      "       jz    ebsr          \n\t"
      "       bsrl  (%1), %0      \n\t"
      "       jnz   ebsr          \n\t"
      "ebsr1: addl  $0x20, %0     \n\t"
      "ebsr:  nop                 \n\t"
      :"=r" (idxBit):"r" (&bitboard));
  return idxBit;
}

Those last two return -1 if the bitboard is empty. And if you know for sure the bitboard is not empty, here are faster versions:

FirstBit:
Code: Select all
int BitMenosSigPreench(TBitBoard bitboard) {
  int idxBit = -1;

  asm("       bsfl  (%1), %0      \n\t"
      "       jnz   efbsf         \n\t"
      "       bsfl  4(%1), %0     \n\t"
      "       addl  $0x20, %0     \n\t"
      "efbsf: nop                 \n\t"
      :"=r" (idxBit):"r" (&bitboard));
  return idxBit;
}

LastBit:
Code: Select all
int BitMaisSigPreench(TBitBoard bitboard) {
  int idxBit = -1;

  asm("        bsrl  4(%1), %0     \n\t"
      "        jnz   efbsr1        \n\t"
      "        bsrl  (%1), %0      \n\t"
      "        jnz   efbsr         \n\t"
      "efbsr1: addl  $0x20, %0     \n\t"
      "efbsr:  nop                 \n\t"
      :"=r" (idxBit):"r" (&bitboard));
  return idxBit;
}


Hey, I might start to like this inline assembly stuff.
Last edited by Eduardo Waghabi on 19 Jul 2005, 13:28, edited 1 time in total.
Eduardo Waghabi
 
Posts: 13
Joined: 15 Jul 2005, 19:43
Location: Rio de Janeiro, Brazil

Re: GCC Inline asm problem

Postby Pradu » 19 Jul 2005, 03:39

Hey, I might start to like this inline assembly stuff.


It gets ugly and it is tedious to make the source portable. Atleast thats my view on it...
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Reinhard Scharnagl » 19 Jul 2005, 07:36

Hi Pradu,

the popCount() function is suboptimal - am I rigth ?

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: GCC Inline asm problem

Postby Dann Corbit » 19 Jul 2005, 21:59

Reinhard Scharnagl wrote:Hi Pradu,

the popCount() function is suboptimal - am I rigth ?

Reinhard.


Some alternatives:
Code: Select all

#include <assert.h>

typedef unsigned long long bitboard;
extern unsigned int t0_and_masks(bitboard x);
extern unsigned int t1_and_masks(bitboard i);
extern unsigned int t2_mult_masks(bitboard x);
extern unsigned int t3_rmlsbsub(bitboard n);
extern unsigned int t4_rmlsbmask(bitboard n);
extern unsigned int t5_testlsb(bitboard n);
extern unsigned int t6_testmsb(bitboard n);
extern unsigned int t7_testsignandshift(bitboard n);
extern unsigned int t8_testeachbit(bitboard n);
extern unsigned int t9_testeachbit1shl(bitboard n);
extern unsigned int tA_tableshift(bitboard n);
extern unsigned int tB_tableuchar(bitboard n);
extern unsigned int tC_tableshiftcast(bitboard n);
extern unsigned int tD_itableshift(bitboard n);
extern unsigned int tE_itableuchar(bitboard n);
extern unsigned int tF_itableshiftcast(bitboard n);
extern unsigned int tG_parallel(bitboard n);
extern unsigned int tH_hamming(bitboard w);


unsigned        t0_and_masks(bitboard x)
{
    assert(x);
    x = (x >> 1 & 0x5555555555555555) + (x & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x = ((x >> 8) + x) & 0x00ff00ff00ff00ff;
    x = ((x >> 16) + x) & 0x0000ffff0000ffff;
    return (unsigned) ((x + (x >> 32)) & 0xff);
}
unsigned        t1_and_masks(bitboard i)
{
    unsigned        j;
    assert(i);
    i = (i & 0x5555555555555555) + (i >> 1 & 0x5555555555555555);
    i = (i & 0x3333333333333333) + (i >> 2 & 0x3333333333333333);
    i = ((i >> 4) + i) & 0x0f0f0f0f0f0f0f0f;
    j = (unsigned) (i + (i >> 32));
    j += j >> 16;
    return (j + (j >> 8)) & 0xff;
}


/*
 * This function counts the bits in a long long.
 *
 * It removes the lsb and counting the number of times round the loop.
 * The expression (n & -n) yields the lsb of a number,
 * but it only works on 2's compliment machines.
 */
unsigned        t3_rmlsbsub(bitboard n)
{
    unsigned        count;
    assert(n);
    for (count = 0; n; n -= (n & -n))
        count++;
    return count;
}

unsigned        t4_rmlsbmask(bitboard n)
{
    unsigned        count;
    assert(n);
    for (count = 0; n; count++)
        n &= (n - 1);           /* take away lsb */
    return count;
}

/*
 * This function counts the bits in a long long.
 *
 * It works by shifting the number down and testing the bottom bit.
 */
unsigned        t5_testlsb(bitboard n)
{
    unsigned        count;
    assert(n);
    for (count = 0; n; n >>= 1)
        count += (n & 1);

    return count;
}

/*
 * This function counts the bits in a long long.
 *
 * It works by shifting the number left and testing the top bit.
 * On many machines shift is expensive, so it uses a cheap addition instead.
 */
unsigned        t6_testmsb(bitboard n)
{
    unsigned        count;
    assert(n);
    for (count = 0; n; n += n)
        if (n & ~(~(bitboard) 0 >> 1))
            count++;
    return count;
}

unsigned        t7_testsignandshift(bitboard n)
{
    unsigned        count;
    assert(n);
    for (count = 0; n; n <<= 1)
        if ((long long) n < 0)
            count++;
    return (count);
}

/*
 * This function counts the bits in a long long.
 *
 * It works by masking each bit.
 * This is the second most intuitively obvious method,
 * and is independent of the number of bits in the long long.
 */
unsigned        t8_testeachbit(bitboard n)
{
    unsigned        count;
    bitboard        mask;
    assert(n);
    count = 0;
    for (mask = 1; mask; mask += mask)
        if (n & mask)
            count++;
    return count;
}

/*
 * This function counts the bits in a long long.
 *
 * It works by masking each bit.
 * This is the most intuitively obvious method,
 * but how do you a priori know how many bits in the long long?
 * (except for ''sizeof(long long) * CHAR_BITS'' expression)
 */
unsigned        t9_testeachbit1shl(bitboard n)
{
    unsigned        count;
    unsigned        bit;
    assert(n);
    count = 0;
    for (bit = 0; bit < 64; ++bit)
        if (n & ((bitboard) 1 << bit))
            count++;
    return count;
}


static const char bits_in_byte[256] =
{
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};

unsigned        tA_tableshift(bitboard n)
{
    assert(n);
    return (bits_in_byte[n & 0xff] + bits_in_byte[(n >> 8) & 0xff] +
            bits_in_byte[(n >> 16) & 0xff] + bits_in_byte[(n >> 24) & 0xff] +
            bits_in_byte[(n >> 32) & 0xff] + bits_in_byte[(n >> 40) & 0xff] +
            bits_in_byte[(n >> 48) & 0xff] + bits_in_byte[n >> 56]
        );
}

unsigned        tB_tableuchar(bitboard n)
{
    unsigned char  *p = (unsigned char *) &n;
    assert(n);
    return (bits_in_byte[p[0]] + bits_in_byte[p[1]] + bits_in_byte[p[2]] + bits_in_byte[p[3]] +
            bits_in_byte[p[4]] + bits_in_byte[p[5]] + bits_in_byte[p[6]] + bits_in_byte[p[7]]
        );
}

unsigned        tC_tableshiftcast(bitboard n)
{
    assert(n);
    return bits_in_byte[(unsigned char) n] +
        bits_in_byte[(unsigned char) (n >> 8)] +
        bits_in_byte[(unsigned char) (n >> 16)] +
        bits_in_byte[(unsigned char) (n >> 24)] +
        bits_in_byte[(unsigned char) (n >> 32)] +
        bits_in_byte[(unsigned char) (n >> 40)] +
        bits_in_byte[(unsigned char) (n >> 48)] +
        bits_in_byte[(unsigned char) (n >> 56)];
}

unsigned        tD_itableshift(bitboard n)
{
    assert(n);
    return (bits_in_byte[n & 0xff] + bits_in_byte[(n >> 8) & 0xff] +
            bits_in_byte[(n >> 16) & 0xff] + bits_in_byte[(n >> 24) & 0xff] +
            bits_in_byte[(n >> 32) & 0xff] + bits_in_byte[(n >> 40) & 0xff] +
            bits_in_byte[(n >> 48) & 0xff] + bits_in_byte[(n >> 56)]
        );
}

unsigned        tE_itableuchar(bitboard n)
{
    unsigned char  *p = (unsigned char *) &n;
    assert(n);
    return (bits_in_byte[p[0]] + bits_in_byte[p[1]] + bits_in_byte[p[2]] + bits_in_byte[p[3]] +
            bits_in_byte[p[4]] + bits_in_byte[p[5]] + bits_in_byte[p[6]] + bits_in_byte[p[7]]
        );
}

unsigned        tF_itableshiftcast(bitboard n)
{
    assert(n);
    return bits_in_byte[(unsigned char) n] +
        bits_in_byte[(unsigned char) (n >> 8)] +
        bits_in_byte[(unsigned char) (n >> 16)] +
        bits_in_byte[(unsigned char) (n >> 24)] +
        bits_in_byte[(unsigned char) (n >> 32)] +
        bits_in_byte[(unsigned char) (n >> 40)] +
        bits_in_byte[(unsigned char) (n >> 48)] +
        bits_in_byte[(unsigned char) (n >> 56)];
}

unsigned        tG_parallel(bitboard n)
{
    assert(n);
    n = ((n) & (0xffffffffffffffff / ((1 << ((1 << (0)))) + 1))) + (((n) >> ((1 << (0)))) & (0xffffffffffffffff / ((1 << ((1 << (0)))) + 1)));
    n = ((n) & (0xffffffffffffffff / ((1 << ((1 << (1)))) + 1))) + (((n) >> ((1 << (1)))) & (0xffffffffffffffff / ((1 << ((1 << (1)))) + 1)));
    n = ((n) & (0xffffffffffffffff / ((1 << ((1 << (2)))) + 1))) + (((n) >> ((1 << (2)))) & (0xffffffffffffffff / ((1 << ((1 << (2)))) + 1)));
    n = ((n) & (0xffffffffffffffff / ((1 << ((1 << (3)))) + 1))) + (((n) >> ((1 << (3)))) & (0xffffffffffffffff / ((1 << ((1 << (3)))) + 1)));
    n = ((n) & (0xffffffffffffffff / ((1 << ((1 << (4)))) + 1))) + (((n) >> ((1 << (4)))) & (0xffffffffffffffff / ((1 << ((1 << (4)))) + 1)));
    n = ((n) & (0xffffffffffffffff / ((1 << ((1 << (5)))) + 1))) + (((n) >> ((1 << (5)))) & (0xffffffffffffffff / ((1 << ((1 << (5)))) + 1)));
    return (unsigned) n;
}

unsigned        tH_hamming(bitboard w)
{
    bitboard        res;
    assert(w);
    res = (w & 0x5555555555555555) + ((w >> 1) & 0x5555555555555555);
    res = (res & 0x3333333333333333) + ((res >> 2) & 0x3333333333333333);
    res = (res & 0x0F0F0F0F0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F0F0F0F0F);
    res = (res & 0x00FF00FF00FF00FF) + ((res >> 8) & 0x00FF00FF00FF00FF);
    res = (res & 0x0000FFFF0000FFFF) + ((res >> 16) & 0x0000FFFF0000FFFF);
    return (unsigned) ((res & 0x00000000FFFFFFFF) + ((res >> 32) & 0x00000000FFFFFFFF));
}
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int             main(void)
{
    unsigned int    ut0_and_masks = 0;
    unsigned int    ut1_and_masks = 0;
    unsigned int    ut3_rmlsbsub = 0;
    unsigned int    ut4_rmlsbmask = 0;
    unsigned int    ut5_testlsb = 0;
    unsigned int    ut6_testmsb = 0;
    unsigned int    ut7_testsignandshift = 0;
    unsigned int    ut8_testeachbit = 0;
    unsigned int    ut9_testeachbit1shl = 0;
    unsigned int    utA_tableshift = 0;
    unsigned int    utB_tableuchar = 0;
    unsigned int    utC_tableshiftcast = 0;
    unsigned int    utD_itableshift = 0;
    unsigned int    utE_itableuchar = 0;
    unsigned int    utF_itableshiftcast = 0;
    unsigned int    utG_parallel = 0;
    unsigned int    utH_hamming = 0;
    bitboard        b;
    int             i;
    for (i = 0; i < 10000000; i++) {
#ifdef DENSE
        b = rand();
        b <<= 32;
        b |= rand();
#else

        b = (1 << (rand() % 64));
        b <<= 32;
        b |= (1 << (rand() % 64));
#endif
        ut0_and_masks = t0_and_masks(b);
        ut1_and_masks = t1_and_masks(b);
        ut3_rmlsbsub = t3_rmlsbsub(b);
        ut4_rmlsbmask = t4_rmlsbmask(b);
        ut5_testlsb = t5_testlsb(b);
        ut6_testmsb = t6_testmsb(b);
        ut7_testsignandshift = t7_testsignandshift(b);
        ut8_testeachbit = t8_testeachbit(b);
        ut9_testeachbit1shl = t9_testeachbit1shl(b);
        utA_tableshift = tA_tableshift(b);
        utB_tableuchar = tB_tableuchar(b);
        utC_tableshiftcast = tC_tableshiftcast(b);
        utD_itableshift = tD_itableshift(b);
        utE_itableuchar = tE_itableuchar(b);
        utF_itableshiftcast = tF_itableshiftcast(b);
        utG_parallel = tG_parallel(b);
        utH_hamming = tH_hamming(b);
        if ((ut0_and_masks != ut1_and_masks) ||
            (ut0_and_masks != ut3_rmlsbsub) ||
            (ut0_and_masks != ut4_rmlsbmask) ||
            (ut0_and_masks != ut5_testlsb) ||
            (ut0_and_masks != ut6_testmsb) ||
            (ut0_and_masks != ut7_testsignandshift) ||
            (ut0_and_masks != ut9_testeachbit1shl) ||
            (ut0_and_masks != utA_tableshift) ||
            (ut0_and_masks != utB_tableuchar) ||
            (ut0_and_masks != utC_tableshiftcast) ||
            (ut0_and_masks != utD_itableshift) ||
            (ut0_and_masks != utE_itableuchar) ||
            (ut0_and_masks != utF_itableshiftcast) ||
            (ut0_and_masks != utG_parallel) ||
            (ut0_and_masks != utH_hamming)
            ) {
            printf("0: %u 1: %u\n", ut0_and_masks, ut1_and_masks);
            printf("0: %u 3: %u\n", ut0_and_masks, ut3_rmlsbsub);
            printf("0: %u 4: %u\n", ut0_and_masks, ut4_rmlsbmask);
            printf("0: %u 5: %u\n", ut0_and_masks, ut5_testlsb);
            printf("0: %u 6: %u\n", ut0_and_masks, ut6_testmsb);
            printf("0: %u 7: %u\n", ut0_and_masks, ut7_testsignandshift);
            printf("0: %u 9: %u\n", ut0_and_masks, ut9_testeachbit1shl);
            printf("0: %u A: %u\n", ut0_and_masks, utA_tableshift);
            printf("0: %u B: %u\n", ut0_and_masks, utB_tableuchar);
            printf("0: %u C: %u\n", ut0_and_masks, utC_tableshiftcast);
            printf("0: %u D: %u\n", ut0_and_masks, utD_itableshift);
            printf("0: %u E: %u\n", ut0_and_masks, utE_itableuchar);
            printf("0: %u F: %u\n", ut0_and_masks, utF_itableshiftcast);
            printf("0: %u G: %u\n", ut0_and_masks, utG_parallel);
            printf("0: %u H: %u\n", ut0_and_masks, utH_hamming);
        }
        b = 0xffffffffffffffff;
        ut0_and_masks = t0_and_masks(b);
        ut1_and_masks = t1_and_masks(b);
        ut3_rmlsbsub = t3_rmlsbsub(b);
        ut4_rmlsbmask = t4_rmlsbmask(b);
        ut5_testlsb = t5_testlsb(b);
        ut6_testmsb = t6_testmsb(b);
        ut7_testsignandshift = t7_testsignandshift(b);
        ut8_testeachbit = t8_testeachbit(b);
        ut9_testeachbit1shl = t9_testeachbit1shl(b);
        utA_tableshift = tA_tableshift(b);
        utB_tableuchar = tB_tableuchar(b);
        utC_tableshiftcast = tC_tableshiftcast(b);
        utD_itableshift = tD_itableshift(b);
        utE_itableuchar = tE_itableuchar(b);
        utF_itableshiftcast = tF_itableshiftcast(b);
        utG_parallel = tG_parallel(b);
        utH_hamming = tH_hamming(b);
        if ((ut0_and_masks != ut1_and_masks) ||
            (ut0_and_masks != ut3_rmlsbsub) ||
            (ut0_and_masks != ut4_rmlsbmask) ||
            (ut0_and_masks != ut5_testlsb) ||
            (ut0_and_masks != ut6_testmsb) ||
            (ut0_and_masks != ut7_testsignandshift) ||
            (ut0_and_masks != ut9_testeachbit1shl) ||
            (ut0_and_masks != utA_tableshift) ||
            (ut0_and_masks != utB_tableuchar) ||
            (ut0_and_masks != utC_tableshiftcast) ||
            (ut0_and_masks != utD_itableshift) ||
            (ut0_and_masks != utE_itableuchar) ||
            (ut0_and_masks != utF_itableshiftcast) ||
            (ut0_and_masks != utG_parallel) ||
            (ut0_and_masks != utH_hamming)
            ) {
            printf("0: %u 1: %u\n", ut0_and_masks, ut1_and_masks);
            printf("0: %u 3: %u\n", ut0_and_masks, ut3_rmlsbsub);
            printf("0: %u 4: %u\n", ut0_and_masks, ut4_rmlsbmask);
            printf("0: %u 5: %u\n", ut0_and_masks, ut5_testlsb);
            printf("0: %u 6: %u\n", ut0_and_masks, ut6_testmsb);
            printf("0: %u 7: %u\n", ut0_and_masks, ut7_testsignandshift);
            printf("0: %u 9: %u\n", ut0_and_masks, ut9_testeachbit1shl);
            printf("0: %u A: %u\n", ut0_and_masks, utA_tableshift);
            printf("0: %u B: %u\n", ut0_and_masks, utB_tableuchar);
            printf("0: %u C: %u\n", ut0_and_masks, utC_tableshiftcast);
            printf("0: %u D: %u\n", ut0_and_masks, utD_itableshift);
            printf("0: %u E: %u\n", ut0_and_masks, utE_itableuchar);
            printf("0: %u F: %u\n", ut0_and_masks, utF_itableshiftcast);
            printf("0: %u G: %u\n", ut0_and_masks, utG_parallel);
            printf("0: %u H: %u\n", ut0_and_masks, utH_hamming);
        }
        b = 1;
        ut0_and_masks = t0_and_masks(b);
        ut1_and_masks = t1_and_masks(b);
        ut3_rmlsbsub = t3_rmlsbsub(b);
        ut4_rmlsbmask = t4_rmlsbmask(b);
        ut5_testlsb = t5_testlsb(b);
        ut6_testmsb = t6_testmsb(b);
        ut7_testsignandshift = t7_testsignandshift(b);
        ut8_testeachbit = t8_testeachbit(b);
        ut9_testeachbit1shl = t9_testeachbit1shl(b);
        utA_tableshift = tA_tableshift(b);
        utB_tableuchar = tB_tableuchar(b);
        utC_tableshiftcast = tC_tableshiftcast(b);
        utD_itableshift = tD_itableshift(b);
        utE_itableuchar = tE_itableuchar(b);
        utF_itableshiftcast = tF_itableshiftcast(b);
        utG_parallel = tG_parallel(b);
        utH_hamming = tH_hamming(b);
        if ((ut0_and_masks != ut1_and_masks) ||
            (ut0_and_masks != ut3_rmlsbsub) ||
            (ut0_and_masks != ut4_rmlsbmask) ||
            (ut0_and_masks != ut5_testlsb) ||
            (ut0_and_masks != ut6_testmsb) ||
            (ut0_and_masks != ut7_testsignandshift) ||
            (ut0_and_masks != ut9_testeachbit1shl) ||
            (ut0_and_masks != utA_tableshift) ||
            (ut0_and_masks != utB_tableuchar) ||
            (ut0_and_masks != utC_tableshiftcast) ||
            (ut0_and_masks != utD_itableshift) ||
            (ut0_and_masks != utE_itableuchar) ||
            (ut0_and_masks != utF_itableshiftcast) ||
            (ut0_and_masks != utG_parallel) ||
            (ut0_and_masks != utH_hamming)
            ) {
            printf("0: %u 1: %u\n", ut0_and_masks, ut1_and_masks);
            printf("0: %u 3: %u\n", ut0_and_masks, ut3_rmlsbsub);
            printf("0: %u 4: %u\n", ut0_and_masks, ut4_rmlsbmask);
            printf("0: %u 5: %u\n", ut0_and_masks, ut5_testlsb);
            printf("0: %u 6: %u\n", ut0_and_masks, ut6_testmsb);
            printf("0: %u 7: %u\n", ut0_and_masks, ut7_testsignandshift);
            printf("0: %u 9: %u\n", ut0_and_masks, ut9_testeachbit1shl);
            printf("0: %u A: %u\n", ut0_and_masks, utA_tableshift);
            printf("0: %u B: %u\n", ut0_and_masks, utB_tableuchar);
            printf("0: %u C: %u\n", ut0_and_masks, utC_tableshiftcast);
            printf("0: %u D: %u\n", ut0_and_masks, utD_itableshift);
            printf("0: %u E: %u\n", ut0_and_masks, utE_itableuchar);
            printf("0: %u F: %u\n", ut0_and_masks, utF_itableshiftcast);
            printf("0: %u G: %u\n", ut0_and_masks, utG_parallel);
            printf("0: %u H: %u\n", ut0_and_masks, utH_hamming);
        }
    }

    return 0;
}
#endif

Dann Corbit
 

Re: GCC Inline asm problem

Postby Pradu » 19 Jul 2005, 22:39

Reinhard Scharnagl wrote:Hi Pradu,

the popCount() function is suboptimal - am I rigth ?

Reinhard.


I'm not sure Reinhard, but I do know it was good enough to be used in DarkThought.
Quoting Ernst A. Heinz:
"It performs better than intuitive methods with lookup tables because the tables get either too large or need too many lookups."

I can't find any explaination for the non-iterative popCount() function anywhere and am just going by Heinz's word right now, unless someone or some paper I come across can prove to me otherwise.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Reinhard Scharnagl » 19 Jul 2005, 23:13

Hi Pradu,

How about:
Code: Select all
/* Trial (R. Scharnagl, first idea, not optimized) */
/* (P.S.: now unsigned used at several places)     */

#define msk1 0xEEEEEEEE
#define msk2 0xCCCCCCCC
#define msk3 0x88888888
#define msk4 0x0F0F0F0F

int popCount(const unsigned long long b)
{
  unsigned buf;
  int      acc;

  buf  = (unsigned)b;
  acc  = buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  buf  = ((unsigned *)&b)[1];  /* Intel format */
  acc += buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  acc = (acc & msk4) + ((acc >> 4) & msk4);
  acc = (acc & 0xFFFF) + (acc >> 16);
  acc = (acc & 0xFF) + (acc >> 8);
  return acc;
}

Reinhard.
P.S.:
Code: Select all
/* Trial (R. Scharnagl, first idea, not optimized) */
/*       (might be now an endian independent form) */

#define msk1 0xEEEEEEEE
#define msk2 0xCCCCCCCC
#define msk3 0x88888888
#define msk4 0x0F0F0F0F

int popCount(const unsigned long long b)
{
  unsigned buf;
  int      acc;

  buf  = ((unsigned *)&b)[0];
  acc  = buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  buf  = ((unsigned *)&b)[1];
  acc += buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  acc = (acc & msk4)   + ((acc >> 4) & msk4);
  acc = (acc & 0xFFFF) + (acc >> 16);
  acc = (acc & 0xFF)   + (acc >> 8);
  return acc;
}
Last edited by Reinhard Scharnagl on 19 Jul 2005, 23:43, edited 3 times in total.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: GCC Inline asm problem

Postby Pradu » 19 Jul 2005, 23:20

All right let me test it and post some results in a little while.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Pradu » 19 Jul 2005, 23:57

It isn't as good.

I hope my testing procedure is good enough

Test Program:
Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/timeb.h>

typedef unsigned long long U64;
#define rand64() (((U64)rand())^(((U64)rand())<<15)^(((U64)rand())<<30)^(((U64)rand())<<45)^(((U64)rand())<<60))
#define TABLE_SIZE 10000000
U64 randtable[TABLE_SIZE];

int getms()
{
   struct timeb timebuffer;
   ftime(&timebuffer);
   return (timebuffer.time * 1000) + timebuffer.millitm;
}

#define m1 0x5555555555555555ULL
#define m2 0x3333333333333333ULL
int popCount(const U64 b)
{
   int n;
   const U64 a = b - ((b >> 1) & m1);
   const U64 c = (a & m2) + ((a >> 2) & m2);
   n = ((int) c) + ((int) (c >> 32));
   n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
   n = (n & 0xFFFF) + (n >> 16);
   n = (n & 0xFF) + (n >> 8);
   return n;
}


#define msk1 0xEEEEEEEE
#define msk2 0xCCCCCCCC
#define msk3 0x88888888
#define msk4 0x0F0F0F0F

int popCount2(const U64 b) /* P.S.: might be unsigned */
{
   int buf;
   int acc;

   buf  = (int)b;
   acc  = buf - ((buf & msk1)>>1)
           - ((buf & msk2)>>2)
           - ((buf & msk3)>>3);
   buf  = ((int *)&b)[1];  /* Intel format */
   acc += buf - ((buf & msk1)>>1)
           - ((buf & msk2)>>2)
           - ((buf & msk3)>>3);
   acc = (acc & msk4) + ((acc >> 4) & msk4);
   acc = (acc & 0xFFFF) + (acc >> 16);
   acc = (acc & 0xFF) + (acc >> 8);
   return acc;
}

int main(int argc, char *argv[])
{
   int i, time;
   time=getms();
   for(i=0;i<TABLE_SIZE;i++)
      randtable[i]=rand64();
   printf("Initialized rand Table: %d ms\n",getms()-time);
   time=getms();
   for(i=0;i<TABLE_SIZE;i++)
      popCount2(randtable[i]);
     printf("popCount2: %d ms\n",getms()-time);
     time=getms();
   for(i=0;i<TABLE_SIZE;i++)
      popCount(randtable[i]);
     printf("popCount: %d ms\n",getms()-time);
   system("PAUSE");
   return 0;
}


Results on Pentium II 448 Mhz:

Code: Select all
Initialized rand Table: 7150 ms
popCount2: 370 ms
popCount: 51 ms


with popCount2 as:
Code: Select all
int popCount2(const U64 b) /* P.S.: might be unsigned */
{
   unsigned buf;
   int acc;

   buf  = (int)b;
   acc  = buf - ((buf & msk1)>>1)
           - ((buf & msk2)>>2)
           - ((buf & msk3)>>3);
   buf  = ((int *)&b)[1];  /* Intel format */
   acc += buf - ((buf & msk1)>>1)
           - ((buf & msk2)>>2)
           - ((buf & msk3)>>3);
   acc = (acc & msk4) + ((acc >> 4) & msk4);
   acc = (acc & 0xFFFF) + (acc >> 16);
   acc = (acc & 0xFF) + (acc >> 8);
   return acc;
}


Results are:
Code: Select all
Initialized rand Table: 6709 ms
popCount2: 321 ms
popCount: 60 ms
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Reinhard Scharnagl » 20 Jul 2005, 00:02

Pradu,

thank you for testing,
could be wrong defined constants, try:

Code: Select all
#define msk1 0xEEEEEEEEUL
#define msk2 0xCCCCCCCCUL
#define msk3 0x88888888UL
#define msk4 0x0F0F0F0FUL


Reinhard.

P.S. in debug mode I have got

Initialized rand Table: 2594 ms
popCount2: 1922 ms
popCount: 2609 ms
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: GCC Inline asm problem

Postby Pradu » 20 Jul 2005, 00:16

With the unsigned constants the results havnen't changed much:

Code: Select all
Initialized rand Table: 6660 ms
popCount2: 330 ms
popCount: 50 ms

Initialized rand Table: 6600 ms
popCount2: 320 ms
popCount: 60 ms

Initialized rand Table: 7230 ms
popCount2: 321 ms
popCount: 50 ms


P.S. in debug mode I have got

Initialized rand Table: 2594 ms
popCount2: 1922 ms
popCount: 2609 ms


P.S. in debug mode I have got

Initialized rand Table: 2594 ms
popCount2: 1922 ms
popCount: 2609 ms

Quite odd -- I'm using GCC compiler with optimizations. I wonder why the popCount takes much longer in your compile. Could you post your source and exe on a server or email it to me.
Last edited by Pradu on 20 Jul 2005, 00:22, edited 1 time in total.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Reinhard Scharnagl » 20 Jul 2005, 00:19

Pradu,

in release mode I get only zero times ... ?

Reinhard.
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: GCC Inline asm problem

Postby Pradu » 20 Jul 2005, 00:27

Reinhard Scharnagl wrote:Pradu,

in release mode I get only zero times ... ?

Reinhard.


I'm not sure how that happened. What compiler do you use and what processor?

This is what I used to compile:
Code: Select all
gcc.exe main.o  -o "PopCountTest.exe" -L"F:/Dev-Cpp/lib"  -lobjc
Last edited by Pradu on 20 Jul 2005, 01:03, edited 1 time in total.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: GCC Inline asm problem

Postby Dann Corbit » 20 Jul 2005, 00:29

An important thing to think about is that almost all the time your bitboards will be very sparse. Even a full board with no chessmen removed is only half full of bits.

Your random function creates dense bitboards.

It would be good to experiment with both kinds.
Dann Corbit
 

Re: GCC Inline asm problem

Postby Reinhard Scharnagl » 20 Jul 2005, 00:33

Pradu,

it has been an effect of the optimizer in VStudio, thus I changed the test somehow:
Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/timeb.h>

typedef unsigned long long U64;
#define rand64() (((U64)rand())^(((U64)rand())<<15)^(((U64)rand())<<30)^(((U64)rand())<<45)^(((U64)rand())<<60))
#define TABLE_SIZE 10000000
U64 randtable[TABLE_SIZE];

int getms()
{
   struct timeb timebuffer;
   ftime(&timebuffer);
   return (timebuffer.time * 1000) + timebuffer.millitm;
}

#define m1 0x5555555555555555ULL
#define m2 0x3333333333333333ULL
int popCount(const U64 b)
{
   int n;
   const U64 a = b - ((b >> 1) & m1);
   const U64 c = (a & m2) + ((a >> 2) & m2);
   n = ((int) c) + ((int) (c >> 32));
   n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
   n = (n & 0xFFFF) + (n >> 16);
   n = (n & 0xFF) + (n >> 8);
   return n;
}

/* Trial (R. Scharnagl, first idea, not optimized) */
/*       (might be now an endian independent form) */

#define msk1 0xEEEEEEEEUL
#define msk2 0xCCCCCCCCUL
#define msk3 0x88888888UL
#define msk4 0x0F0F0F0FUL

int popCount2(const U64 b)
{
  unsigned buf;
  int      acc;

  buf  = (unsigned)b;
  acc  = buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  buf  = (unsigned)(b>>32);
  acc += buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  acc = (acc & msk4)   + ((acc >> 4) & msk4);
  acc = (acc & 0xFFFF) + (acc >> 16);
  acc = (acc & 0xFF)   + (acc >> 8);
  return acc;
}

int main(int argc, char *argv[])
{
   int i, loop, sum;
   volatile int time=getms();
   for(i=0;i<TABLE_SIZE;i++)
      randtable[i]=rand64();
   printf("Initialized rand Table: %d ms\n",getms()-time);
   time=getms();
   sum= 0;
   for(loop = 25; --loop >=0; )
    for(i=0;i<TABLE_SIZE;i++)
      sum += popCount2(randtable[i]);
     printf("popCount2: %d ms (%d)\n",getms()-time, sum);
     time=getms();
   sum= 0;
   for(loop = 25; --loop >=0; )
    for(i=0;i<TABLE_SIZE;i++)
      sum += popCount(randtable[i]);
     printf("popCount: %d ms (%d)\n",getms()-time, sum);
   system("PAUSE");
   return 0;
}

Code: Select all
Initialized rand Table: 1156 ms
popCount2: 2797 ms (-589497692)
popCount: 4312 ms (-589497692)

Reinhard
Reinhard Scharnagl
 
Posts: 608
Joined: 01 Oct 2004, 08:36
Location: Klein-Gerau, Germany

Re: GCC Inline asm problem

Postby Pradu » 20 Jul 2005, 00:44

popCount2 seems to be faster now:
Code: Select all
Initialized rand Table: 6539 ms
popCount2: 25967 ms (-589506992)
popCount: 26619 ms (-589506992)


The addition operation seems to take up a lot of processor time:

Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/timeb.h>

typedef unsigned long long U64;
#define rand64() (((U64)rand())^(((U64)rand())<<15)^(((U64)rand())<<30)^(((U64)rand())<<45)^(((U64)rand())<<60))
#define TABLE_SIZE 10000000
U64 randtable[TABLE_SIZE];

int getms()
{
   struct timeb timebuffer;
   ftime(&timebuffer);
   return (timebuffer.time * 1000) + timebuffer.millitm;
}

#define m1 0x5555555555555555ULL
#define m2 0x3333333333333333ULL
int popCount(const U64 b)
{
   int n;
   const U64 a = b - ((b >> 1) & m1);
   const U64 c = (a & m2) + ((a >> 2) & m2);
   n = ((int) c) + ((int) (c >> 32));
   n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
   n = (n & 0xFFFF) + (n >> 16);
   n = (n & 0xFF) + (n >> 8);
   return n;
}

/* Trial (R. Scharnagl, first idea, not optimized) */
/*       (might be now an endian independent form) */

#define msk1 0xEEEEEEEEUL
#define msk2 0xCCCCCCCCUL
#define msk3 0x88888888UL
#define msk4 0x0F0F0F0FUL

int popCount2(const U64 b)
{
  unsigned buf;
  int      acc;

  buf  = (unsigned)b;
  acc  = buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  buf  = (unsigned)(b>>32);
  acc += buf - ((buf & msk1)>>1)
             - ((buf & msk2)>>2)
             - ((buf & msk3)>>3);
  acc = (acc & msk4)   + ((acc >> 4) & msk4);
  acc = (acc & 0xFFFF) + (acc >> 16);
  acc = (acc & 0xFF)   + (acc >> 8);
  return acc;
}

int main(int argc, char *argv[])
{
   int i, loop, sum;
   volatile int time=getms();
   for(i=0;i<TABLE_SIZE;i++)
      randtable[i]=rand64();
   printf("Initialized rand Table: %d ms\n",getms()-time);
   time=getms();
   sum= 0;
   for(loop = 25; --loop >=0; )
    for(i=0;i<TABLE_SIZE;i++)
      /*sum += */popCount2(randtable[i]);
     printf("popCount2: %d ms\n",getms()-time/*, sum*/);
     time=getms();
   sum= 0;
   for(loop = 25; --loop >=0; )
    for(i=0;i<TABLE_SIZE;i++)
      /*sum += */popCount(randtable[i]);
     printf("popCount: %d ms\n",getms()-time/*, sum*/);
   system("PAUSE");
   return 0;
}


Code: Select all
Initialized rand Table: 6690 ms
popCount2: 1322 ms
popCount: 1302 ms


The popCount2 seems to be as good as popCount with GCC, I'll have to try to compile it with Visual Studio 2003 C compiler when I get back to my dorm room with my much faster P4 2.8 Ghz computer :-).

Thanks Reinhard.
Last edited by Pradu on 20 Jul 2005, 00:56, edited 3 times in total.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Next

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 30 guests