TryLock for GCC

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

TryLock for GCC

Postby Pradu » 20 Jul 2007, 21:51

I've been able to implement a TryLock using adapted code from Crafty's "lock.h" for Microsoft compatible compiler's:

Code: Select all
//Adapted from Crafty's "lock.h"

typedef volatile unsigned long SpinLock[1];
typedef volatile unsigned long* SpinLock_P;

#define InitSpinLock(s) ((s)[0] = 0)
#define ReleaseSL(s)    ((s)[0] = 0)

#if defined(_MSC_VER)
   #pragma intrinsic (_InterlockedExchange)
   static __forceinline void LockSL(SpinLock_P s)
   {
     long v;
     for(;;)
     {
      v = _InterlockedExchange(s, 1);
      if(!v) return;
      while(*s);
     }
   }
   #define TryLockSL(s) (!_InterlockedExchange(s, 1))
#endif


How would I modify the GCC Assembly code to do a TryLock?

Code: Select all
static void __inline__ LockX86(volatile int *lock)
{
  int dummy;
  asm __volatile__("1:          movl    $1, %0" "\n\t"
      "            xchgl   (%1), %0" "\n\t" "            testl   %0, %0" "\n\t"
      "            jz      3f" "\n\t" "2:          pause" "\n\t"
      "            movl    (%1), %0" "\n\t" "            testl   %0, %0" "\n\t"
      "            jnz     2b" "\n\t" "            jmp     1b" "\n\t" "3:"
      "\n\t":"=&q"(dummy)
      :"q"(lock)
      :"cc");
}


Also a quick question, the releasing code can be written as *lock=0 right?
Code: Select all
static void __inline__ UnlockX86(volatile int *lock)
{
  int dummy;
  asm __volatile__("movl    $0, (%1)":"=&q"(dummy)
      :"q"(lock));
}


Also GCC assembly routines for _InterlockedIncrement and _InterlockedDecrement operating on 32-bit signed integers would be appreciated.

GCC doesn't seem to have intrinsics as far as google searching goes, is this correct?
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: TryLock for GCC

Postby bob » 21 Jul 2007, 02:24

Pradu wrote:I've been able to implement a TryLock using adapted code from Crafty's "lock.h" for Microsoft compatible compiler's:

Code: Select all
//Adapted from Crafty's "lock.h"

typedef volatile unsigned long SpinLock[1];
typedef volatile unsigned long* SpinLock_P;

#define InitSpinLock(s) ((s)[0] = 0)
#define ReleaseSL(s)    ((s)[0] = 0)

#if defined(_MSC_VER)
   #pragma intrinsic (_InterlockedExchange)
   static __forceinline void LockSL(SpinLock_P s)
   {
     long v;
     for(;;)
     {
      v = _InterlockedExchange(s, 1);
      if(!v) return;
      while(*s);
     }
   }
   #define TryLockSL(s) (!_InterlockedExchange(s, 1))
#endif


How would I modify the GCC Assembly code to do a TryLock?

Code: Select all
static void __inline__ LockX86(volatile int *lock)
{
  int dummy;
  asm __volatile__("1:          movl    $1, %0" "\n\t"
      "            xchgl   (%1), %0" "\n\t" "            testl   %0, %0" "\n\t"
      "            jz      3f" "\n\t" "2:          pause" "\n\t"
      "            movl    (%1), %0" "\n\t" "            testl   %0, %0" "\n\t"
      "            jnz     2b" "\n\t" "            jmp     1b" "\n\t" "3:"
      "\n\t":"=&q"(dummy)
      :"q"(lock)
      :"cc");
}


Also a quick question, the releasing code can be written as *lock=0 right?
Code: Select all
static void __inline__ UnlockX86(volatile int *lock)
{
  int dummy;
  asm __volatile__("movl    $0, (%1)":"=&q"(dummy)
      :"q"(lock));
}


Also GCC assembly routines for _InterlockedIncrement and _InterlockedDecrement operating on 32-bit signed integers would be appreciated.

GCC doesn't seem to have intrinsics as far as google searching goes, is this correct?


one worry. The compiler can move code around so you must force it to do the lock/unlock where you intend. The "volatile" keyword on a function does that IIRC. Inline asm or not really doesn't matter so long as the shadow-lock approach is used to avoid the locked exchange being repeated so much it saturates the bus and kills performance.

But avoid giving the compiler the freedom to move the lock/unlock stuff to wherever it seems to fit best for speed, as that can wreck you before you get started and it is _hard_ to find. From experience... :)
User avatar
bob
 
Posts: 156
Joined: 10 May 2006, 17:59

Re: TryLock for GCC

Postby Pradu » 22 Jul 2007, 12:31

bob wrote:one worry. The compiler can move code around so you must force it to do the lock/unlock where you intend. The "volatile" keyword on a function does that IIRC. Inline asm or not really doesn't matter so long as the shadow-lock approach is used to avoid the locked exchange being repeated so much it saturates the bus and kills performance.

But avoid giving the compiler the freedom to move the lock/unlock stuff to wherever it seems to fit best for speed, as that can wreck you before you get started and it is _hard_ to find. From experience... :)
Thanks I hope this is correct:

Code: Select all
/************************ INTERLOCKED OPERATIONS ************************/

#if defined(_MSC_VER)
   #pragma intrinsic (_InterlockedIncrement)
   #define LockedIncrement(ptr_int32) _InterlockedIncrement(ptr_int32)
   #pragma intrinsic (_InterlockedDecrement)
   #define LockedDecrement(ptr_int32) _InterlockedDecrement(ptr_int32)
   #pragma intrinsic (_InterlockedExchange)
   #define LockedExchange(ptr_int32,value_int32) _InterlockedExchange(ptr_int32,value_int32)
#endif

/************************ SPIN LOCKS ************************/

typedef volatile long SpinLock[1];
typedef volatile long* SpinLock_P;

#define ResetSpinLock(s) ReleaseSL(s)
static INLINE void volatile ReleaseSL(SpinLock_P s) {s[0]=0;}
static INLINE long volatile TryLockSL(SpinLock_P s) {return !LockedExchange(s,1);}
static INLINE void volatile LockSL(SpinLock_P s) {while(s[0]){if(TryLockSL(s))return;}}


Update:
I found GCC inline implementations of these in the ReactOS source code:

Code: Select all
static __inline__ __attribute__((always_inline)) long _InterlockedExchange(volatile long * const Target, const long Value)
{
   long retval = Value;
   __asm__("xchgl %[retval], %[Target]" : [retval] "+r" (retval) : [Target] "m" (*Target) : "memory");
   return retval;
}


static __inline__ __attribute__((always_inline)) long _InterlockedExchangeAdd(volatile long * const Addend, const long Value)
{
   long retval = Value;
   __asm__("lock; xaddl %[retval], %[Addend]" : [retval] "+r" (retval) : [Addend] "m" (*Addend) : "memory");
   return retval;
}

static  __inline__ __attribute__((always_inline)) unsigned char _BitScanForward(unsigned long * const Index, const unsigned long Mask)
{
   __asm__("bsfl %[Mask], %[Index]" : [Index] "=r" (*Index) : [Mask] "mr" (Mask));
   return Mask ? 1 : 0;
}

static  __inline__ __attribute__((always_inline)) unsigned char _BitScanReverse(unsigned long * const Index, const unsigned long Mask)
{
   __asm__("bsrl %[Mask], %[Index]" : [Index] "=r" (*Index) : [Mask] "mr" (Mask));
   return Mask ? 1 : 0;
}



So I guess for any processor instruction you need for GCC, you can use the above assembly as a template.

EDIT again:
It seems after GCC versions >= 4.1 have compiler intrinsics for atomic memory access so it is possible to do this even without inline assembly! They seem to be even easier to use than MSVC intrinsics because the intrinsic is overloaded for every type --
http://tulip.umcs.maine.edu/~shamis/dow ... /gcc-1.pdf (page 289)

Code: Select all
#ifdef _MSC_VER
   #include <intrin.h>
#endif

#ifndef INLINE
   #ifdef _MSC_VER
      #define INLINE __forceinline
   #elif defined(__GNUC__)
      #define INLINE __inline__ __attribute__((always_inline))
   #else
      #define INLINE inline
   #endif
#endif

/************************ SPIN LOCKS ************************/
typedef volatile int SpinLock[1];
typedef volatile int* SpinLock_P;

#if defined(_MSC_VER)
   #pragma intrinsic (_InterlockedExchange)
   #define LockedExchange(Target,Value) _InterlockedExchange(Target,Value)
#elif defined(__GNUC__)
   #if(100*__GNUC__+__GNUC_MINOR__)>=401
      static INLINE int volatile LockedExchange(SpinLock_P const Target, const int Value)
      {__sync_synchronize(); return __sync_lock_test_and_set(Target, Value);}
   #else
      static INLINE int volatile LockedExchange(SpinLock_P const Target, const int Value)
      {
         int retval = Value;
         __asm__
         (
            "xchgl %[retval], %[Target]"
            : [retval] "+r" (retval)
            : [Target] "m" (*Target)
            : "memory"
         );
         return retval;
      }
   #endif
#else
   #error Unspported Compiler
#endif

#define ResetSpinLock(s) ReleaseSL(s)
static INLINE void volatile ReleaseSL(SpinLock_P s) {s[0]=0;}
static INLINE int volatile TryLockSL(SpinLock_P s) {return !LockedExchange(s,1);}
static INLINE void volatile LockSL(SpinLock_P s) {while(s[0] || LockedExchange(s,1));}


Code: Select all
P42.8 Ghz MSVC2005

#pragma optimize("", off)
void test(int a, int b)
{
   int i,j;
   int ret=0;
   time_ms t= getms();
   for(i=0;i<a;i++)
      for(j=0;j<b;j++)
      {
         ret+=a+b;
      }
   t = getms()-t;
   print("(%u ms)\n",t);
}
#pragma optimize("", on)
test(100000,10000); (2672 ms)


#pragma optimize("", off)
void test(int a, int b)
{
   int i,j;
   SpinLock s;
   time_ms t= getms();
   ResetSpinLock(s);
   for(i=0;i<a;i++)
      for(j=0;j<b;j++)
      {
         Lock(s);
         Release(s);
      }
   t = getms()-t;
   print("(%u ms)\n",t);
}
#pragma optimize("", on)
test(10000,10000); (5313 ms)


So it seems that Locking and Releasing even with SpinLocks is an order 10 of magnitude worse than a few normal operations.

Edit again (found conflict with Microsoft's interpretation of long (32-bits) and GGC's interpretation of long (64-bits))
Code: Select all
#ifdef _MSC_VER
   #include <intrin.h>
#endif

#ifndef INLINE
   #ifdef _MSC_VER
      #define INLINE __forceinline
   #elif defined(__GNUC__)
      #define INLINE __inline__ __attribute__((always_inline))
   #else
      #define INLINE inline
   #endif
#endif

#if defined(__GNUC__)
   typedef volatile int SpinLock[1];
   typedef volatile int* const SpinLock_P;
   static INLINE int volatile LockedExchange(SpinLock_P Target, const int Value)
   {
      int ret = Value;
      __asm__
      (
         "xchgl %[ret], %[Target]"
         : [ret] "+r" (ret)
         : [Target] "m" (*Target)
         : "memory"
      );
      return ret;
   }
#elif defined(_MSC_VER)
   typedef volatile long SpinLock[1];
   typedef volatile long* const SpinLock_P;
   #include <intrin.h>
   #pragma intrinsic (_InterlockedExchange)
   #define LockedExchange(Target,Value) _InterlockedExchange(Target,Value)
#else
   #error Unspported Compiler
#endif

#define IsLocked(s) ((s)[0])
#define SetLocked(s,boolean) ((s)[0]=(boolean))
#define ResetSpinLock(s) SetLocked(s,0)
static INLINE void volatile Release(SpinLock_P s) {SetLocked(s,0);}
static INLINE int volatile TryLock(SpinLock_P s) {return !(IsLocked(s) || LockedExchange(s,1));}
static INLINE void volatile Lock(SpinLock_P s) {while(IsLocked(s) || LockedExchange(s,1));}
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: TryLock for GCC

Postby Scott Gasch » 24 Aug 2007, 06:52

Pradu wrote:I've been able to implement a TryLock using adapted code from Crafty's "lock.h" for Microsoft compatible compiler's:


If you're using MS compilers, why don't you just call InterlockedCompareExchange instead of trying to make your own? Or did I misunderstand?

Also a quick question, the releasing code can be written as *lock=0 right?


Let me preface this by saying my knowledge of the x86 memory model is not 100% solid. But I do have some experience in this area.

Whether releasing the lock this way is safe depends on whether every time you read the lock you do so with a LOCK assembly language prefix. InterlockedCompareExchange on win32 is implemented with a LOCK CMPXCHG instruction which causes a full memory fence. This means that the memory "load" is forced to see the result of all previous stores and cannot be reordered with respect to them.

But if you have code lurking around that looks like this:

Code: Select all
assert(*lock == 1); // we must hold the lock here!


or...

Code: Select all
if (*lock != 1) {
  AcquireSpinLock(lock);
}


Then nasty bugs are possible. This is because on the x86 memory model loads can be reordered with respect to other loads and stores in the absence of a memory fence. So your "load" (i.e. read the lock state) can be reordered before the previous "store" (i.e. release the lock state via *lock = 0). Moreover if these operations are happening on two different threads there is nothing that forces the "loader" thread to see the change written by the "storer" (i.e. releasing the lock) thread. Unless you are "loading" with LOCK semantics.

So the answer to your question is this: you can release the lock that way if you are always careful how you read the lock. Or you can release it like this:

Code: Select all
InterlockedDecrement(lock);


...and not worry about it.

See also http://wannabe.guru.org/svn/typhoon/trunk/x86.asm for some nasm versions of these routines you're looking for.
Scott Gasch
 
Posts: 26
Joined: 04 Oct 2005, 04:04
Location: Kirkland, WA

Re: TryLock for GCC

Postby bob » 24 Aug 2007, 17:30

L clear locks by simply saying "lock = 0" in an Unlock() function. But since that function is declared as volatile, I avoid the problem of the compiler moving my code around, which can move the lock=0 up far enough to produce bugs...

The out of order writes is yet another issue as you mentioned.
User avatar
bob
 
Posts: 156
Joined: 10 May 2006, 17:59

Re: TryLock for GCC

Postby Dann Corbit » 28 Aug 2007, 00:56

The PostgreSQL source code base also has highly portable SMP primitives with Berkeley license.

See (for example) s_lock.h
Dann Corbit
 


Return to Programming and Technical Discussions

Who is online

Users browsing this forum: No registered users and 11 guests

cron