Hi,
I have tried to use spinlocks in my kernel code. Similarly to this article, I used the following code:
void lock(__global int *the_lock)
{
    // spin until we are the work-item that swaps a 1 in while the lock held 0
    int occupied = atom_xchg(the_lock, 1);
    while (occupied) {
        occupied = atom_xchg(the_lock, 1);
    }
}
void unlock(__global int *the_lock)
{
    // release the lock by storing a 0 back
    atom_xchg(the_lock, 0);
}
and initialized the spinlock to 0, issued a barrier, and simply locked and unlocked inside the kernel. The code works as it should when run with both global and local work size 1, but with 2 it simply hangs, as if the kernel looped forever. If I instead lock and unlock first under the condition get_local_id(0) == 0 and then under get_local_id(0) == 1, it works. With the spinlock in local memory the problem repeats, and adding memory fences wherever possible has not helped.
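For concreteness, the failing kernel is essentially the following sketch (the kernel name and the counter buffer are mine, made up for illustration; the real code just does some work in the critical section):

// assumes the lock()/unlock() helpers above and
// cl_khr_global_int32_base_atomics being available
__kernel void spinlock_test(__global int *the_lock, __global int *counter)
{
    if (get_global_id(0) == 0)
        *the_lock = 0;               // initialize the spinlock to 0
    barrier(CLK_GLOBAL_MEM_FENCE);   // barrier so everyone sees the 0

    lock(the_lock);
    *counter += 1;                   // trivial critical section
    unlock(the_lock);
}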
After some time I realized what the problem is: the first work-item successfully locks the spinlock, and the second finds it locked (so far everything is OK). However, because of the SIMD fashion in which instructions execute, the loop spun by the second work-item also blocks the first one (there is no scheduling as on a CPU; all work-items execute the same instructions). The first work-item therefore never unlocks the spinlock, and the kernel hangs.
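Spelling that out on the lock() code above, with comments tracing two work-items of the same warp (this is just my reading of what happens, not something I have verified at the ISA level):

void lock(__global int *the_lock)
{
    // work-item 0: swaps a 1 in and gets the old value 0 back -> owns the lock
    // work-item 1: swaps a 1 in and gets 1 back -> has to spin
    int occupied = atom_xchg(the_lock, 1);

    // the two work-items have diverged, but they sit in the same warp
    // and run in lockstep: while work-item 1 keeps spinning here,
    // work-item 0 is merely masked off and never advances to its
    // critical section, so unlock() is never reached -> deadlock
    while (occupied) {
        occupied = atom_xchg(the_lock, 1);
    }
}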
Is this explanation correct? How should I implement the spinlock correctly? I tried to rewrite it this way:
for (;;) {
    if (atom_xchg(the_lock, 1) == 0) {
        // critical section
        atom_xchg(the_lock, 0);
        break;
    }
}
so that the unlock code ends up inside the if, but it does not work either (God knows what the compiler does with that). What should I do?
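For what it's worth, my best guess at why this version also hangs, under the same lockstep assumption as above, is that the branch still diverges and the warp is free to re-run the spinning side first (the wrapper function below is mine, just to make the sketch self-contained):

void locked_increment(__global int *the_lock, __global int *counter)
{
    for (;;) {
        if (atom_xchg(the_lock, 1) == 0) {
            // only the lock holder takes this path; if the warp chooses
            // to re-execute the loop for the spinning work-items before
            // running this (masked) branch, the 0 is never stored back
            // and the whole warp spins forever
            *counter += 1;           // critical section
            atom_xchg(the_lock, 0);
            break;
        }
    }
}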
To provide enough information: I am running 64-bit Red Hat 6.0 Linux with an Nvidia GeForce GTX 580, driver version 260.19.26 (as reported by an OpenCL info utility).