Hi,
From this post (Try to use lock and unlock in CUDA), I was able to locate a stackoverflow answer (Cuda atomics change flag - Stack Overflow) by Robert Crovella that provides an example implementation of a spinlock using atomic compare and swap. The code is as follows:
__syncthreads();
if (threadIdx.x == 0)
acquire_semaphore(&sem);
__syncthreads();
//begin critical section
// ... your critical section code goes here
//end critical section
__threadfence(); // not strictly necessary for the lock, but to make any global updates in the critical section visible to other threads in the grid
__syncthreads();
if (threadIdx.x == 0)
release_semaphore(&sem);
__syncthreads();
where the helper functions are :
__device__ volatile int sem = 0;
__device__ void acquire_semaphore(volatile int *lock){
while (atomicCAS((int *)lock, 0, 1) != 0);
}
__device__ void release_semaphore(volatile int *lock){
*lock = 0;
__threadfence();
}
This code is already very helpful, but I was hoping to make sure I really understand it correctly. First of all, in the first code block, there are 4 __syncthreads() and 1 __threadfence() calls. The __threadfence() after the critical section makes sense, since CUDA has a relaxed memory model, we’d need a __threadfence() to enforce the writes before that are visible to other blocks. However, for the 4 __syncthreads(), I think I understand only two of them, and my thoughts are as follows:
- we need the second __syncthreads() because our master thread (threadIdx=0) may not succefully got the lock, and we don’t want other warps to proceed (into the critical section before threadIdx=0 has acquired the lock
- we need the third __syncthreads() because we want make sure all warps have reached here (finished doing critical stuff) before the master thread attempts to release the unlock. If some warps still have not finished and the lock is released, mutual exclusion is violated
Basically I do see the need for the second and the third __syncthreads() to ensure mutual exclusion, but I don’t quite understand why we need the first and the last __syncthreads? Thanks!
For example, without the first __syncthreads(), the master thread could already acquired the lock, and other warps are still not reaching the if
yet, but this seems fine to me? because it doesn’t violate mutual exclusion.