How to implement a lock on the GPU?

Has anyone done anything with locks on the GPU? In my program, I want the threads of one thread block to communicate with each other through shared memory, i.e., threads in one block may read or write the same location. My question is: is there a CUDA API that works like pthread_mutex_lock, which can lock a piece of code to guarantee exclusive access to it? Or are there other ways to handle this situation?
I have read about the atomic functions supplied by NVIDIA, but the shared data is not just one int or other primitive variable; it is a struct that contains several fields.
Assume there is a struct named ExampleStruct defined like this:

struct ExampleStruct {
    int first;
    int second;
    int third;
};

ExampleStruct example;
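Using pthread_mutex_lock, the update would look like this:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&mutex);
example.first += 1;
example.second += 1;
example.third += 1;
pthread_mutex_unlock(&mutex);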

But the atomic functions can only guarantee exclusive access to a single variable (one memory location).
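To make that limitation concrete: for this particular update each field changes independently, so one atomicAdd per field does keep every counter exact. What it cannot give is atomicity of the struct as a whole; another thread may observe first already incremented while third is not yet. A minimal sketch, assuming the struct lives in global memory; incrementFields and runIncrementFields are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

struct ExampleStruct {
    int first;
    int second;
    int third;
};

// Each field gets its own atomic operation. The final counts are exact,
// but the three increments are NOT one indivisible step: a concurrent
// reader can see a partially updated struct.
__global__ void incrementFields(ExampleStruct *ex)
{
    atomicAdd(&ex->first, 1);
    atomicAdd(&ex->second, 1);
    atomicAdd(&ex->third, 1);
}

// Hypothetical host helper: launches the kernel and returns the common
// field value, or -1 if the three fields disagree.
int runIncrementFields(int blocks, int threads)
{
    ExampleStruct *d_ex = nullptr;
    cudaMalloc(&d_ex, sizeof(ExampleStruct));
    cudaMemset(d_ex, 0, sizeof(ExampleStruct));
    incrementFields<<<blocks, threads>>>(d_ex);
    ExampleStruct h_ex;
    cudaMemcpy(&h_ex, d_ex, sizeof(ExampleStruct), cudaMemcpyDeviceToHost);
    cudaFree(d_ex);
    if (h_ex.first != h_ex.second || h_ex.second != h_ex.third) return -1;
    return h_ex.first;
}
```

If the fields must always be read or written together as a consistent unit, per-field atomics are not enough and a lock (or a single wider atomic covering all fields, where the data fits) is needed.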

Code like the following is not right:

Any ideas, guys?

Implementing locks in CUDA is non-trivial.

The general idea is to

  1. Make one representative thread in each block contend for a block-level lock in global memory.
  2. Once a block holds that lock, the threads inside the block contend for an intra-block lock in shared memory.
  3. Execute the critical section, release the shared-memory lock and then the global-memory lock, and then (all threads in the block) go back to step 1.
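The three steps can be sketched in CUDA roughly as follows. This is a minimal sketch, assuming the ExampleStruct from the question lives in global memory; g_blockLock, updateWithTwoLocks and runTwoLockUpdate are hypothetical names. The struct is accessed through a volatile pointer so reads and writes are not served from a stale cache or register. It serializes every thread in the grid, so it only illustrates the locking structure and is not a fast way to update three counters. The intra-block loop keeps the critical section inside the winning branch, which is the usual way to avoid the warp-divergence deadlock on pre-Volta GPUs, though there its progress depends on how the compiler arranges reconvergence rather than on a documented guarantee.

```cuda
#include <cuda_runtime.h>

struct ExampleStruct {
    int first;
    int second;
    int third;
};

// Hypothetical block-level lock in global memory: 0 = free, 1 = held.
__device__ int g_blockLock = 0;

__global__ void updateWithTwoLocks(volatile ExampleStruct *ex)
{
    __shared__ int s_threadLock;  // intra-block lock: 0 = free, 1 = held

    // Step 1: one representative thread per block contends for the global lock.
    if (threadIdx.x == 0) {
        s_threadLock = 0;
        while (atomicCAS(&g_blockLock, 0, 1) != 0) { /* spin */ }
        __threadfence();  // order the acquire before the block's later accesses
    }
    __syncthreads();  // the whole block now owns the global lock

    // Step 2: threads of the block contend for the shared-memory lock.
    // The critical section sits inside the winning branch, so the winner
    // releases the lock before its warp branches back to the loop test.
    bool done = false;
    while (!done) {
        if (atomicCAS(&s_threadLock, 0, 1) == 0) {
            ex->first  = ex->first  + 1;  // critical section: all fields together
            ex->second = ex->second + 1;
            ex->third  = ex->third  + 1;
            __threadfence();              // publish the writes device-wide
            atomicExch(&s_threadLock, 0); // release the intra-block lock
            done = true;
        }
    }

    // Step 3: wait for every thread of the block, then release the global lock.
    __syncthreads();
    if (threadIdx.x == 0) {
        atomicExch(&g_blockLock, 0);
    }
}

// Hypothetical host helper: returns the common field value, or -1 if the
// fields disagree.
int runTwoLockUpdate(int blocks, int threads)
{
    ExampleStruct *d_ex = nullptr;
    cudaMalloc(&d_ex, sizeof(ExampleStruct));
    cudaMemset(d_ex, 0, sizeof(ExampleStruct));
    updateWithTwoLocks<<<blocks, threads>>>(d_ex);
    ExampleStruct h_ex;
    cudaMemcpy(&h_ex, d_ex, sizeof(ExampleStruct), cudaMemcpyDeviceToHost);
    cudaFree(d_ex);
    if (h_ex.first != h_ex.second || h_ex.second != h_ex.third) return -1;
    return h_ex.first;
}
```

Only thread 0 of each block spins on the global lock, so that spin never competes with other lanes of its own warp; the other threads simply wait at the first __syncthreads.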

It is common to see “spinning while loop” deadlocks caused by warp divergence, so spinning loops need to be written VERY carefully. Check out the thread.
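The classic failure is the naive spin loop shown in the comment below: when lanes of the same warp contend, GPUs without independent thread scheduling may keep executing the losing lanes' loop while the winner waits for them to reconverge, so the lock is never released. A minimal sketch of the usual workaround, assuming a single block whose threads share an ExampleStruct in shared memory (the situation in the original question); sharedStructUpdate and runSharedStructUpdate are hypothetical names. On Volta (compute capability 7.0) and later, independent thread scheduling lets the naive loop make progress as well, but the form below is the one usually recommended when older hardware must be supported.

```cuda
#include <cuda_runtime.h>

struct ExampleStruct {
    int first;
    int second;
    int third;
};

// Naive spin lock, deadlock-prone on pre-Volta GPUs when lanes of one warp
// contend for it:
//
//     while (atomicCAS(&lock, 0, 1) != 0) { }  // losing lanes keep spinning
//     ...critical section...                    // winner waits for the warp
//     atomicExch(&lock, 0);                     // to reconverge: never happens
//
// Safer form: do the critical section inside the branch that won the lock.
__global__ void sharedStructUpdate(ExampleStruct *out)
{
    __shared__ ExampleStruct s_ex;  // struct shared by the whole block
    __shared__ int s_lock;          // 0 = free, 1 = held

    if (threadIdx.x == 0) {
        s_ex.first = s_ex.second = s_ex.third = 0;
        s_lock = 0;
    }
    __syncthreads();

    volatile ExampleStruct *ex = &s_ex;  // force every access to shared memory
    bool done = false;
    while (!done) {
        if (atomicCAS(&s_lock, 0, 1) == 0) {
            ex->first  = ex->first  + 1;
            ex->second = ex->second + 1;
            ex->third  = ex->third  + 1;
            __threadfence_block();   // make the writes visible within the block
            atomicExch(&s_lock, 0);  // release
            done = true;
        }
    }
    __syncthreads();

    // Copy the result out so the host can check it.
    if (threadIdx.x == 0) {
        out->first  = s_ex.first;
        out->second = s_ex.second;
        out->third  = s_ex.third;
    }
}

// Hypothetical host helper: launches one block and returns the common field
// value, or -1 if the fields disagree.
int runSharedStructUpdate(int threads)
{
    ExampleStruct *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(ExampleStruct));
    sharedStructUpdate<<<1, threads>>>(d_out);
    ExampleStruct h_out;
    cudaMemcpy(&h_out, d_out, sizeof(ExampleStruct), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (h_out.first != h_out.second || h_out.second != h_out.third) return -1;
    return h_out.first;
}
```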

You can check the histogram example in the SDK; it has quite an elegant implementation of a software lock for shared-memory access.

OK, thanks very much, guys! I will check it.

Sarnath, thank you for your tips. I also read the thread you pointed to; it’s a very long discussion and I’m still a bit confused. Now I’m coming back to this topic (last week I focused on something else), and I plan to read the thread again. I hope I can get it.

Thanks again!


I suggest not reading it completely. Sorry, I should have told you before. The topic takes a lot of detours, and one can easily get confused…

Try implementing the lock the way I described: have one representative thread (threadIdx.x == 0) fight for the block-level lock, then have the threads inside the block contend among themselves…

While you do that, you might hit “deadlocks”. Then read the topic (skim it to locate your region of interest); most of the deadlocks come from “spinning” loops…

Good luck,