I am trying to implement lock and unlock functions in CUDA.
The principle is simple. I put a global variable “lockVal” in device memory; its initial value is 0.
If one thread wants to enter the critical section, it has to read 0 from lockVal and change it to 1 atomically. Otherwise, it keeps spinning in the loop.
__device__ inline void lock(int *lockVal)
{
    int tmp0 = 0;
    int tmp1;
    int val = 1;
    // spin until we atomically swap lockVal from 0 to 1
    while ((tmp1 = atomicCAS(lockVal, tmp0, val)) != tmp0);
}
__device__ inline void unlock(int *lockVal)
{
    int tmp0 = 1;
    int tmp1;
    int val = 0;
    // swap lockVal from 1 back to 0 to release the lock
    while ((tmp1 = atomicCAS(lockVal, tmp0, val)) != tmp0);
}
But it does not work. It seems that no thread can enter the critical section.
Can anyone help me figure out the reason for the failure? It is driving me crazy.
Correct. When one thread grabs your lock, that thread is temporarily disabled while the remaining 31 warp threads keep cycling, trying to reach that same “success!” instruction so the warp can be reconverged. But that lone disabled thread is the one holding your lock, so the remaining 31 threads will never succeed. Boom, you shot yourself in the foot.
Locks are tricky even on the CPU… on the GPU they’re even more complicated. As tmurray will always tell us: don’t go there, don’t do it, it’s not worth it, you’ll hurt yourself.
I’m almost regretting helping you (read my warning again!), but you may have more luck with this kind of hack: inline the atomic operation into the lock acquisition and check, so the work and the lock release happen inside the same branch.
Something like:
{
    // assume *lock has been globally initialized to 1. Any thread which can "grab" this value owns the lock and must return it.
    bool needToDoWork = true;
    while (needToDoWork) {
        if (atomicExch(lock, 0)) {
            /* Lucky winner! I got the lock! */
            // do my work here......
            atomicExch(lock, 1);   // return the lock
            needToDoWork = false;
        }
    }
}
Caveat: I have not tried the above code… I am just showing you a form I used successfully. It’s still evil.
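To show how that fragment might sit in a real kernel, here is a minimal, untested sketch. The names (lockedIncrement, incrementKernel, lock, counter) and the choice of work (a protected counter increment) are mine, not something from your code, and the lock still has to be initialized to 1 from the host before the launch.

__device__ void lockedIncrement(int *lock, int *counter)
{
    bool needToDoWork = true;
    while (needToDoWork) {
        if (atomicExch(lock, 0)) {           // old value was 1, so this thread now owns the lock
            volatile int *c = counter;       // volatile to avoid stale cached reads of the protected value
            int old = *c;                    // critical section: plain (non-atomic) read-modify-write
            *c = old + 1;
            __threadfence();                 // make the write visible before handing the lock back
            atomicExch(lock, 1);             // return the lock
            needToDoWork = false;
        }
    }
}

__global__ void incrementKernel(int *lock, int *counter)
{
    lockedIncrement(lock, counter);
}

Note that every thread in the launch contends for the single lock here, so the whole grid serializes on it; in practice you would usually let only one thread per block (or per warp) take the lock and share the result with its neighbours.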