Trying to use lock and unlock in CUDA

I tried to use atomicCAS and atomicExch to simulate the lock and unlock functions of traditional concurrent programming, but I ran into a strange problem.
Here is my code.
The lock only works between thread blocks, not between threads within a block: it appears to deadlock when threads in the same block contend for it.

#include <iostream>
#include <cuda_runtime.h>

__global__ void lockAdd(int* val, int* mutex) {
	while (0 != atomicCAS(mutex, 0, 1)) {}  // spin until the lock is acquired
	(*val)++;                               // critical section: every thread adds one to the value
	atomicExch(mutex, 0);                   // release the lock
}

int main() {
	int* mutex;  // all threads share one mutex
	cudaMallocManaged((void**)&mutex, sizeof(int));
	*mutex = 0;
	int* val;    // val is in unified memory
	cudaMallocManaged((void**)&val, sizeof(int));
	*val = 0;
	lockAdd<<<1024, 1024>>>(val, mutex);  // 1024 blocks, 1024 threads per block
	//lockAdd<<<1024, 1>>>(val, mutex);   // with only 1 thread per block, it works perfectly
	cudaDeviceSynchronize();
	std::cout << *val << std::endl;  // the expected output is 1024 * 1024 = 1048576, but instead the kernel seems to deadlock and the driver crashes
	cudaFree(val);
	cudaFree(mutex);
	return 0;
}

Yes, others have run into this.

The best suggestion is to not use a mutex at all. Many algorithms that appear to depend on a mutex can be recast to use a parallel reduction methodology, and for a simple shared counter like this one an atomic increment suffices, as sketched below.
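
For this particular kernel, the mutex can be eliminated entirely; the hardware atomic serializes the increments for you. A minimal sketch (the kernel name is illustrative):

__global__ void atomicAddOne(int* val) {
	atomicAdd(val, 1);  // hardware-serialized increment; no lock, no deadlock
}

Launched with the same <<<1024, 1024>>> configuration, this produces 1048576 without any possibility of deadlock.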

If you must lock, a better approach than thread-level locking is to arrange the locking at the threadblock level, and then negotiate for access within a threadblock using ordinary synchronization means such as __syncthreads():

https://stackoverflow.com/questions/18963293/cuda-atomics-change-flag/18968893#18968893
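
Here is a minimal sketch of that pattern, assuming the same val and mutex as in the question (the kernel name and the shared-memory tally are illustrative): one thread per block contends for the global lock, while the rest of the block cooperates through shared memory and never touches the lock:

__global__ void blockLockAdd(int* val, int* mutex) {
	__shared__ int blockSum;
	if (threadIdx.x == 0) blockSum = 0;
	__syncthreads();
	atomicAdd(&blockSum, 1);                    // tally within the block in shared memory
	__syncthreads();
	if (threadIdx.x == 0) {                     // only one thread per block spins on the global lock
		while (atomicCAS(mutex, 0, 1) != 0) {}
		__threadfence();                        // ensure we read the latest *val after acquiring
		*val += blockSum;                       // critical section: commit the block's tally
		__threadfence();                        // publish the update before releasing
		atomicExch(mutex, 0);                   // release
	}
}

With 1024 blocks of 1024 threads, only 1024 threads ever contend for the lock, and no two threads in the same warp do, which sidesteps the intra-warp deadlock entirely.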

Having threads within a warp negotiate for a lock is particularly challenging because of the GPU's warp-based (SIMT) execution:

https://stackoverflow.com/questions/2021019/implementing-a-critical-section-in-cuda
https://stackoverflow.com/questions/26221782/how-to-implement-critical-section-in-cuda
https://stackoverflow.com/questions/31194291/cuda-mutex-why-deadlock
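
For reference, the deadlock mechanism in the question's kernel: on pre-Volta GPUs a warp executes in lockstep, so once one lane wins the atomicCAS, the scheduler may keep executing the losing lanes' spin loop while the winning lane waits at the reconvergence point and never reaches the unlock. A commonly suggested workaround, sketched below, keeps the acquire, the critical section, and the release inside the same divergent branch. Treat this as fragile rather than guaranteed (it depends on how the compiler structures the divergence), and note that Volta's independent thread scheduling changes the picture:

__global__ void lockAddPerThread(int* val, int* mutex) {
	bool done = false;
	while (!done) {
		if (atomicCAS(mutex, 0, 1) == 0) {  // this lane acquired the lock
			(*val)++;                       // critical section
			__threadfence();                // publish the update before releasing
			atomicExch(mutex, 0);           // release inside the same branch
			done = true;
		}
	}
}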