Hello! I am implementing a mutex in CUDA. I followed the method from the Stack Overflow question "Implementing a critical section in CUDA".
The only difference is that I need to take the mutex inside a for loop.
The original author said the following code works:
__global__ void k_testLocking(unsigned int* locks, int n) {
    int id = threadIdx.x % n;
    bool leaveLoop = false;
    while (!leaveLoop) {
        if (atomicExch(&(locks[id]), 1u) == 0u) {
            //critical section
            leaveLoop = true;
            atomicExch(&(locks[id]),0u);
        }
    }
}
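For anyone who wants to check that this pattern really gives mutual exclusion on their GPU, here is a minimal sketch rather than a definitive test. It assumes a single block, a hypothetical `counters` array that the critical section increments non-atomically, and a hypothetical host helper `runLockTest`; none of these names come from the original answer. The `__threadfence()` before the release is my own addition, to make the write visible before the lock is freed.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same locking pattern as above. counters[] is a hypothetical shared array,
// used only to make the effect of the critical section observable.
__global__ void k_countWithLocks(unsigned int* locks,
                                 volatile unsigned int* counters, int n) {
    int id = threadIdx.x % n;
    bool leaveLoop = false;
    while (!leaveLoop) {
        if (atomicExch(&(locks[id]), 1u) == 0u) {
            // critical section: deliberately non-atomic read-modify-write
            counters[id] = counters[id] + 1u;
            __threadfence();               // assumption: publish the write before release
            leaveLoop = true;
            atomicExch(&(locks[id]), 0u);  // release the lock
        }
    }
}

// Hypothetical host helper: launches one block of `threads` threads over
// `n` locks and returns the sum of all counters, or -1 on a CUDA error.
long long runLockTest(int n, int threads) {
    unsigned int* locks = nullptr;
    unsigned int* counters = nullptr;
    size_t bytes = n * sizeof(unsigned int);

    if (cudaMalloc(&locks, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&counters, bytes) != cudaSuccess) {
        cudaFree(locks);
        return -1;
    }
    // Locks must start at 0 (unlocked), otherwise every thread spins forever.
    cudaMemset(locks, 0, bytes);
    cudaMemset(counters, 0, bytes);

    k_countWithLocks<<<1, threads>>>(locks, counters, n);

    // cudaMemcpy waits for the kernel and reports any launch/runtime error.
    std::vector<unsigned int> host(n);
    cudaError_t err = cudaMemcpy(host.data(), counters, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(locks);
    cudaFree(counters);
    if (err != cudaSuccess) return -1;

    long long sum = 0;
    for (unsigned int c : host) sum += c;
    return sum;
}
```

If the lock provides mutual exclusion, no increment is lost, so `runLockTest(4, 128)` should return 128 (one increment per thread). A smaller value would mean two threads were inside the critical section at the same time, and a hang would point at a deadlock.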
I put the original locking block inside a for loop, so my kernel looks like this:
__global__ void k_testLocking(unsigned int* locks, int n, unsigned int* array) {
    int id = threadIdx.x % n;
    for( int i=array[id]; i<array[id+1]; i++ ){
        bool leaveLoop = false;
        while (!leaveLoop) {
            if (atomicExch(&(locks[i]), 1u) == 0u) {
                //critical section
                leaveLoop = true;
                atomicExch(&(locks[i]),0u);
            }
        }
    }
}
But my code does not work. It seems to run into a deadlock. Can anyone tell me what's wrong with my code, please?