All,
I am posting this again because the previous thread has dropped out of sight.
Here is a simple piece of code that just hangs on my Tesla C1060.
This has NOTHING TO DO with warp scheduling, divergence, or dependence on any other undocumented feature.
The code works if I use a lock in global memory. It fails if I use a lock in shared memory (even for 1 block of 32 threads).
You can comment out the “#define SHARED_LOCK” line to use the lock in global memory (no other changes to the code are required).
This should make it easy to experiment.
Also, the code below contains a commented-out, no-brainer check which, when uncommented, makes the code work with the shared-memory lock as well. I can't make any sense of this. Examining the PTX does not give any clue either.
Here is the complete code. You can just copy, paste, and compile it with “nvcc -arch=sm_13 xxxx.cu”.
#include <stdio.h>

__device__ int result = -123;
__device__ int g_lock = -1;

#define SHARED_LOCK // Comment out this line to use g_lock instead; the program then works fine.

#ifdef SHARED_LOCK
#define LOCK sharedLock
#else
#define LOCK g_lock
#endif
__global__ void checkKernel(void)
{
    __shared__ volatile int totalParticipation;
#ifdef SHARED_LOCK
    __shared__ int sharedLock;
#endif
    int lockResult;

    // Thread 0 resets the lock (-1 means free) and the counter.
    if (threadIdx.x == 0)
    {
        atomicExch(&LOCK, -1);
        totalParticipation = 0;
        __threadfence();
    }
    __syncthreads();

    for (int i = 0; totalParticipation < 1; i++)
    {
        __syncthreads();
        // Try to take the lock; atomicCAS returns the value seen before the swap.
        lockResult = atomicCAS(&LOCK, -1, (int)threadIdx.x);
        if (lockResult == -1)
        {
            // This thread owns the lock: count itself, then release the lock.
            totalParticipation++;
            __threadfence();
            atomicExch(&LOCK, -1);
        }
        __syncthreads();
        __threadfence();
        /*
        // Uncomment the following brain-dead code and you will find that the code works
        // even with the shared-memory lock
        if (totalParticipation == 0)
            break;
        */
    }

    if (threadIdx.x == 0)
    {
        result = totalParticipation;
    }
    __syncthreads();
    return;
}
int main()
{
    cudaError_t err;
    int host = -123;

    checkKernel<<<1, 32>>>();
    err = cudaThreadSynchronize();
    if (err != 0)
    {
        printf("Error launching kernel, err = %d\n", err);
        return -1;
    }
    // Pass the symbol itself rather than its name as a string.
    cudaMemcpyFromSymbol(&host, result, sizeof(int), 0, cudaMemcpyDeviceToHost);
    printf("host = %d\n", host);
    return 0;
}
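For reference, this is what I expect one pass through the loop to do. The following is only a host-side C++ sketch, not the kernel itself. The function name simulateOneIteration and the parameter numLanes are made up for illustration, std::atomic stands in for the CUDA atomics, and the model assumes that all threads issue their atomicCAS one after another before the winner runs the body of the if.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical host-side model of ONE iteration of the loop in checkKernel.
// Assumption: every lane issues its compare-and-swap first (serialized),
// and only afterwards does the winning lane execute the if-body.
int simulateOneIteration(int numLanes)
{
    assert(numLanes >= 1 && numLanes <= 32);

    std::atomic<int> lock(-1);   // stands in for LOCK; -1 means "free"
    int totalParticipation = 0;  // stands in for the shared counter
    int lockResult[32];

    // Phase 1: each lane tries to swap -1 for its own index.
    for (int lane = 0; lane < numLanes; ++lane)
    {
        int expected = -1;
        lock.compare_exchange_strong(expected, lane);
        // Whether the swap succeeded or not, 'expected' now holds the value
        // seen before the attempt, which is what atomicCAS returns.
        lockResult[lane] = expected;
    }

    // Phase 2: any lane that saw -1 owns the lock, counts itself, releases.
    for (int lane = 0; lane < numLanes; ++lane)
    {
        if (lockResult[lane] == -1)
        {
            totalParticipation++;
            lock.exchange(-1);
        }
    }
    return totalParticipation;
}
```

Under those assumptions simulateOneIteration(32) returns 1: lane 0 takes the lock, the other 31 attempts fail, and the counter is incremented exactly once, so the loop condition should be false after the first pass. The model says nothing about why the shared-memory version hangs on the device.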
Thanks,
Best Regards,
Sarnath