I am using a GTX 280 card, which has compute capability 1.3.
I have also installed the CUDA 3.0 beta 1 (cuda-3.0beta1) toolkit.
From the CUDA reference manual, I learned that the GTX card supports atomic operations.
However, when I used atomic operations for an inter-block sync(), the atomics did not actually work.
I am compiling with the -arch sm_11 flag, and compilation succeeds.
But the kernel does not run: the launch fails with an unspecified failure.
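When a launch fails, the CUDA runtime returns a specific error code, and checking it right after the launch and again after synchronizing usually shows which step failed. The sketch below assumes the CUDA runtime API and combines that check with a minimal test of `atomicAdd` on global memory: every thread increments one counter, so the final value should equal the total thread count. The names `CHECK_CUDA`, `count_threads` and `expected_count` are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the runtime's error string and abort if a call failed.
#define CHECK_CUDA(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,             \
                    cudaGetErrorString(err_));                             \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Every thread adds 1 to the same global counter (needs compute capability >= 1.1).
__global__ void count_threads(int *counter)
{
    atomicAdd(counter, 1);
}

// Number of increments expected if every atomicAdd takes effect.
int expected_count(int blocks, int threads_per_block)
{
    return blocks * threads_per_block;
}

int main()
{
    const int blocks = 30, threads = 256;
    int *d_counter = NULL;
    int h_counter = 0;

    CHECK_CUDA(cudaMalloc((void **)&d_counter, sizeof(int)));
    CHECK_CUDA(cudaMemset(d_counter, 0, sizeof(int)));

    count_threads<<<blocks, threads>>>(d_counter);
    CHECK_CUDA(cudaGetLastError());       // errors detected when the launch is issued
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK_CUDA(cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost));
    printf("counter = %d, expected %d\n", h_counter, expected_count(blocks, threads));

    CHECK_CUDA(cudaFree(d_counter));
    return h_counter == expected_count(blocks, threads) ? 0 : 1;
}
```

On the CUDA 3.0 toolkit, `cudaThreadSynchronize()` plays the role of `cudaDeviceSynchronize()`, which was added in a later release. If this test passes while the barrier kernel still fails, the problem is more likely in the barrier code than in atomic support.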
Also, the conditional-compilation printf below prints the "not using atomics" warning:
#ifndef CUDA_NO_SM_11_ATOMIC_INTRINSICS
printf("WARNING! Not using atomics!\n");
#endif
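A preprocessor test like the one above is resolved at compile time, not against the card that is actually installed. Going by its name, `CUDA_NO_SM_11_ATOMIC_INTRINSICS` signals that atomics are *unavailable*, so a warning guarded by `#ifndef` may well fire in the opposite case from the one intended. A more direct check is to query the device at run time with `cudaGetDeviceProperties` and compare `major.minor` against 1.1, the minimum compute capability for 32-bit atomic functions on global memory; inside device code, `__CUDA_ARCH__` reports the architecture the code was compiled for (110 for `-arch sm_11`). A sketch under those assumptions; `has_global_atomics` and `arch_probe` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// 32-bit atomic functions on global memory need compute capability 1.1 or higher.
bool has_global_atomics(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 1);
}

// Writes the architecture this kernel was compiled for (e.g. 110 for sm_11).
__global__ void arch_probe(int *arch)
{
#ifdef __CUDA_ARCH__
    *arch = __CUDA_ARCH__;
#endif
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: compute capability %d.%d, global atomics %s\n",
           prop.name, prop.major, prop.minor,
           has_global_atomics(prop.major, prop.minor) ? "supported" : "not supported");

    int *d_arch = NULL, h_arch = 0;
    cudaMalloc((void **)&d_arch, sizeof(int));
    cudaMemset(d_arch, 0, sizeof(int));
    arch_probe<<<1, 1>>>(d_arch);
    cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    // h_arch stays 0 if the kernel did not run (e.g. no binary for this GPU).
    printf("kernel compiled for __CUDA_ARCH__ = %d\n", h_arch);
    cudaFree(d_arch);
    return 0;
}
```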
This is my inter-block sync() code:
__device__ int g_mutex; // not declared volatile because atomicAdd doesn't accept a volatile variable
__device__ void __gpu_sync()
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int linBlockInd = blockIdx.y;
    int tid_in_block = ty*blockDim.x+tx;
    int cas_variable=1, val=0, to_execute=1, ret=0;
    int dummy=1;
    __syncthreads();
    if (tid_in_block == 0) {
        atomicAdd(&g_mutex, 1);
        __threadfence();
        while(g_mutex != NO_OF_GRID) {
            ret=atomicCAS(&cas_variable, to_execute,val); // just so the while loop is not compiled away
            if (ret == 0) {
                val+= 1;
            } else {
                val+= 2;
            }
        }
    }
    __syncthreads();
}
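Reading the snippet (not a confirmed diagnosis), a few details stand out. The `while` condition reads `g_mutex` with a plain load; since it is not `volatile`, the compiler may load it once and spin forever. Reading it through a `volatile` pointer avoids that without changing the declaration, so `atomicAdd` still accepts it. `atomicCAS(&cas_variable, ...)` targets a local variable, whereas atomic functions are only defined for global (and, from compute capability 1.2, shared) memory, which makes it a plausible source of the unspecified failure. And because `g_mutex` is never reset and the loop tests `!=`, a second call would wait for a value the counter has already passed. The sketch below shows a barrier along those lines. It assumes every block of the grid is resident at the same time (here one small block per multiprocessor) and that each call passes a goal value of `gridDim.x` times the number of barriers reached so far, so the counter never needs resetting inside a launch. `gpu_sync`, `neighbor_demo` and `expected_neighbor` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Arrival counter in global memory; the host zeroes it before each launch.
__device__ int g_mutex = 0;

// Inter-block barrier sketch. goal_val must equal gridDim.x times the number of
// barriers this launch has reached so far (including this one).
__device__ void gpu_sync(int goal_val)
{
    __syncthreads();                        // the whole block has arrived
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        __threadfence();                    // publish this block's global writes first
        atomicAdd(&g_mutex, 1);             // atomic on a global variable
        volatile int *counter = &g_mutex;   // volatile read: reloaded on every iteration
        while (*counter < goal_val) {
            // spin until every block has arrived
        }
        __threadfence();                    // order later reads after the observed count
    }
    __syncthreads();                        // release the rest of the block
}

// Each block writes one value, waits at the barrier, then reads its neighbour's value.
__global__ void neighbor_demo(int *data, int *result)
{
    const int b = blockIdx.x, n = gridDim.x;
    if (threadIdx.x == 0)
        data[b] = b + 1;
    gpu_sync(n * 1);                        // first (and only) barrier of this launch
    if (threadIdx.x == 0)
        result[b] = ((volatile int *)data)[(b + 1) % n];
}

// Value block b should read after the barrier.
int expected_neighbor(int b, int n)
{
    return (b + 1) % n + 1;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return 1;
    const int n = prop.multiProcessorCount;   // assumption: one resident block per SM
    const int zero = 0;

    int *d_data = NULL, *d_result = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    cudaMalloc((void **)&d_result, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));
    cudaMemcpyToSymbol(g_mutex, &zero, sizeof(int));   // reset the barrier counter

    neighbor_demo<<<n, 64>>>(d_data, d_result);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int *h_result = new int[n];
    cudaMemcpy(h_result, d_result, n * sizeof(int), cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int b = 0; b < n; ++b)
        if (h_result[b] != expected_neighbor(b, n))
            ++mismatches;
    printf("%d blocks, %d mismatches\n", n, mismatches);

    delete[] h_result;
    cudaFree(d_data);
    cudaFree(d_result);
    return mismatches == 0 ? 0 : 1;
}
```

With this goal-value scheme, a second barrier in the same launch would call `gpu_sync(2 * n)`. The barrier is only safe while all blocks are co-resident; if the grid has more blocks than fit at once, the spinning blocks keep the others from being scheduled and the kernel hangs. Much newer hardware and toolkits offer a supported alternative in cooperative groups' `grid.sync()` with `cudaLaunchCooperativeKernel`, which a compute 1.3 card cannot use.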
Can someone please help me resolve this issue?
Thanks in advance.