Hi,
My code works perfectly in the CUDA emulator (built with the emu=1 make option), but the kernel hangs when run on an actual GPU on a Linux machine.
My kernel looks roughly like this:
__global__ void KERNEL()
{
    __shared__ int arr1[..];
    __shared__ int arr2[..];
    // ...writes to shared arrays...
    __syncthreads();
    while (1) {
        // ...writes to shared arrays...
        __syncthreads();
        // ...writes to shared arrays...
        __syncthreads();
        // ...reads from shared arrays...
        __syncthreads();
    }
}
Can anyone tell me whether this is a known issue?
Any suggestions for overcoming this problem would be appreciated.
Thanks