Works on Emulator but not on GPU device

Hi,

My code works perfectly in CUDA device emulation mode (built with the make emu=1 option), but the kernel hangs when I run it on the actual GPU on a Linux machine.

My kernel code looks somewhat like this:

__global__ void KERNEL()
{
    __shared__ int arr1[..];
    __shared__ int arr2[..];

    // ...writes to shared arrs...
    __syncthreads();

    while (1) {
        // ...writes to shared arrs...
        __syncthreads();

        // ...writes to shared arrs...
        __syncthreads();

        // ...reads from shared arrs...
        __syncthreads();
    }
}

Can anyone tell me whether this is a known issue?

Any suggestions for working around it would also be appreciated.

Thanks

Whoa, a while(1) hangs? No way!
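The point of the reply is that while(1) never ends unless something inside the loop breaks out of it, and the post does not show any exit. One common reason a kernel like this finishes under emulation yet hangs on the device: threads leave the loop on different iterations. The remaining threads then wait at a __syncthreads() that the others never reach. The CUDA programming guide allows __syncthreads() in conditional code only when the condition evaluates identically across the whole block. The sketch below assumes the loop is meant to stop after a bounded number of iterations. It keeps the exit decision in a shared flag so the entire block leaves together. The names kernel_uniform_exit, run_demo, THREADS and maxIters are hypothetical and are not part of the original code.

```cuda
#include <cuda_runtime.h>

#define THREADS 64  // illustrative block size (assumption, not from the post)

// Hypothetical kernel with the same shape as the one in the post, but the
// loop exit is decided through a shared flag, so every thread in the block
// sees the same value and reaches each __syncthreads() equally often.
__global__ void kernel_uniform_exit(int maxIters, int *out)
{
    __shared__ int arr1[THREADS];
    __shared__ int done;                // block-uniform termination flag

    int tid = threadIdx.x;
    arr1[tid] = 0;                      // ...writes to shared arrs...
    if (tid == 0) done = 0;
    __syncthreads();

    while (1) {
        arr1[tid] += 1;                 // ...writes to shared arrs...
        __syncthreads();

        // A single thread decides whether to stop, based on shared data.
        if (tid == 0 && arr1[0] >= maxIters) done = 1;
        __syncthreads();

        // Every thread reads the same flag, so the whole block breaks on the
        // same iteration and no thread is left waiting at a barrier.
        if (done) break;
    }
    out[tid] = arr1[tid];
}

// Launches one block; returns out[0] on success, -1 on a CUDA error,
// or -2 if the threads disagree about the final value.
int run_demo(int maxIters)
{
    int *d_out = nullptr;
    int h_out[THREADS];
    if (cudaMalloc(&d_out, THREADS * sizeof(int)) != cudaSuccess) return -1;

    kernel_uniform_exit<<<1, THREADS>>>(maxIters, d_out);
    cudaError_t err = cudaDeviceSynchronize();  // also reports launch errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    for (int i = 1; i < THREADS; ++i)
        if (h_out[i] != h_out[0]) return -2;
    return h_out[0];
}
```

With this shape, run_demo(10) should return 10, because each thread increments its element once per iteration and the block stops as soon as arr1[0] reaches maxIters. If the real loop's exit condition depends on per-thread data, it has to be reduced into a shared flag in a similar way before any thread acts on it.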