Code runs on emulator but not on GPU

Hi,

My code works perfectly in the CUDA emulator (built with the emu=1 option), but the kernel hangs when run on an actual GPU on a Linux machine.

My kernel code looks somewhat like this:

__global__ void KERNEL()
{
    __shared__ int arr1[..];
    __shared__ int arr2[..];

    ...writes to shared arrs...

    __syncthreads();

    while (1) {
        ...writes to shared arrs...
        __syncthreads();

        ...writes to shared arrs...
        __syncthreads();

        ...reads from shared arrs...
        __syncthreads();
    }
}

Can anyone please tell me if this is a known issue? Any suggestions for overcoming this problem would be appreciated.

Thanks

Probably some more details of the code would help…

In this case I don’t think that is necessary. A kernel containing

while(1) { }

runs fine when execution is completely serialized (i.e. emulation mode), and hangs when executed in a non-deterministic parallel fashion. Quelle surprise?

He probably has a break statement somewhere in there (a termination condition).

Well, guess what: some threads leave the while loop early, and the others then hang on the next __syncthreads() as a result.

Generally speaking, never put __syncthreads() inside a conditional branch (and your while loop terminates conditionally, I suppose).

That’s a no-no.
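
For illustration, something along these lines avoids the divergent __syncthreads(). This is only a rough sketch; the array size, the shared "done" flag, and the placeholder termination condition are made up, not taken from your code:

__global__ void KERNEL()
{
    __shared__ int arr[256];
    __shared__ int done;              // shared exit flag, written by one thread only

    if (threadIdx.x == 0)
        done = 0;
    __syncthreads();

    while (1) {
        // ...writes to shared arrays...
        __syncthreads();

        // one thread evaluates the termination condition for the whole block
        if (threadIdx.x == 0)
            done = (arr[0] > 100);    // placeholder condition, invented for the sketch
        __syncthreads();

        if (done)                     // every thread sees the same value,
            break;                    // so they all leave the loop in the same iteration

        // ...reads from shared arrays...
        __syncthreads();
    }
}

The key point is that every thread checks the same flag after the same __syncthreads(), so no thread can exit the loop while others are still waiting at a barrier.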

Christian

Another possibility could be that your while loop is using variables to “synchronize” between different blocks, and you are using those variables to break out of the while loop (see the sketch below for the kind of pattern I mean)…
That’s why, instead of hypothesizing about what kind of code you might have written and then finding a solution for it, it would be easier if there were actual code to look at… :)
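
For illustration, the kind of inter-block pattern I mean would look roughly like this. It is purely hypothetical; the flag name and the block roles are invented, not taken from your code:

// global flag that one block is supposed to set and the others spin on
__device__ volatile int flag = 0;

__global__ void spin_kernel()
{
    if (blockIdx.x == 0) {
        // block 0 announces that it has finished its work
        if (threadIdx.x == 0)
            flag = 1;
    } else {
        // the other blocks busy-wait on the flag; this can deadlock if the
        // waiting blocks occupy all the multiprocessors and block 0 never
        // gets scheduled -- there is no guarantee that all blocks run
        // concurrently on the GPU
        while (flag == 0)
            ;   // spin
    }
}

A pattern like this can appear to work in emulation mode, where blocks execute one after another, and then hang on real hardware, which matches the symptom you describe.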