Strange behavior with global __device__ variable

I’m seeing some weird behavior in my CUDA kernels:

My first kernel sets a global device variable like this:

[codebox]__device__ uint* g_GlobalVar;

__global__ void MyKernel(uint* p_Ptr)
{
  uint tIndex = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

  if(threadIdx.x == 0)
  {
    // init some __shared__ vars ...
    g_GlobalVar = p_Ptr;
  }

  __syncthreads();

  // do more work
}

__global__ void MyKernel2()
{
  ...
  uint val = g_GlobalVar[index];
  ...
}
[/codebox]

This works fine both in emulation and on the device. However, if I set g_GlobalVar like this:

[codebox]
if(tIndex == 0)
  g_GlobalVar = p_Ptr;
[/codebox]

the 2nd kernel execution fails (unknown error) when I run it on the device. I tried placing the if() version at various places (inside the if(threadIdx.x == 0) block as well as before and after the __syncthreads() call), but the result is always the same. Also, when I remove the line using g_GlobalVar in MyKernel2, it executes without any error. I guess I have a sync problem somewhere. Any ideas? I’m absolutely clueless. :(

Could be related to a compiler bug.

See

http://forums.nvidia.com/index.php?showtop…t=0#entry556544
http://forums.nvidia.com/index.php?showtopic=98640
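
If the compiler is indeed the culprit, one thing that might be worth trying is to set the pointer from the host instead of from inside a kernel. The sketch below is only an illustration under that assumption: it is not confirmed in this thread or the linked ones that it avoids the failure. ReadKernel and RunHostSetExample are hypothetical names, and cudaDeviceSynchronize() is the current runtime call (older toolkits used cudaThreadSynchronize()).

```cuda
#include <cuda_runtime.h>

typedef unsigned int uint;

// Global device pointer, declared as in the original post.
__device__ uint* g_GlobalVar;

// Hypothetical stand-in for MyKernel2: reads through the global pointer.
__global__ void ReadKernel(uint* p_Out, uint p_Count)
{
    uint tIndex = blockIdx.x * blockDim.x + threadIdx.x;
    if (tIndex < p_Count)
        p_Out[tIndex] = g_GlobalVar[tIndex] + 1;
}

// Hypothetical helper: sets g_GlobalVar from the host, runs ReadKernel,
// and returns the sum of the results, or -1 if any CUDA call fails.
long long RunHostSetExample()
{
    const uint count = 256;
    uint h_In[count], h_Out[count];
    for (uint i = 0; i < count; ++i)
        h_In[i] = i;

    uint* d_In = 0;
    uint* d_Out = 0;
    if (cudaMalloc(&d_In, sizeof(h_In)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_Out, sizeof(h_Out)) != cudaSuccess)
    {
        cudaFree(d_In);
        return -1;
    }

    bool ok = cudaMemcpy(d_In, h_In, sizeof(h_In),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    // Copy the pointer value itself (not the data it points to)
    // into the __device__ variable.
    ok = ok && cudaMemcpyToSymbol(g_GlobalVar, &d_In,
                                  sizeof(d_In)) == cudaSuccess;

    if (ok)
    {
        ReadKernel<<<count / 64, 64>>>(d_Out, count);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
    }

    ok = ok && cudaMemcpy(h_Out, d_Out, sizeof(h_Out),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_In);
    cudaFree(d_Out);
    if (!ok)
        return -1;

    long long sum = 0;
    for (uint i = 0; i < count; ++i)
        sum += h_Out[i];
    return sum;
}
```

Calling RunHostSetExample() from main() on a machine with a working device should return 32896 (the sum 1 + 2 + ... + 256). Because the pointer is written once by the host before any kernel launch, no thread has to win a race to set it, and the kernel only ever reads it.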

N.

Thanks for the information, Nico!