I’m seeing some weird behavior in my CUDA kernels.
My first kernel sets a global device variable like this:
[codebox]__device__ uint* g_GlobalVar;

__global__ void MyKernel(uint* p_Ptr)
{
    uint tIndex = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    if(threadIdx.x == 0)
    {
        // init some __shared__ vars ...
        g_GlobalVar = p_Ptr;
    }
    __syncthreads();
    // do more work
}

__global__ void MyKernel2()
{
    ...
    uint val = g_GlobalVar[index];
    ...
}
[/codebox]
This works fine both in emulation and when running on the device. However, if I set g_GlobalVar like this instead:
[codebox]
if(tIndex == 0)
    g_GlobalVar = p_Ptr;
[/codebox]
the second kernel execution fails (unknown error) when I run it on the device. I tried placing the if(tIndex == 0) version in various places (inside the if(threadIdx.x == 0) block as well as before and after the __syncthreads() call), but the result is always the same. Also, when I remove the line that uses g_GlobalVar in MyKernel2, it executes without any error. I guess I have a sync problem somewhere. Any ideas? I’m absolutely clueless :(