I'm just a bit confused about the usage and declaration of global variables:
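[codebox]
__device__ int g_array[N];   // N: array size (not given in the post)

__global__ void kernel()
{
    int idx = … ;
    g_array[idx];            // read/write access (details elided in the post)
}
[/codebox]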
The array "g_array" is used only from within the kernel. Do I still need to call cudaGetSymbolAddress() and then cudaMalloc() to allocate memory for it, or is it allowed to allocate memory directly from the kernel?
You don't always need to allocate memory with cudaMalloc (though it is certainly the most straightforward way to manage it). But an array declared __device__ cannot be accessed on the host without cudaGetSymbolAddress and a cudaMemcpy. Are you accessing g_array directly in host code? That is a likely cause of your crash, because the host would dereference an invalid pointer. In emulation mode, "__device__" arrays actually live in host memory, so the same code works without any warnings.
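To illustrate both points, here is a minimal sketch assuming a fixed, hypothetical size N = 256 and the current CUDA runtime API. The statically sized __device__ array needs no cudaMalloc at all: the kernel writes to it directly. Only when the host wants the data does it look up the array's device address with cudaGetSymbolAddress and copy it with cudaMemcpy. The kernel name fill_kernel and helper read_back are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256  // hypothetical size; the original post does not give one

// Statically allocated in device global memory; no cudaMalloc is required.
__device__ int g_array[N];

__global__ void fill_kernel()
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        g_array[idx] = idx * 2;  // the kernel uses g_array directly
}

// Runs the kernel and copies g_array to host_out; returns 0 on success.
int read_back(int *host_out)
{
    fill_kernel<<<(N + 127) / 128, 128>>>();
    if (cudaGetLastError() != cudaSuccess) return 1;
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    // Host code must not dereference g_array itself; ask for its device address.
    void *d_ptr = nullptr;
    if (cudaGetSymbolAddress(&d_ptr, g_array) != cudaSuccess) return 1;
    if (cudaMemcpy(host_out, d_ptr, N * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    return 0;
}

int main()
{
    int h[N];
    if (read_back(h) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("g_array[10] = %d\n", h[10]);  // prints "g_array[10] = 20"
    return 0;
}
```

cudaMemcpyFromSymbol(host_out, g_array, N * sizeof(int)) does the address lookup and the copy in one call; the two-step version is shown because it matches the functions named above.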
Accessing the __device__ g_array from the host will not work because g_array is not in the same memory space. That much is clear.
The __device__ g_array is used only by device functions, to store temporary data in global memory. __shared__ memory cannot be used because of the data size (around 250 KB per thread).
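When per-thread temporary storage is this large, one common alternative to a static __device__ array is a single cudaMalloc'd scratch buffer in global memory, with each thread using its own disjoint slice. The sketch below assumes 250 KB per thread (from the post) and a hypothetical 256 threads; the names work, run_scratch, SCRATCH_INTS and THREADS are illustrative only, and the post does not say whether the original g_array is per-thread or shared.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes: 250 KB per thread is from the post, the thread count is assumed.
constexpr size_t SCRATCH_INTS = 250 * 1024 / sizeof(int);  // ints per thread
constexpr int THREADS = 256;

__global__ void work(int *scratch, int *result, int nthreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nthreads) return;
    // Each thread owns a disjoint slice, so threads never overwrite each other.
    int *mine = scratch + (size_t)tid * SCRATCH_INTS;
    for (size_t i = 0; i < SCRATCH_INTS; ++i)
        mine[i] = tid;
    result[tid] = mine[SCRATCH_INTS - 1];
}

// Allocates the scratch buffer, runs the kernel, copies results back; 0 on success.
int run_scratch(int *host_result)
{
    int *d_scratch = nullptr, *d_result = nullptr;
    size_t bytes = (size_t)THREADS * SCRATCH_INTS * sizeof(int);  // about 62.5 MB
    if (cudaMalloc(&d_scratch, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_result, THREADS * sizeof(int)) != cudaSuccess) {
        cudaFree(d_scratch);
        return 1;
    }
    work<<<THREADS / 128, 128>>>(d_scratch, d_result, THREADS);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host_result, d_result, THREADS * sizeof(int),
                         cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(d_scratch);
    cudaFree(d_result);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int h[THREADS];
    if (run_scratch(h) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("result[5] = %d\n", h[5]);  // prints "result[5] = 5"
    return 0;
}
```

The total allocation is THREADS × 250 KB, so the number of threads, not just the per-thread size, decides whether it fits. Contiguous per-thread slices are the simplest layout; an interleaved layout such as scratch[i * nthreads + tid] usually gives better-coalesced accesses.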
Because my code works fine in emulation mode but only partially on the hardware, I was not sure whether I understand the scope of global __device__ variables correctly.
One more note: when I access g_array with a fixed index instead of a variable (e.g. g_array[0] = 0;), everything works. The index variable is declared __device__ as well.
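One detail that may be worth checking here (an assumption; the thread does not confirm it is the cause): a __device__ scalar is a single memory location shared by every thread in the grid, not a per-thread variable. The sketch below makes that visible by having every thread increment one __device__ counter atomically, while the array index is computed in an ordinary local variable that each thread owns privately. N = 512, g_counter and run_counter_demo are hypothetical names for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 512  // hypothetical array length

__device__ int g_array[N];
__device__ int g_counter;  // ONE variable, shared by all threads of all blocks

__global__ void kernel()
{
    // Local variable: each thread has its own private copy (usually a register).
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        g_array[idx] = idx;
    // Every thread updates the same g_counter; atomicAdd prevents lost updates.
    // A plain "g_counter = idx;" would leave whichever thread wrote last.
    atomicAdd(&g_counter, 1);
}

// Returns 0 on success; reports the counter and the last array element.
int run_counter_demo(int *count_out, int *last_out)
{
    int zero = 0;
    if (cudaMemcpyToSymbol(g_counter, &zero, sizeof(int)) != cudaSuccess) return 1;
    kernel<<<N / 128, 128>>>();
    if (cudaGetLastError() != cudaSuccess) return 1;
    if (cudaMemcpyFromSymbol(count_out, g_counter, sizeof(int)) != cudaSuccess) return 1;
    if (cudaMemcpyFromSymbol(last_out, g_array, sizeof(int),
                             (N - 1) * sizeof(int)) != cudaSuccess) return 1;
    return 0;
}

int main()
{
    int count = 0, last = 0;
    if (run_counter_demo(&count, &last) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("g_counter = %d, g_array[N-1] = %d\n", count, last);  // 512, 511
    return 0;
}
```

If a __device__ index variable is written by each thread and then read back to index g_array, threads can read an index written by a different thread, which would fit "constant index works, variable index does not".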
OK, it seems like you are doing everything correctly, then. Perhaps the best thing you can do at this point is to create a minimal test case that reproduces the problem (preferably one that can be compiled directly with nvcc -o exec file.cu) and post it here. There must be some small detail you missed somewhere.
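A minimal test case is most useful when every CUDA call is checked, so the failure shows up at the line where it happens. The skeleton below is one possible shape for such a file, assuming a hypothetical N = 1024 and a made-up kernel; the CHECK macro is a common idiom, not part of the CUDA API. It builds with nvcc -o exec file.cu.

```cuda
// file.cu: build with  nvcc -o exec file.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file, line and the CUDA error string on any failure.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

#define N 1024  // hypothetical size; use the real one in an actual repro

__device__ int g_array[N];

__global__ void kernel(int *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        g_array[idx] = idx + 1;  // the access pattern under suspicion
        out[idx] = g_array[idx];
    }
}

// Runs the kernel and returns the number of wrong results (0 means pass).
int run_repro(int *host_out)
{
    int *d_out = nullptr;
    CHECK(cudaMalloc(&d_out, N * sizeof(int)));
    kernel<<<(N + 255) / 256, 256>>>(d_out);
    CHECK(cudaGetLastError());       // launch-configuration errors
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    CHECK(cudaMemcpy(host_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));

    int bad = 0;
    for (int i = 0; i < N; ++i)
        if (host_out[i] != i + 1) ++bad;
    return bad;
}

int main()
{
    static int h[N];
    int bad = run_repro(h);
    if (bad == 0)
        printf("OK\n");
    else
        printf("FAILED: %d mismatches\n", bad);
    return bad != 0;
}
```

Running the resulting binary under compute-sanitizer (cuda-memcheck on older toolkits) additionally reports out-of-bounds global memory accesses.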