global variables

Hello guys,

I'm just a bit confused about the usage and declaration of global variables:


int g_array[…];

__global__ void kernel()
{
    int idx = … ;
}

The array “g_array” is only used inside the kernel. Do I still need to call cudaGetSymbolAddress() and then cudaMalloc() to allocate memory for it, or is it allowed to allocate memory directly from the kernel?
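For the case in the question, a minimal sketch (assuming the array size is a compile-time constant; N, fill_kernel and run_fill are hypothetical names) shows a __device__ array that kernels write with no host-side allocation at all. cudaMalloc is only needed when the size is chosen at runtime, in which case the kernel receives the returned pointer as an argument. GPUs of compute capability 2.0 and later also allow malloc() inside a kernel, but a fixed-size array does not need it.

```cuda
#include <cassert>          // only for the checks that exercise this sketch
#include <cuda_runtime.h>

#define N 256               // hypothetical array size

// Statically allocated in device global memory for the lifetime of the program.
// No cudaMalloc and no cudaGetSymbolAddress are needed to use it from kernels.
__device__ int g_array[N];

__global__ void fill_kernel()
{
    // Automatic variable: every thread has its own private idx.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        g_array[idx] = 2 * idx;
}

// Hypothetical host helper: launches the kernel and reports any CUDA error.
cudaError_t run_fill()
{
    fill_kernel<<<(N + 127) / 128, 128>>>();
    cudaError_t err = cudaGetLastError();   // errors in the launch itself
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();         // errors raised while the kernel ran
}
```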



You don't, if you mark the array as __global__ or __shared__.

Hmm, I marked them with __device__ and it compiles, but I'm getting some strange results at runtime.

In emulation mode things work well, but not on the hardware.

__shared__ cannot be used because the array is too big. And __global__ for variables?

It is not specified in the documentation.

That was why I asked whether I always need to use cudaGetSymbolAddress/cudaMalloc on the host side to allocate memory.



You don't always need to allocate memory with cudaMalloc (though it is certainly the most straightforward way to manage it). But an array declared __device__ cannot be accessed from the host without cudaGetSymbolAddress and a cudaMemcpy. Are you accessing g_array directly in the host code? That is the likely cause of your crash, because the host would dereference an invalid pointer. In emulation, __device__ arrays actually live in host memory, so it works without any warnings.
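If the host does need the contents, a sketch along these lines (N, square_kernel and the two read_via_* helpers are hypothetical) shows two ways to copy a __device__ array back: cudaMemcpyFromSymbol directly, or cudaGetSymbolAddress followed by an ordinary cudaMemcpy. The pointer returned by cudaGetSymbolAddress is a device address, so dereferencing it in host code is exactly the kind of invalid access described above.

```cuda
#include <cassert>          // only for the checks that exercise this sketch
#include <cuda_runtime.h>

#define N 64                // hypothetical array size

__device__ int g_array[N];  // device global memory; host code cannot read it directly

__global__ void square_kernel()
{
    int idx = threadIdx.x;
    if (idx < N)
        g_array[idx] = idx * idx;
}

// Checks the values square_kernel is expected to have written.
static bool matches(const int *host)
{
    for (int i = 0; i < N; ++i)
        if (host[i] != i * i)
            return false;
    return true;
}

// Route 1: copy straight from the symbol. The copy waits for the preceding
// kernel because both are issued to the default stream.
bool read_via_memcpy_from_symbol()
{
    int host[N];
    square_kernel<<<1, N>>>();
    if (cudaMemcpyFromSymbol(host, g_array, sizeof(host)) != cudaSuccess)
        return false;
    return matches(host);
}

// Route 2: look up the symbol's device address, then use an ordinary cudaMemcpy.
// dev_ptr must never be dereferenced on the host.
bool read_via_symbol_address()
{
    int host[N];
    int *dev_ptr = nullptr;
    square_kernel<<<1, N>>>();
    if (cudaGetSymbolAddress((void **)&dev_ptr, g_array) != cudaSuccess)
        return false;
    if (cudaMemcpy(host, dev_ptr, sizeof(host), cudaMemcpyDeviceToHost) != cudaSuccess)
        return false;
    return matches(host);
}
```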

Thank you for the reply.

Accessing the __device__ g_array from the host will not work because g_array is not in the same memory space. That is clear.

The __device__ g_array is only used by device functions to store temporary data in global memory. __shared__ cannot be used because of the data size (around 250 KB per thread).
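For per-thread temporary data, one common pattern (sketched here under assumed sizes: THREADS, PER_THREAD, scratch_kernel, scratch_bytes and run_scratch are hypothetical, and the demo uses 4 KB per thread rather than 250 KB) is a single global-memory buffer cut into one private slice per thread, indexed from the thread ID. Total memory grows linearly with the thread count, which scratch_bytes makes explicit.

```cuda
#include <cassert>          // only for the checks that exercise this sketch
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical sizes, kept small for the demo.
#define THREADS    128
#define PER_THREAD 1024     // ints of scratch per thread (4 KB each here)

// One buffer in device global memory, partitioned into a slice per thread.
__device__ int g_scratch[THREADS * PER_THREAD];

__global__ void scratch_kernel()
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= THREADS)
        return;
    int *mine = g_scratch + tid * PER_THREAD;   // start of this thread's slice
    for (int i = 0; i < PER_THREAD; ++i)
        mine[i] = tid;                          // no other thread touches these elements
}

// Total global memory needed when every thread gets its own scratch area.
size_t scratch_bytes(size_t threads, size_t bytes_per_thread)
{
    return threads * bytes_per_thread;
}

// Runs the kernel, then reads element 0 of the last thread's slice back to the host.
bool run_scratch()
{
    scratch_kernel<<<THREADS / 64, 64>>>();
    int last = -1;
    size_t offset = (size_t)(THREADS - 1) * PER_THREAD * sizeof(int);   // in bytes
    if (cudaMemcpyFromSymbol(&last, g_scratch, sizeof(int), offset) != cudaSuccess)
        return false;
    return last == THREADS - 1;
}
```

With the figure from the post, scratch_bytes(128, 250 * 1024) gives 32,768,000 bytes (about 31 MiB) for only 128 threads. The slice-per-thread layout is the easiest to reason about; an interleaved layout such as g_scratch[i * THREADS + tid] lets neighboring threads touch consecutive addresses, which usually coalesces better.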

Because my code works well in emulation mode but only partially on the hardware, I was not sure I understood the scope of global __device__ variables correctly.

One more note: when I access g_array using a fixed number instead of a variable (like g_array[0] = 0;), things work well. The index variable is declared __device__ as well.
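One detail worth checking, assuming the index really is a file-scope __device__ variable that each thread assigns (that code isn't shown): a __device__ variable is a single location in global memory shared by every thread in the grid, so concurrent threads overwrite each other's index. Emulation mode ran threads one after another, which would hide such a race, and with a constant index like g_array[0] = 0; no thread depends on a shared index value. The sketch below (g_idx, racy_kernel, safe_kernel and run_safe are hypothetical names) contrasts the two patterns; only the safe one is exercised.

```cuda
#include <cassert>          // only for the checks that exercise this sketch
#include <cuda_runtime.h>

#define N 256               // hypothetical array size

__device__ int g_array[N];
__device__ int g_idx;       // ONE variable in global memory, shared by all threads

// Racy: every thread writes and then reads the same g_idx, so a thread may
// end up indexing with a value another thread stored in the meantime.
__global__ void racy_kernel()
{
    g_idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (g_idx < N)
        g_array[g_idx] = g_idx;
}

// Safe: idx is an automatic variable, private to each thread.
__global__ void safe_kernel()
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        g_array[idx] = idx;
}

// Launches safe_kernel and verifies every element on the host.
bool run_safe()
{
    safe_kernel<<<N / 64, 64>>>();
    int host[N];
    if (cudaMemcpyFromSymbol(host, g_array, sizeof(host)) != cudaSuccess)
        return false;
    for (int i = 0; i < N; ++i)
        if (host[i] != i)
            return false;
    return true;
}
```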

Thank you,


OK, it seems like you are doing everything correctly, then. Perhaps the best thing you can do at this point is to create a minimal test-case file that reproduces your problem (preferably one that can be compiled directly with nvcc -o exec) and post it here. There has to be some little detail you missed somewhere.
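A skeleton for such a test case might look like the following sketch. The names (CUDA_CHECK, repro_kernel, repro) are hypothetical; the point is to check the launch, the kernel's execution and the copy back, since an unchecked error on the hardware often shows up only as strange results.

```cuda
#include <cassert>          // only for the checks that exercise this sketch
#include <cstdio>
#include <cuda_runtime.h>

// Prints the failing call with file and line, then returns false from the
// enclosing function (so it is only usable inside functions returning bool).
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,  \
                         __LINE__, #call, cudaGetErrorString(err_));  \
            return false;                                             \
        }                                                             \
    } while (0)

#define N 1024
__device__ int g_array[N];  // stand-in for the real array

__global__ void repro_kernel()
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        g_array[idx] = idx + 1;   // replace with the access pattern that misbehaves
}

// Hypothetical repro driver: returns true if the results match expectations.
bool repro()
{
    repro_kernel<<<(N + 255) / 256, 256>>>();
    CUDA_CHECK(cudaGetLastError());        // invalid launch configuration, etc.
    CUDA_CHECK(cudaDeviceSynchronize());   // faults raised while the kernel ran
    int host[N];
    CUDA_CHECK(cudaMemcpyFromSymbol(host, g_array, sizeof(host)));
    for (int i = 0; i < N; ++i) {
        if (host[i] != i + 1) {
            std::printf("mismatch at %d: got %d\n", i, host[i]);
            return false;
        }
    }
    return true;
}
```

Adding int main() { return repro() ? 0 : 1; } makes it a complete program that builds with, for example, nvcc -o exec repro.cu (the file name is only illustrative).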