Persistent global memory?

I'm calling two different kernels that share a global device array. Everything is built into a .so library:

[codebox]void *device; // pointer to the shared global device array

__global__ void kernel1() {
    // write data to device array
}

__global__ void kernel2() {
    // read data from device array
}
[/codebox]

I thought global memory was persistent. However, even though the device pointer still points to the same device memory, that memory seems to have been overwritten by the time kernel2 is launched. Am I doing something wrong, or am I wrong that global device memory persists until it is released with cudaFree()?
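For reference, here is a minimal sketch of the expected behaviour when everything runs in one host thread: an allocation made with cudaMalloc() keeps its contents across kernel launches until cudaFree() is called. The array type (int), size N, launch configuration and kernel bodies are hypothetical and chosen only for illustration. The kernels here take the pointer as an argument rather than reading a file-scope `void *device`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256; // hypothetical array length

// kernel1 writes into the global device array.
__global__ void kernel1(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2 * i;
}

// kernel2 reads what kernel1 left behind and stores a derived value.
__global__ void kernel2(const int *data, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = data[i] + 1;
}

// Returns 0 if kernel2 saw the values written by kernel1.
int run_demo() {
    int *device = nullptr, *out = nullptr;
    if (cudaMalloc(&device, N * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&out, N * sizeof(int)) != cudaSuccess) { cudaFree(device); return 1; }

    const int threads = 128, blocks = (N + threads - 1) / threads;
    kernel1<<<blocks, threads>>>(device, N);      // write
    kernel2<<<blocks, threads>>>(device, out, N); // read; same stream, so it runs after kernel1

    int host[N];
    cudaError_t err = cudaMemcpy(host, out, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(out);
    cudaFree(device); // the allocation stays valid until this call
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < N; ++i)
        if (host[i] != 2 * i + 1) return 1;
    return 0;
}

int main() {
    int rc = run_demo();
    std::printf("%s\n", rc == 0 ? "kernel2 saw kernel1's data" : "mismatch or CUDA error");
    return rc;
}
```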

Thanks for any help!

Actually, I've found the problem: I was accessing the device array from two different host threads, which the runtime does not allow:

[quote]Several host threads can execute device code on the same device, but by design, a host thread can execute device code on only one device. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread.[/quote]
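Under the rule quoted above, the fix is to do the allocation and both kernel launches from the same host thread. Later toolkits relaxed this: starting with CUDA 4.0, the runtime shares one context per device among all host threads of a process. The sketch below assumes such a runtime and splits the work across two std::thread workers that both bind to device 0. On a runtime with the older per-thread behaviour, thread B would most likely fail in the way described in the original post. N, the kernel bodies and the helper name run_two_threads are hypothetical.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

constexpr int N = 256; // hypothetical array length

__global__ void kernel1(int *data, int n) { // writes the shared array
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2 * i;
}

__global__ void kernel2(const int *data, int *out, int n) { // reads it back
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = data[i] + 1;
}

// Assumes CUDA >= 4.0: allocations made in one host thread are usable in another.
int run_two_threads() {
    int *device = nullptr, *out = nullptr;
    cudaError_t e1 = cudaErrorUnknown, e2 = cudaErrorUnknown;
    int host[N];

    std::thread a([&] { // thread A: allocate and write
        cudaSetDevice(0);
        e1 = cudaMalloc(&device, N * sizeof(int));
        if (e1 != cudaSuccess) return;
        kernel1<<<(N + 127) / 128, 128>>>(device, N);
        e1 = cudaDeviceSynchronize();
    });
    a.join();
    if (e1 != cudaSuccess) { cudaFree(device); return 1; }

    std::thread b([&] { // thread B: read the same allocation
        cudaSetDevice(0);
        e2 = cudaMalloc(&out, N * sizeof(int));
        if (e2 != cudaSuccess) return;
        kernel2<<<(N + 127) / 128, 128>>>(device, out, N);
        e2 = cudaMemcpy(host, out, N * sizeof(int), cudaMemcpyDeviceToHost);
    });
    b.join();

    cudaFree(out); // cudaFree(nullptr) is a no-op
    cudaFree(device);
    if (e2 != cudaSuccess) return 1;
    for (int i = 0; i < N; ++i)
        if (host[i] != 2 * i + 1) return 1;
    return 0;
}

int main() {
    int rc = run_two_threads();
    std::printf("%s\n", rc == 0 ? "both threads shared the allocation" : "mismatch or CUDA error");
    return rc;
}
```

On an older runtime, keeping all CUDA calls in a single dedicated host thread, as in the original library design, avoids the problem without relying on shared contexts.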