Simple Question about kernels and global memory

Is it possible for a second kernel to use the global memory allocated before the first kernel was launched?

For example:

Host Program

allocate device memory and copy data to it

call first kernel

call second kernel [and use the memory allocated before the first kernel]

call third kernel [and use the memory allocated before the first kernel]
Device memory is allocated from the host with cudaMalloc, and it can be accessed by any kernel until you free it with cudaFree (again, from the host).

(I am not sure what you mean with “allocated by a kernel”, though).

Thanks for your reply.
I am new to this terminology. What I meant by "allocated by the first kernel" is memory allocated before the first kernel is invoked.

Suppose I have a few big data structures. I allocate device memory, copy my data structures to it, invoke my first kernel, and modify them. Then, without deallocating the device memory, I invoke my second kernel and operate on the data structures already present in device memory, and keep doing that with a series of kernels until I get my desired results. Is that feasible in CUDA, or do I have to copy my data structures back to the host and then back to the device after every kernel invocation?

I am repeating the same question (I think), but I just want to be sure (please bear with a noob).


You do NOT need to copy your data back and forth between device and host. The memory allocated by cudaMalloc persists until you free it explicitly. So you can copy data to device memory, work on it as much as you want (with as many kernels as you need), and copy it back to the host only when you actually need it there (for printing, saving, or whatever).
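A minimal sketch of this pattern (the kernel names, the in-place operations, and the buffer size are just placeholders for illustration):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Hypothetical kernels: each operates in place on the same device buffer.
__global__ void step1(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void step2(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 0.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);                                // allocate once
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice); // copy in once

    dim3 block(256), grid((n + block.x - 1) / block.x);
    step1<<<grid, block>>>(d_data, n); // first kernel modifies the buffer
    step2<<<grid, block>>>(d_data, n); // second kernel reuses the same buffer
    // ...as many kernels as needed, with no host round-trips in between

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost); // copy out once
    cudaFree(d_data);                                          // free at the end
    free(h_data);
    return 0;
}
```

Note that kernel launches on the same (default) stream are serialized, so step2 sees the results step1 wrote without any explicit synchronization between the launches.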

I hope I clarified your doubts :)

Thanks a lot!

You have cleared all my doubts!