Is it possible for the second kernel to use the global memory allocated by the first kernel?
For example:
Host Program
allocate memory and copy it to device
call kernel
host
call kernel [and use the memory allocated by the first kernel]
host
call kernel [and use the memory allocated by the first kernel]
......
Device memory is allocated from the host with cudaMalloc, and it can be accessed by any kernel until you free it with cudaFree (again, from the host).
(I am not sure what you mean by “allocated by a kernel”, though.)
Thanks for your reply.
I am new to this terminology. What I meant by “allocated by the first kernel” is memory allocated before the first kernel is invoked.
Say I have a few big data structures. I allocate device memory, copy my data structures to the device, invoke my first kernel, and operate on the data structures to modify them. Then, without deallocating the device memory, I invoke my second kernel and again operate on the data structures already present in device memory, and I keep doing that with a series of kernels until I achieve my desired results. Is that feasible in CUDA, or do I have to copy my data structures to the host and back to the device after every kernel invocation?
I am repeating the same question (I think), but I just want to be sure (please bear with a noob).
You do NOT need to copy your data back and forth between device and host. The memory allocated by cudaMalloc persists until you free it explicitly. So you can copy data to device memory, work on it as much as you want (with as many kernels as you need), and copy it back to the host only when you actually need it there (for printing, saving, or whatever).
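Here is a minimal sketch of that pattern. The kernel names (`scale_kernel`, `add_kernel`) and the array size are made up for illustration; the point is that one cudaMalloc'd buffer is reused by several kernel launches with no host round-trip in between:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration: each modifies the same
// device buffer in place.
__global__ void scale_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

__global__ void add_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Allocate device memory ONCE and copy the data over ONCE.
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // A series of kernels operating on the same device buffer.
    // Launches on the default stream run in order, so each kernel
    // sees the previous kernel's results.
    int threads = 256, blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d, n);  // 1.0 -> 2.0
    add_kernel<<<blocks, threads>>>(d, n);    // 2.0 -> 3.0
    scale_kernel<<<blocks, threads>>>(d, n);  // 3.0 -> 6.0

    // Copy back only when the host actually needs the result.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);  // expected 6.0

    cudaFree(d);  // free only after the last kernel that uses it
    free(h);
    return 0;
}
```

Note that the intermediate results never touch the host: the only cudaMemcpy calls are the initial upload and the final download.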