Is it possible for the second kernel to use the global memory allocated by the first kernel?
For example:
Host Program
allocate memory and copy it to device
call kernel
host
call kernel [and use the memory allocated by the first kernel]
host
call kernel [and use the memory allocated by the first kernel]
......
Device memory is allocated from the host with cudaMalloc, and it can be accessed by any kernel until you free it with cudaFree (again, from the host).
(I am not sure what you mean by “allocated by a kernel”, though.)
Thanks for your reply.
I am new to this terminology. What I meant by “allocated by the first kernel” is memory allocated before the first kernel is invoked.
Say I have a few big data structures. I allocate device memory, copy my data structures to the device, invoke my first kernel, and operate on the data structures to modify them. Then, without deallocating the device memory, I invoke my second kernel and again operate on the data structures already present in device memory, and I keep doing that with a series of kernels until I achieve my desired results. Is that feasible in CUDA, or do I have to copy my data structures to the host and back to the device after every kernel invocation?
I am repeating the same question (I think), but I just want to be sure (please bear with a noob).
You do NOT need to copy your data back and forth between device and host. The memory allocated by cudaMalloc persists until you free it explicitly. So you can copy data to device memory, work on it as much as you want (with as many kernels as you need), and copy it back to the host only when you actually need it there (for printing, saving, or whatever).
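Here is a minimal sketch of that pattern. The kernel names (`scale_kernel`, `add_kernel`) and the array size are made up for illustration; the point is that one cudaMalloc'd buffer is reused by several kernel launches with no host round-trip in between:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration: each modifies the same
// device buffer in place.
__global__ void scale_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

__global__ void add_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Allocate device memory ONCE and copy the data over ONCE.
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // A series of kernels operating on the same device buffer.
    // Launches on the default stream run in order, so each kernel
    // sees the previous kernel's results.
    int threads = 256, blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d, n);  // 1.0 -> 2.0
    add_kernel<<<blocks, threads>>>(d, n);    // 2.0 -> 3.0
    scale_kernel<<<blocks, threads>>>(d, n);  // 3.0 -> 6.0

    // Copy back only when the host actually needs the result.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);  // expected 6.0

    cudaFree(d);  // free only after the last kernel that uses it
    free(h);
    return 0;
}
```

Note that the intermediate results never touch the host: the only cudaMemcpy calls are the initial upload and the final download.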