Memory usage within GPU

Hi,

This might be a very trivial question, but I just want to know whether we can use a result obtained from one particular kernel in another kernel without transferring it back to the CPU and then to the GPU again. Something like assigning global variables, but within the GPU, so that all kernels can use them.


Sure. Memory allocated with cudaMalloc lives in global memory and persists across kernel invocations. It's not going to suddenly disappear out from under you or anything like that; it's perfectly legal to do:

cudaMalloc((void**)&a_device, sizeof(type)*a_length);
cudaMemcpy(a_device, a_host, sizeof(type)*a_length, cudaMemcpyHostToDevice);

kernelA<<<gridDim, blockDim>>>(a_device);
kernelB<<<gridDim, blockDim>>>(a_device);
kernelC<<<gridDim, blockDim>>>(a_device);

cudaMemcpy(a_host, a_device, sizeof(type)*a_length, cudaMemcpyDeviceToHost);

(You don't need any synchronization between the kernels, since they're all launched in the same stream and therefore execute in order.)
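Since the question mentions "global variables within the GPU": CUDA also supports file-scope `__device__` variables, which live in global memory and persist across kernel launches just like cudaMalloc'd buffers. Here is a minimal sketch (the variable and kernel names are made up for illustration):

```cuda
#include <cstdio>

// File-scope __device__ variable: resides in GPU global memory,
// persists across kernel launches, visible to all kernels.
__device__ int counter;

__global__ void kernelA() {
    if (threadIdx.x == 0) counter = 10;  // first kernel writes it
}

__global__ void kernelB() {
    if (threadIdx.x == 0) counter += 5;  // second kernel reads and updates it
}

int main() {
    kernelA<<<1, 32>>>();
    kernelB<<<1, 32>>>();  // same default stream, so no explicit sync needed

    int host_counter = 0;
    // cudaMemcpyFromSymbol copies a __device__ variable back to the host
    cudaMemcpyFromSymbol(&host_counter, counter, sizeof(int));
    printf("counter = %d\n", host_counter);  // prints 15
    return 0;
}
```

For anything larger than a few scalars, though, the cudaMalloc approach above is the usual pattern, since you can size the allocation at runtime.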

Great, thanks!