Keep previously allocated memory on GPU

Hi all,

I’m currently working on an application that calls the same CUDA function N times. Basically, in this function I need to allocate several memory buffers (using cudaMalloc) to process my data. So I was wondering if there is a way to allocate and free this global memory only once instead of doing it N times (I expect a performance increase that way).
I have tried to store the addresses of all the global memory buffers in a pointer, but without success.
Does someone know how to do it?

Thank you. :rolleyes:


Something similar to this won’t work?

cudaMalloc((void**)&d_big_ptr, N*size_of_arrays);

for(int i = 0; i < N; i++)
	my_kernel<<<blocks, threads>>>(d_big_ptr + i*array_length);

cudaFree(d_big_ptr);

Or am I missing something here?

Hi Jimmy,

Thank you for your reply. I’ll try to explain my problem better. In fact I’m working on a DLL that calls a CUDA function (not the kernel itself), and I have to call this function N times (I’m working with video, so I need to call it for each image).

In this function I allocate memory using cudaMalloc, call my kernels, and free the memory.

I want to do the memory allocation and free on the GPU only once. Basically, the allocation would be done with the first image and the free with the last. For the other images I would just write to the previously allocated memory on the GPU.

My idea was to store the addresses of the pointers in another pointer, something like this:

[codebox]void RenderingCudaHost(void** &CUDAPtr, BYTE* Source_host, int sizeData)
{
	BYTE *Source;

	// First call: allocate the pointer table and the device buffer
	CUDAPtr = (void**) malloc(10*sizeof(void*));
	cudaMalloc((void**) &Source, sizeData);
	CUDAPtr[0] = (void*) Source;

	// Subsequent calls: recover the previously allocated device pointer
	Source = (BYTE*) CUDAPtr[0];

	cudaMemcpy(Source, Source_host, sizeData, cudaMemcpyHostToDevice);
}[/codebox]

Thanks a lot


I think I understand what you are wanting to do now… You want to allocate the needed memory once, the first time you call “RenderingCudaHost”, and then reuse it the next time you call RenderingCudaHost.

Unfortunately you can’t allocate memory inside of “RenderingCudaHost” and expect it to be there later (at least this goes for normal C code): the pointer argument is passed by value, so assigning to it inside the function never updates the caller’s pointer.

Ex (this DOES NOT work):

void allocateAndFill(float* ptr, int N)
{
	int i;

	ptr = (float*)malloc(N*sizeof(float)); // only modifies the local copy of ptr

	for(i = 0; i < N; i++)
		ptr[i] = 1.0f;
}

int main(int argc, char* argv[])
{
	float* ptr;
	int N = 5;
	int i;

	allocateAndFill(ptr, N); // ptr in main() is left uninitialized

	for(i = 0; i < N; i++)
		printf("\n %0.3f", ptr[i]); // print garbage

	return 0;
}

=> Garbage values
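(For reference, the usual plain-C fix is to pass the *address* of the pointer so the callee can update it. A minimal sketch, not the DLL code:)

```c
#include <stdlib.h>

/* The fix: take a float** so assigning through it updates the caller's pointer. */
void allocateAndFill(float** ptr, int N)
{
    int i;

    *ptr = (float*)malloc(N * sizeof(float)); /* caller's pointer now set */

    for (i = 0; i < N; i++)
        (*ptr)[i] = 1.0f;
}
```

Called as allocateAndFill(&ptr, N); after that, ptr in the caller points at N floats set to 1.0f (and the caller is responsible for free(ptr)).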

I’m afraid you have to allocate and free the memory higher up in your call stack.

So rather:

cudaMalloc((void**) &Source, sizeData);

RenderingCudaHost(Source, Source_host, sizeData); // with some changes…



Yes that’s exactly my problem.

RenderingCudaHost is the function inside my DLL, and I call it for each image. If I allocate memory higher up in my call stack, I’ll have to add a new function to my DLL that gets called before RenderingCudaHost. But I guess it won’t solve my problem. I’m afraid I’m a little bit stuck now. :confused: Any ideas?

Thank you Jimmy

I think any solution to this problem requires one of two things: either your API requires the user to allocate some opaque storage and pass it to your DLL on subsequent calls (lots of APIs do this), or you do something terrible and make your code non-reentrant by storing that local data in a global variable. Clearly the first option is preferable. :)
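To make the first option concrete, here is a minimal sketch of such an opaque-context API in plain C. All names are made up, and malloc/free stand in where cudaMalloc/cudaFree/cudaMemcpy would go, so the pattern is visible without the CUDA toolkit:

```c
#include <stdlib.h>
#include <string.h>

/* Opaque context: the caller only ever holds a void* handle. */
typedef struct {
    unsigned char* device_buf; /* would be a cudaMalloc'd device pointer */
    int size;
} RenderContext;

/* Called once, before the first image: allocate everything. */
void* RenderingCudaInit(int sizeData)
{
    RenderContext* ctx = (RenderContext*)malloc(sizeof(RenderContext));
    ctx->device_buf = (unsigned char*)malloc(sizeData); /* cudaMalloc here */
    ctx->size = sizeData;
    return ctx;
}

/* Called once per image: reuses the buffer allocated in Init. */
void RenderingCudaHost(void* handle, const unsigned char* src, int sizeData)
{
    RenderContext* ctx = (RenderContext*)handle;
    memcpy(ctx->device_buf, src, sizeData); /* cudaMemcpy H2D here */
    /* ...launch kernels on ctx->device_buf... */
}

/* Called once, after the last image: release everything. */
void RenderingCudaRelease(void* handle)
{
    RenderContext* ctx = (RenderContext*)handle;
    free(ctx->device_buf); /* cudaFree here */
    free(ctx);
}
```

The DLL exports Init/Host/Release; the caller gets the handle from Init, passes it to every per-image call, and hands it back to Release at the end. The allocation cost is paid once, and the code stays reentrant because nothing lives in a global.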