Dynamic memory allocation


Is it possible to allocate memory space dynamically inside kernels (__global__ functions)?


To be more precise, initially I have the following sample:

__shared__ float Xs[NBR_LINE][WWA*BL]; // Store X  values (WWA*BL)
__shared__ float Cs[NBR_LINE][WWA*BL]; // Store Cs values (WWA*BL)
__shared__ int   Jms[NBR_LINE][WWA];   // Store JM values (WWA)


with a thread block of dimension WWA.

But now I want to use this kernel with variable thread block dimensions. I suppose that setting the dimensions will not be a problem, but I don't know how to allocate Xs, Cs, and Jms with a dynamic size in this sample.


No. You can only allocate memory before the kernel runs.

Registers will be allocated by the compiler.

For shared or constant memory you put static array sizes into the source and the compiler generates code to allocate them at launch. Shared memory can also be allocated dynamically, by declaring it extern __shared__ and passing the size as part of the launch configuration.
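A minimal sketch of the extern __shared__ method (the kernel name and the staging logic here are illustrative, not from the original code): the array has no compile-time size, and the byte count is supplied as the third argument of the launch configuration.

```cuda
// Dynamically sized shared buffer: size is fixed at launch, not at compile time.
extern __shared__ float buf[];

__global__ void scale(float *out, const float *in, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = in[i];          // stage input in shared memory
    __syncthreads();
    out[i] = buf[threadIdx.x] * factor;
}

// Host side: the third <<< >>> argument is the dynamic shared-memory size in bytes.
// scale<<<gridDim, blockDim, blockDim.x * sizeof(float)>>>(d_out, d_in, 2.0f);
```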

For global linear/array memory you need to use cudaMalloc on the host.
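For completeness, a sketch of the host-side pattern (the size here is an example value):

```cuda
// Global memory is allocated from the host with cudaMalloc before the
// kernel launch, and freed with cudaFree afterwards.
float *d_x = nullptr;
size_t n = 1024;                          // element count (example value)
cudaMalloc((void **)&d_x, n * sizeof(float));
// ... launch kernels that read/write d_x ...
cudaFree(d_x);
```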


Thank you Peter for your answer.

In my code, the shared memory size is not known at the beginning of program execution; it depends on the input data, so I can't define a static size directly. One possible solution would be to generate the CUDA code dynamically and compile it, but I don't know whether I would then need to link it with my program or not.

Best Regards

If the input data is also available on the CPU, you can inspect it and provide the shared memory size dynamically when calling the kernel (the extern __shared__ method).
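Applied to the arrays from the original snippet, that could look like the following hypothetical sketch: a single extern __shared__ buffer is carved into Xs, Cs, and Jms, with wwa passed as a kernel argument instead of being a compile-time constant (NBR_LINE and BL are assumed to remain compile-time constants, as in the original code).

```cuda
// One dynamically sized shared buffer, partitioned manually.
extern __shared__ char smem[];

__global__ void myKernel(int wwa /* , ... */)
{
    float *Xs  = (float *)smem;                      // NBR_LINE * wwa * BL floats
    float *Cs  = Xs + NBR_LINE * wwa * BL;           // NBR_LINE * wwa * BL floats
    int   *Jms = (int *)(Cs + NBR_LINE * wwa * BL);  // NBR_LINE * wwa ints
    // Index as Xs[line * wwa * BL + i] instead of Xs[line][i].
}

// Host side: compute the total byte count and pass it at launch.
// size_t bytes = 2 * NBR_LINE * wwa * BL * sizeof(float)
//              + NBR_LINE * wwa * sizeof(int);
// myKernel<<<grid, wwa, bytes>>>(wwa /* , ... */);
```

Note that the 2-D indexing of the static version has to become explicit 1-D index arithmetic, since the row width is no longer known at compile time.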

If you can bound the amount of shared memory needed, you could statically allocate the maximum.

If there is only a small set of possible memory sizes needed, turn the kernel into a template and fully specialize all needed instances in the routine that calls the kernel. nvcc will then automatically create all kernel variants for you. They will all be embedded, so you don't need to worry about linking separate files.
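The template approach could be sketched like this (the dispatch function and the supported sizes 16/32/64 are illustrative assumptions): the block width becomes a template parameter, so each instance keeps statically sized shared arrays.

```cuda
// Block width as a compile-time template parameter.
template <int WWA>
__global__ void myKernel(/* ... */)
{
    __shared__ float Xs[NBR_LINE][WWA * BL];
    __shared__ float Cs[NBR_LINE][WWA * BL];
    __shared__ int   Jms[NBR_LINE][WWA];
    // ... kernel body as before ...
}

// Host-side dispatch over the small set of supported sizes; nvcc emits
// one fully compiled kernel variant per case used here.
void launchKernel(int wwa, dim3 grid /* , ... */)
{
    switch (wwa) {
        case 16: myKernel<16><<<grid, 16>>>(/* ... */); break;
        case 32: myKernel<32><<<grid, 32>>>(/* ... */); break;
        case 64: myKernel<64><<<grid, 64>>>(/* ... */); break;
        default: /* unsupported block width */           break;
    }
}
```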