How to allocate memory dynamically in a CUDA kernel

Hi everyone,
I ran the separable convolution code from the SDK examples, and I found that the kernel radius is fixed regardless of the smoothing level (the sigma) of the Gaussian kernel. I want the kernel radius to depend on the sigma of the Gaussian, for example kernelradius = 4 * sigma. Then kernel_w and the kernel radius become variables rather than constants. As we all know, in C the size of an array must be a constant. For example, int a[5]; is fine, but kernelradius = 4 * sigma; int a[kernelradius]; is not.
In C, I can use the malloc function to allocate memory dynamically. But which function can I use to allocate memory dynamically in a CUDA kernel?

The following declarations are from the separable convolution code. If kernel_w and kernel_radius are variables, these declarations are no longer valid. How can I change them? Which function can I use to allocate memory dynamically in a CUDA kernel?
__device__ __constant__ float d_Kernel[KERNEL_W];
__shared__ float data[KERNEL_RADIUS + ROW_TILE_W + KERNEL_RADIUS];

Thank you and best regards.

The simple answer: you can't :). No dynamic allocation is possible inside a kernel. What you can do is allocate the memory dynamically before the kernel launch (at that point you should have all the info you need to do that). For constant memory it should just work; for shared memory you need to change the declaration in the kernel to extern __shared__ and pass the size in bytes at launch. Look for an example in the examples … :D
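
To make the suggestion above concrete, here is a minimal sketch of a one-row Gaussian smoothing with a radius chosen at run time. It is not the SDK code: the names MAX_KERNEL_RADIUS, convolveRow, and gaussianSmoothRow, the tile width of 128, and the zero padding at the borders are all assumptions made for illustration. Constant memory still needs a compile-time size, so the sketch declares d_Kernel at an assumed maximum and copies only the coefficients actually used; the shared array is declared extern __shared__ and its size is given as the third launch parameter. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Assumed upper bound on the radius: constant memory is sized at compile time,
// so we reserve room for the largest kernel we expect to need.
#define MAX_KERNEL_RADIUS 64
#define MAX_KERNEL_W (2 * MAX_KERNEL_RADIUS + 1)

__device__ __constant__ float d_Kernel[MAX_KERNEL_W];

// Hypothetical row kernel: each block handles one tile of blockDim.x pixels.
__global__ void convolveRow(const float* in, float* out, int width, int radius)
{
    // Size is not known at compile time; it comes from the launch configuration.
    extern __shared__ float data[];

    int tileStart = blockIdx.x * blockDim.x;
    int tileLen   = blockDim.x + 2 * radius;   // tile plus left and right apron

    // Cooperative load of the tile and its apron, zero-padded outside the row.
    for (int i = threadIdx.x; i < tileLen; i += blockDim.x) {
        int src = tileStart - radius + i;
        data[i] = (src >= 0 && src < width) ? in[src] : 0.0f;
    }
    __syncthreads();

    int x = tileStart + threadIdx.x;
    if (x < width) {
        float sum = 0.0f;
        // The Gaussian is symmetric, so the indexing direction does not matter here.
        for (int k = -radius; k <= radius; ++k)
            sum += data[threadIdx.x + radius + k] * d_Kernel[radius + k];
        out[x] = sum;
    }
}

// Hypothetical host helper: picks radius = 4 * sigma at run time (sigma > 0 assumed).
std::vector<float> gaussianSmoothRow(const std::vector<float>& input, float sigma)
{
    if (input.empty()) return std::vector<float>();

    int radius = (int)ceilf(4.0f * sigma);
    if (radius > MAX_KERNEL_RADIUS) radius = MAX_KERNEL_RADIUS;  // clamp to reserved size
    int kernelW = 2 * radius + 1;

    // Build normalized Gaussian coefficients on the host.
    std::vector<float> h_Kernel(kernelW);
    float norm = 0.0f;
    for (int i = 0; i < kernelW; ++i) {
        float d = (float)(i - radius);
        h_Kernel[i] = expf(-d * d / (2.0f * sigma * sigma));
        norm += h_Kernel[i];
    }
    for (int i = 0; i < kernelW; ++i) h_Kernel[i] /= norm;

    // Copy only the coefficients in use into constant memory.
    cudaMemcpyToSymbol(d_Kernel, h_Kernel.data(), kernelW * sizeof(float));

    int width = (int)input.size();
    size_t bytes = width * sizeof(float);
    float *d_in = 0, *d_out = 0;
    cudaMalloc((void**)&d_in, bytes);
    cudaMalloc((void**)&d_out, bytes);
    cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice);

    const int tileW = 128;                                   // assumed tile width
    int blocks = (width + tileW - 1) / tileW;
    size_t sharedBytes = (tileW + 2 * radius) * sizeof(float);

    // Third launch parameter: bytes of dynamic shared memory per block.
    convolveRow<<<blocks, tileW, sharedBytes>>>(d_in, d_out, width, radius);

    std::vector<float> output(width);
    cudaMemcpy(output.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return output;
}
```

Calling gaussianSmoothRow(row, sigma) from host code with different sigma values reuses the same compiled kernel; only the coefficients copied to d_Kernel and the shared-memory size passed at launch change. If sigma can get large, the assumed MAX_KERNEL_RADIUS has to grow with it, within the limits of constant and shared memory on your device.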