Allocating with "new" keyword in CUDA kernel

Hi all,

I have heard contradictory statements on the use of the new keyword in a CUDA kernel.

What is the final answer? I am using CUDA 2.1. Would this be allowed or not:

__global__ void SomeKernel(int N) {
    int *someArray = new int[N];
    // ...
}

I certainly don’t get any compiler errors.

But I have heard too often that dynamic allocation with the new keyword is not allowed, so I'm a bit confused.

Thanks for your advice!

It’s not allowed. You get an external call error upon compiling that with CUDA 2.1.
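The usual pattern instead is to allocate the buffer with cudaMalloc on the host before the launch and pass the pointer into the kernel. A minimal sketch, assuming a simple 1D launch (the kernel name and the work it does are just illustrative):

```cuda
#include <cuda_runtime.h>

// The kernel receives a pre-allocated device buffer instead of calling new.
__global__ void SomeKernel(int *someArray, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        someArray[idx] = idx;   // placeholder work on the buffer
}

int main()
{
    const int N = 1024;
    int *d_array = 0;

    // Allocate device memory from the host side.
    cudaMalloc((void **)&d_array, N * sizeof(int));

    SomeKernel<<<(N + 255) / 256, 256>>>(d_array, N);
    cudaThreadSynchronize();

    cudaFree(d_array);
    return 0;
}
```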

Just to clarify: the error you get (on most platforms) is not during compilation, but linking. The usual message is an undefined reference to something like __a_somethadsfhasdfa_gooblygook_aerasbZlongname, which at first glance has nothing to do with the new keyword.

Edit: I misunderstood the question… I thought the new keyword was being used in a host function in a file compiled by nvcc. Just ignore me.

Thanks for the prompt reply!

So how would I create an array of size N, when N is passed into the kernel?

int someArray[N] obviously gives the error that the size has to be constant.

Unfortunately I need to do a lot of array resizing/reallocation with every kernel invocation. Is there basically no graceful way of doing array resizing in CUDA?

For example, I am copying an array ArPoints to the device, and after some computation the kernel needs to add numerous new points to ArPoints. The only way I can do this now is to append the new points to the host version of ArPoints, then reallocate and recopy ArPoints to the device. When the size of ArPoints is close to a million, I can't afford to copy the entire array after every single kernel invocation just to add a few elements to the end.

Is this the only way to do it, or are there more graceful methods?

Thank you so much!

No, there is not really a good way to allocate a dynamic-length array for each thread in CUDA. If your kernel is operating on the whole array, though, you can keep everything on the device: when the buffer runs out of room, allocate a larger chunk and do a device-to-device copy of the old array into it, then have the kernel write the new points to the end. That way you append the new points on the device rather than the host, and never copy the whole array back through host memory.
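One way to sketch that (names are made up for illustration; it assumes you over-allocate capacity so the buffer rarely needs to grow, and atomicAdd on global int requires compute capability 1.1 or higher):

```cuda
#include <cuda_runtime.h>

// Appends new points past the current end of a pre-sized device buffer.
// d_count holds the current number of valid elements; the buffer's
// capacity must cover the worst-case growth of this launch.
__global__ void AppendPoints(float *d_points, int *d_count,
                             const float *d_newPoints, int numNew)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numNew) {
        int slot = atomicAdd(d_count, 1);  // claim a unique slot at the end
        d_points[slot] = d_newPoints[idx];
    }
}

// Host side: when capacity is exhausted, grow the buffer with a
// device-to-device copy, never going through host memory.
void GrowDeviceArray(float **d_points, int count, int newCapacity)
{
    float *d_bigger = 0;
    cudaMalloc((void **)&d_bigger, newCapacity * sizeof(float));
    cudaMemcpy(d_bigger, *d_points, count * sizeof(float),
               cudaMemcpyDeviceToDevice);
    cudaFree(*d_points);
    *d_points = d_bigger;
}
```

The atomicAdd gives each thread a unique slot, so the append is safe even when many threads add points in the same launch. You only pay the device-to-device copy on the occasional resize, not on every invocation.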