Reallocating memory on CUDA?

As far as I know, CUDA provides no equivalent of realloc() for resizing an existing device allocation. What is the preferred way of performing dynamic memory allocation on the device?

Since I am running quite close to the memory capacity of my device, I'd ideally like to avoid copying an array's data from device to host, freeing the device memory, allocating a larger device buffer, and copying the data back again. However, I can't see any way around this if the "new" allocation won't fit alongside the "old" allocation on the device, which is what performing the copy into the larger array entirely on the device would require.
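For reference, the all-on-device approach described above might be sketched as follows. The helper name `growDeviceBuffer` is hypothetical; the key point is that the cudaMalloc() for the new buffer fails precisely when the two allocations cannot coexist, which is the situation near device capacity:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: grow a device buffer entirely on the device.
// Requires the old and new buffers to fit simultaneously, which is
// exactly what breaks down when running close to device capacity.
cudaError_t growDeviceBuffer(float** d_buf, size_t old_count, size_t new_count)
{
    float* d_new = nullptr;
    cudaError_t err = cudaMalloc(&d_new, new_count * sizeof(float));
    if (err != cudaSuccess) return err;  // both buffers cannot coexist

    // Device-to-device copy avoids the round trip through host memory.
    err = cudaMemcpy(d_new, *d_buf, old_count * sizeof(float),
                     cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) { cudaFree(d_new); return err; }

    cudaFree(*d_buf);
    *d_buf = d_new;
    return cudaSuccess;
}
```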

Any suggestions for ways to cope with this?

Why don't you just allocate all the available free memory with a single malloc() call, then manage the memory yourself? If things are not too complex, the equivalent of realloc() could potentially be made essentially free (no copying at all).
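A minimal sketch of that suggestion, assuming a simple bump-pointer scheme (the `DevicePool` names are illustrative, not a real API): query free memory with cudaMemGetInfo(), grab most of it in one cudaMalloc(), then hand out sub-ranges by advancing an offset.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Illustrative pool: one large device block, managed manually on the host.
struct DevicePool {
    char*  base   = nullptr;  // single large cudaMalloc'd block
    size_t size   = 0;        // total bytes in the pool
    size_t offset = 0;        // bump pointer: next free byte
};

cudaError_t poolInit(DevicePool& pool)
{
    size_t freeB = 0, totalB = 0;
    cudaError_t err = cudaMemGetInfo(&freeB, &totalB);
    if (err != cudaSuccess) return err;
    pool.size = freeB - freeB / 10;  // leave ~10% headroom for the runtime
    return cudaMalloc((void**)&pool.base, pool.size);
}

// "Allocate" by advancing the offset; 256-byte alignment matches
// the alignment cudaMalloc itself guarantees.
void* poolAlloc(DevicePool& pool, size_t bytes)
{
    size_t aligned = (pool.offset + 255) & ~size_t(255);
    if (aligned + bytes > pool.size) return nullptr;
    pool.offset = aligned + bytes;
    return pool.base + aligned;
}
```

With this layout, "reallocating" the most recently placed array is just moving the offset, so no data needs to be copied at all in the simple case.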

I suppose that is a reasonable suggestion. It would certainly work if I had just one dynamic array, or several of the same length. As soon as I have multiple arrays of differing lengths, though, things become more difficult to manage efficiently. I guess I'd end up keeping track of the start offset and length of each array as they change and passing those into the kernels.
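That bookkeeping might look something like the following, assuming the single-big-buffer scheme above (the `Span` struct is just one possible way to do it): each array is described by an offset/length pair that is passed to the kernel by value.

```cuda
#include <cstddef>

// One offset/length pair per variable-length array in the pooled buffer.
struct Span {
    size_t start;  // element offset of this array within the pool
    size_t len;    // current number of elements
};

// Kernels index into the shared pool through the span they are given.
__global__ void scaleKernel(float* pool, Span s, float factor)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < s.len)
        pool[s.start + i] *= factor;
}
```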

What will the "alloca" instruction in the PTX specification provide us with once support is added? The ability to allocate and deallocate memory inside kernels?
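For comparison, CUDA already supports per-thread heap allocation inside kernels via device-side malloc()/free() on devices of compute capability 2.0 and later, drawing from a heap whose size is set with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes). As I understand it, PTX alloca is instead stack (local memory) allocation, reclaimed automatically on function return. A sketch of the existing device-heap mechanism:

```cuda
// Device-side heap allocation inside a kernel (compute capability >= 2.0).
// The heap size defaults to a few MB; raise it from the host before launch
// with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) if needed.
__global__ void scratchKernel(int n)
{
    int* scratch = (int*)malloc(n * sizeof(int));  // per-thread heap alloc
    if (scratch == nullptr) return;                // heap may be exhausted

    for (int i = 0; i < n; ++i)
        scratch[i] = i;

    free(scratch);  // unlike alloca, must be freed explicitly
}
```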