Preallocating device memory for reuse.

How does one go about preallocating memory on the device so that the buffers can be reused and the slow allocation step happens only once?

For example, something like

[codebox]// global variable
__device__ char* devBuf;

int main() {
    int memsize = 256*256;
    cutilSafeCall( cudaMalloc((void**) &devBuf, memsize) );

    // wait for event
    // run the cuFFTkernel
    // wait for event
    // run the cuFFTkernel…

    return 0;
}[/codebox]

I’ve tried this, but CUFFT fails when it executes. Does anyone know how to preallocate memory on the GPU?


I guess you want to write a global-memory manager that mimics the behaviour of cudaMalloc?
It should work (I did something similar in another project). Just remember that any block your manager hands out should be aligned to… I don’t remember exactly how many bytes, but I guess 16. If that doesn’t work, try 512.
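
To make the idea concrete, here is a minimal sketch of such a manager, not the code from my project. It grabs one block with cudaMalloc and hands out aligned slices of it. The names DevicePool, pool_init, pool_alloc, pool_reset and pool_destroy are all made up for illustration. The 256-byte default is an assumption based on the CUDA programming guide, which says addresses returned by the allocation routines are aligned to at least 256 bytes; because only offsets are rounded, the sketch is only meaningful for alignments up to that value.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Round `offset` up to the next multiple of `alignment` (must be a power of two).
static size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator over a single cudaMalloc'd block.
struct DevicePool {
    char  *base;      // device address, never dereferenced on the host
    size_t capacity;  // total bytes in the block
    size_t used;      // bytes handed out so far (including padding)
};

// The one slow allocation happens here.
bool pool_init(DevicePool *pool, size_t capacity) {
    pool->base = NULL;
    pool->capacity = 0;
    pool->used = 0;
    if (cudaMalloc((void **)&pool->base, capacity) != cudaSuccess) return false;
    pool->capacity = capacity;
    return true;
}

// Returns an aligned device pointer inside the block, or NULL if it is full.
// Assumes alignment <= 256, since only the offset from base is aligned.
void *pool_alloc(DevicePool *pool, size_t bytes, size_t alignment = 256) {
    size_t start = align_up(pool->used, alignment);
    if (start > pool->capacity || bytes > pool->capacity - start) return NULL;
    pool->used = start + bytes;
    return pool->base + start;  // host-side pointer arithmetic only
}

// Makes the whole block available again without touching the driver.
void pool_reset(DevicePool *pool) { pool->used = 0; }

void pool_destroy(DevicePool *pool) {
    cudaFree(pool->base);
    pool->base = NULL;
    pool->capacity = 0;
    pool->used = 0;
}

int main() {
    DevicePool pool;
    if (!pool_init(&pool, 1 << 20)) { puts("cudaMalloc failed"); return 1; }

    char *a = (char *)pool_alloc(&pool, 100);
    char *b = (char *)pool_alloc(&pool, 100);
    if (a == NULL || b == NULL) { pool_destroy(&pool); return 1; }

    // With the 256-byte default, the second slice starts at offset 256.
    printf("offsets: %ld %ld\n", (long)(a - pool.base), (long)(b - pool.base));

    pool_destroy(&pool);
    return 0;
}
```

Calling pool_reset between events lets you reuse the same block without going back to cudaMalloc. There is no per-slice free here; that keeps the sketch short, and a real manager would probably want one.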

How did you go about doing that? CUFFT fails when I allocate the memory through the global variable. Is there another keyword besides __device__ that I should be using to declare these memory blocks? Or should they not be global? I’m just confused as to why it compiles and runs fine until the CUFFT execution fails. Any help would be appreciated.


I think the confusion here is that __device__ denotes a variable that physically lives on the device, not a pointer that points to memory on the device. You should declare a plain char *devBuf (without __device__) inside your main function instead. Then you can pass that pointer off to other CUDA functions. (You just can’t dereference it on the host.)
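
Something along these lines should work. It is only a sketch: the sizes NX and BATCH and the function name run_reusing_buffer are placeholders I picked, the "wait for event" step is left as a comment, and I check return codes directly instead of using cutilSafeCall so the example does not depend on the SDK's cutil. The point is that devBuf is an ordinary host variable holding a device address, allocated once and passed to CUFFT on every iteration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical sizes, chosen for illustration only.
static const int NX    = 256;  // points per transform
static const int BATCH = 256;  // transforms per call

// Allocates one device buffer and one plan, runs `iterations` in-place
// transforms on that same buffer, then releases everything.
// Returns 0 on success, 1 on any CUDA or CUFFT error.
int run_reusing_buffer(int iterations) {
    cufftComplex *devBuf = NULL;  // plain host variable, no __device__
    size_t bytes = sizeof(cufftComplex) * NX * BATCH;

    // The slow allocation happens exactly once.
    if (cudaMalloc((void **)&devBuf, bytes) != cudaSuccess) return 1;
    if (cudaMemset(devBuf, 0, bytes) != cudaSuccess) {
        cudaFree(devBuf);
        return 1;
    }

    // The plan can be created once and reused as well.
    cufftHandle plan;
    if (cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH) != CUFFT_SUCCESS) {
        cudaFree(devBuf);
        return 1;
    }

    int status = 0;
    for (int i = 0; i < iterations && status == 0; ++i) {
        // A real program would wait for its event and upload fresh input here.
        if (cufftExecC2C(plan, devBuf, devBuf, CUFFT_FORWARD) != CUFFT_SUCCESS) {
            status = 1;
        }
    }
    if (cudaDeviceSynchronize() != cudaSuccess) status = 1;

    cufftDestroy(plan);
    cudaFree(devBuf);
    return status;
}

int main() {
    int status = run_reusing_buffer(4);
    puts(status == 0 ? "ok" : "failed");
    return status;
}
```

You would need to link against CUFFT (for example with -lcufft) to build this. Note that the pointer could also be a normal global on the host side if several functions need it; what matters is that it is not declared __device__.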