cufftCreateAsync / cufftDestroyAsync

Hi,

I need to create cuFFT plans dynamically in the main loop of my application, and I noticed that they cause a device synchronization. I suppose this is because of underlying calls to cudaMalloc.

This behaviour is undesirable for me. Since stream-ordered memory allocators (cudaMallocAsync / cudaFreeAsync) have been introduced in CUDA, I was wondering if you could provide a stream-ordered cuFFT plan allocator.

Thanks,
Julien

You may wish to investigate caller-allocated work areas to see if they can be adapted to your use case. Depending on how you use them, they may provide some benefit.

Hi Robert,

Thank you for your answer. As a matter of fact, I already allocate the work area myself using the stream-ordered memory allocators. My problem here is with the initial plan creation with cufftCreate, before any size is given.

The documentation states that cufftMakePlan1d can be called only once per plan. Because I need various and unpredictable FFT lengths and batch sizes, this forces me to delete and create a new plan at each iteration.

Best regards,
Julien

I know that the cuFFT team is aware of the desire to improve create/destroy cycle performance. However, you may still wish to file a bug if you have specific requests or a specific example to consider.

In addition to Bob’s response…

"Because I need various and unpredictable FFT lengths and batch sizes, this forces me to delete and create a new plan at each iteration."

A workaround would be to keep all the plans you create in an array/vector and reuse them when the same configuration comes up again. You can destroy them all at the end of your program.
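One way to sketch this workaround is a small cache keyed by FFT length and batch size. The PlanCache class below is hypothetical and is written generically so that it compiles without cuFFT: the make and destroy callables are assumptions standing in for whatever creates and destroys a plan. It uses a map rather than a plain vector so that lookup by configuration is direct.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical helper (not part of cuFFT): keeps one plan handle per
// (FFT length, batch size) pair so each configuration is created only once.
template <typename Handle>
class PlanCache {
public:
    using Key = std::pair<int, int>;  // (fft_length, batch)

    PlanCache(std::function<Handle(int, int)> make,
              std::function<void(Handle)> destroy)
        : make_(std::move(make)), destroy_(std::move(destroy)) {}

    // The cache owns its handles, so copying is disabled.
    PlanCache(const PlanCache&) = delete;
    PlanCache& operator=(const PlanCache&) = delete;

    // Destroys every cached plan, e.g. at the end of the program.
    ~PlanCache() {
        for (auto& entry : plans_) destroy_(entry.second);
    }

    // Returns the cached plan for this configuration, creating it on first use.
    Handle get(int fft_length, int batch) {
        const Key key{fft_length, batch};
        auto it = plans_.find(key);
        if (it == plans_.end()) {
            it = plans_.emplace(key, make_(fft_length, batch)).first;
        }
        return it->second;
    }

    // Number of distinct configurations currently cached.
    std::size_t size() const { return plans_.size(); }

private:
    std::function<Handle(int, int)> make_;
    std::function<void(Handle)> destroy_;
    std::map<Key, Handle> plans_;
};
```

With cuFFT, Handle would be cufftHandle, make would call cufftCreate followed by cufftMakePlan1d, and destroy would call cufftDestroy. Plan creation then only costs something the first time a given (length, batch) pair appears. If the set of sizes is truly unbounded, the cache grows with it, so some eviction policy may be needed.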

Thank you for your answers.

I think it would be a useful feature to have more control here, so I’ll consider filing a bug.

I’ll use your workaround in the meantime.

Best regards,
Julien