cufftGetSize1d fails with a CUFFT_ALLOC_FAILED error

I have a caching scheme to manage the FFT workspace myself, because a large number of different FFTs are applied and sharing the workspace substantially cuts down on memory usage. As part of this, I use cufftGetSize1d(...) to determine the workspace size. However, if I am low on GPU memory, it returns a CUFFT_ALLOC_FAILED error.

A Stack Overflow question on the same topic, "What is the meaning of CUFFT_ALLOC_FAILED return value when calling cufftGetSize*()?", concluded that the error means the allocation would fail, since cufftGetSize1d doesn't actually allocate any memory.

In the debugger, I can estimate the required workspace to be about 1 GB, based on the workspace sizes of similar FFTs already created. There is less than 1 GB available, which I suspect is the reason for the error. My problem is that I need to know that number even when there isn't enough memory available: my application can free GPU memory to make room, it just needs to know how much to free. In this specific case, the workspace already allocated is actually big enough, so I don't even need to allocate more, but the program can't know that since cufftGetSize1d errors out rather than returning the answer. Is there any way to get this workspace size, even when the GPU is low on memory?

Even if the method returned 16 PB as an answer (which is absolutely ridiculous), that would still be useful even though I obviously don't have that much memory, because at least there would be a value to report in the error message, telling the user they need a GPU with xxx GB of memory to process the supplied data/configuration.
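For what it's worth, one fallback I'm considering is cufftEstimate1d, which takes no plan handle and (as I understand it) performs no device allocation, so it should return an upper-bound estimate even when memory is exhausted. A sketch, not yet verified on a low-memory GPU:

```cpp
#include <cstdio>
#include <cufft.h>

int main() {
    const int nx = 1048576 * 32 + 1;  // transform length
    const int batch = 16;             // number of transforms
    size_t estimate = 0;
    // cufftEstimate1d needs no handle, so it cannot depend on plan state
    // or (presumably) on how much device memory is currently free.
    cufftResult r = cufftEstimate1d(nx, CUFFT_C2C, batch, &estimate);
    if (r == CUFFT_SUCCESS)
        printf("estimated work area: %zu bytes\n", estimate);
    else
        printf("cufftEstimate1d returned %d\n", (int)r);
    return 0;
}
```

The documented caveat is that the estimate can be larger than the size the actual plan ends up needing, but an upper bound would be enough for my error reporting.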

Can you provide a short, complete example of what fails?

When I think of user-managed workspace allocations in the 1D case, the API sequence I would expect is:

cufftHandle p;
cufftCreate(&p);
cufftSetAutoAllocation(p, 0);
size_t ws;
unsigned char *wsp;
cufftMakePlan1d(p, ..., &ws);
cudaMalloc(&wsp, ws);

Is that what you are doing?

Correct. For completeness, here is the bulk of the code. cudaCheck and cufftCheck are just macros that throw exceptions when the respective success code is not returned.

cufftHandle handle;
size_t size = 0;
void* workarea = nullptr;
cufftCheck(cufftCreate(&handle));
cufftCheck(cufftSetAutoAllocation(handle, 0));
cufftCheck(cufftPlan1d(&handle, cols, type, rows));
cufftCheck(cufftGetSize1d(handle, cols, type, rows, &size));
cufftCheck(cufftSetStream(handle, stream));
cudaCheck(cudaMalloc(&workarea, size));
cufftCheck(cufftSetWorkArea(handle, workarea));

The full code is a bit more complicated as it caches the plans, and resets the workarea for all cached plans in the event the workarea must grow for a newly created plan. I think this issue only occurs when nearly all available memory on the GPU is consumed. In my current case, I am using ~11.6 GB of GPU memory when the error occurs, and the CUDA properties report ~31 MB free memory.
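As a sketch of the growth logic described above (hypothetical structure and names; error checking via cudaCheck/cufftCheck omitted for brevity):

```cpp
#include <vector>
#include <cufft.h>
#include <cuda_runtime.h>

// All cached plans share one work area that grows as needed.
struct WorkAreaCache {
    std::vector<cufftHandle> plans;
    void* workarea = nullptr;
    size_t capacity = 0;

    // Called after cufftGetSize1d reports the size a new plan needs.
    void growTo(size_t needed) {
        if (needed <= capacity) return;
        cudaFree(workarea);             // release the old, smaller area
        cudaMalloc(&workarea, needed);  // allocate the larger one
        capacity = needed;
        for (cufftHandle p : plans)     // re-point every cached plan
            cufftSetWorkArea(p, workarea);
    }
};
```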

Your sequence doesn’t match mine.

cufftCreate initializes a handle.
cufftSetAutoAllocation sets a parameter of that handle
cufftPlan1d initializes a handle.

Do you see the issue?

My sequence:

cufftCreate(&p);    //initializes handle
cufftSetAutoAllocation(p, 0);  //updates existing handle
size_t ws;
unsigned char *wsp;
cufftMakePlan1d(p, ..., &ws); //updates existing handle

Here’s an example demonstrating the difference:

$ cat t2248.cu
#include <iostream>
#include <cufft.h>
#include <unistd.h>
#include <cassert>

int main(){
  const int nx = 1048576*32+1;
  const int ny = 16;
  size_t ws = 0;
  cufftHandle p;
  cufftResult r;
  r = cufftCreate(&p);
  assert(r == CUFFT_SUCCESS);
  r = cufftSetAutoAllocation(p, 0);
  assert(r == CUFFT_SUCCESS);
#ifdef USE_MY_METHOD
  r = cufftMakePlan1d(p, nx, CUFFT_C2C, ny, &ws);
#else
  r = cufftPlan1d(&p, nx, CUFFT_C2C, ny);
#endif
  assert(r == CUFFT_SUCCESS);
  std::cout << "ws = " << ws << std::endl;
  size_t mfree, mtot;
  cudaMemGetInfo(&mfree, &mtot);
  std::cout << "free memory: " << mfree << std::endl;
  //sleep(32);
}
$ nvcc -o t2248 t2248.cu -lcufft
$ ./t2248
ws = 0
free memory: 15478554624
$ nvcc -o t2248 t2248.cu -lcufft -DUSE_MY_METHOD
$ ./t2248
ws = 17199267840
free memory: 32679395328
$

When we use your sequence, the call to cufftSetAutoAllocation(..., 0) doesn't have the desired effect: plan creation still allocates ~17 GB for this particular transform (the GPU above is a V100 32 GB). When we use my sequence, it does have the desired effect: plan creation doesn't allocate space for the work area.

It is true that cufftSetWorkArea() should override the initial allocation, but in a memory-constrained setting you may still be making your life difficult, because the override in your example does not happen until after you have already allocated additional space:

cudaCheck(cudaMalloc(&workarea, size));
cufftCheck(cufftSetWorkArea(handle, workarea));
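Putting it together, a sequence I'd suggest (mirroring your variable names, but using cufftMakePlan1d so that no interim allocation happens — and note cufftGetSize1d is no longer needed, since the make call reports the size directly):

```cpp
cufftHandle handle;
size_t size = 0;
void* workarea = nullptr;
cufftCheck(cufftCreate(&handle));
cufftCheck(cufftSetAutoAllocation(handle, 0));
// cufftMakePlan1d (unlike cufftPlan1d) operates on the existing handle,
// so the auto-allocation setting is respected and the work size is
// returned without allocating anything:
cufftCheck(cufftMakePlan1d(handle, cols, type, rows, &size));
cufftCheck(cufftSetStream(handle, stream));
cudaCheck(cudaMalloc(&workarea, size));
cufftCheck(cufftSetWorkArea(handle, workarea));
```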

FWIW, I didn’t have any luck reproducing “cufftGetSize1d fails with a CUFFT_ALLOC_FAILED error” using my method, even after cutting the free memory down to ~128 MB:

$ cat t2248.cu
#include <iostream>
#include <cufft.h>
#include <unistd.h>
#include <cassert>

int main(){
  const int nx = 1048576*32+1;
  const int ny = 32;
  size_t ws = 0;
  size_t *wsp;
  cufftHandle p;
  cufftResult r;
  r = cufftCreate(&p);
  assert(r == CUFFT_SUCCESS);
  r = cufftSetAutoAllocation(p, 0);
  assert(r == CUFFT_SUCCESS);
#ifdef USE_MY_METHOD
  r = cufftMakePlan1d(p, nx, CUFFT_C2C, ny, &ws);
#else
  r = cufftPlan1d(&p, nx, CUFFT_C2C, ny);
#endif
  assert(r == CUFFT_SUCCESS);
  std::cout << "ws = " << ws << std::endl;
  size_t mfree, mtot;
  cudaMemGetInfo(&mfree, &mtot);
  std::cout << "free memory: " << mfree << std::endl;
  cudaError_t cr = cudaMalloc(&wsp, mfree - 1048576*128);
  assert(cr == cudaSuccess);
  cudaMemGetInfo(&mfree, &mtot);
  std::cout << "free memory: " << mfree << std::endl;
  r = cufftGetSize1d(p, nx, CUFFT_C2C, ny, &ws);
  std::cout << " r = " << (int)r << std::endl;
  std::cout << "ws = " << ws << std::endl;
  //sleep(32);
}
$ nvcc -o t2248 t2248.cu -lcufft -DUSE_MY_METHOD
$ ./t2248
ws = 34398535680
free memory: 32679395328
free memory: 133693440
 r = 0
ws = 256
$

I probably wouldn’t be able to comment further without a complete test case. That test case cannot be your whole code. It needs to be crafted as a directed test, like my example above, that demonstrates the issue.

You are correct. I did not notice that subtle difference, nor did I know about the difference between cufftPlan1d and cufftMakePlan1d. This improved the design of my FFT wrapper, and there is no need to call cufftGetSize1d now. I am guessing this will yield a speedup as well, since those extra allocations no longer happen during plan generation. Thanks for the assistance!