Good morning, all.
I am working on a dataset that needs an FFT, and its dimensions are:
- Transform length: 1252
- Batch: 210945
If we try these numbers in the simple program below, cufftPlan1d returns cuFFT error 2 (CUFFT_ALLOC_FAILED), even though the cudaMalloc for the data buffer itself returns 0 (cudaSuccess):
#include <cuda_runtime.h>
#include <cufft.h>
#include <iostream>

using namespace std;

int main(void)
{
    int NX = 1252;      // transform length
    int BATCH = 210945; // number of transforms

    cufftHandle c2c_handle;
    cufftComplex *dev_complex;

    // Returns 2 (CUFFT_ALLOC_FAILED)
    cout << "cuFFT return value: "
         << cufftPlan1d(&c2c_handle, NX, CUFFT_C2C, BATCH) << endl;

    // Returns 0 (cudaSuccess)
    cout << "CUDA return value: "
         << cudaMalloc((void **) &dev_complex,
                       sizeof(cufftComplex) * NX * BATCH) << endl;

    cudaFree(dev_complex);
    return 0;
}
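One way to narrow this down (a sketch, not verified on this data): cufftEstimate1d reports an upper bound on the scratch space a 1D plan would need, and cudaMemGetInfo reports free device memory, so you can check whether the plan workspace plus the data buffer exceeds what the card has. This may matter here because 1252 = 4 × 313 with 313 prime, so cuFFT may fall back to an algorithm (e.g. Bluestein) that needs a much larger workspace than a power-of-two length would.

```cpp
// Sketch: ask cuFFT how much scratch memory the plan would want,
// without actually creating it, and compare against free device memory.
// Requires a CUDA-capable GPU and linking against -lcufft -lcudart.
#include <cuda_runtime.h>
#include <cufft.h>
#include <iostream>

int main(void)
{
    int NX = 1252, BATCH = 210945;

    size_t workSize = 0;
    // Upper-bound estimate of the plan's workspace, in bytes.
    cufftEstimate1d(NX, CUFFT_C2C, BATCH, &workSize);

    size_t freeMem = 0, totalMem = 0;
    cudaMemGetInfo(&freeMem, &totalMem);

    size_t dataSize = sizeof(cufftComplex) * NX * (size_t)BATCH;
    std::cout << "estimated workspace: " << workSize << " bytes\n"
              << "data buffer:         " << dataSize << " bytes\n"
              << "free device memory:  " << freeMem
              << " of " << totalMem << " bytes\n";
    return 0;
}
```

If workspace + data comes out near or above the free memory on the 8 GB cards below, that would explain CUFFT_ALLOC_FAILED while the plain cudaMalloc still succeeds.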
If we try 1252 and 160945, that is, a batch 50,000 smaller, it works.
A transform length of 4096 with a batch of 65536 also works, even though 4096 × 65536 = 268,435,456 elements, which is more than 1252 × 210945 = 264,103,140.
4096 with a batch of 98304 also works (an even larger total).
Because it works with larger totals, my guess is that cuFFT does not like the specific combination 1252 × 210945. Reading this post:
https://devtalk.nvidia.com/default/topic/1026698/gpu-accelerated-libraries/large-data-size-for-cufft/
it is suggested that the dimensions should be a power of 2, 3, 5, or 7 for optimal performance, but that is not a hard requirement. I also couldn't find any explicit size limitation in the documentation, and the 64M/128M-element conclusion of that OP probably comes from:
https://devtalk.nvidia.com/default/topic/520201/cufft-size/
https://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda
which no longer seems to apply, since the larger allocations above pass fine.
Do you guys know what is happening?
Reproduced on both of these systems:
- CUDA 9.1, Ubuntu 16.04, GTX 1080 Ti, driver 390.81
- CUDA 9.1, RHEL 6.10, GRID P40-8Q, driver 390.75