When I create a plan with nx = 75300000 and CUFFT_C2C, the function cufftPlan1d returns CUFFT_ALLOC_FAILED.

But when I retry with nx = 75600000, it returns CUFFT_SUCCESS.

So I looped upward from nx = 75300000, step by step, until cufftPlan1d returned CUFFT_SUCCESS; the loop stopped at 75300225.
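The search described above could be sketched roughly as follows. This is only an illustration: the function name first_accepted_size and the try_plan callback are hypothetical, and the callback stands in for whatever code attempts to create the plan.

```
#include <cassert>
#include <functional>

// Hypothetical helper (not part of cuFFT): returns the first n in
// [start, limit] for which try_plan(n) reports success, or -1 if none does.
long long first_accepted_size(long long start, long long limit,
                              const std::function<bool(long long)>& try_plan) {
    assert(start <= limit);
    for (long long n = start; n <= limit; ++n) {
        if (try_plan(n)) {
            return n;  // first size for which the plan could be created
        }
    }
    return -1;  // no size in the range was accepted
}
```

With cuFFT, try_plan would presumably call cufftPlan1d with CUFFT_C2C and a batch of 1, and call cufftDestroy on the handle whenever the result is CUFFT_SUCCESS so that successful plans do not accumulate.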

Are there restrictions on nx when creating a plan?

I could not find anything about this in the documentation. Thanks a lot!

CPU: Intel Xeon E5-2640 v3 2.6GHz x2

Memory: 16GB

GPU: Tesla K80

CUDA version: 7.5

75300000 = 2^5 x 3^1 x 5^5 x 251^1

75600000 = 2^7 x 3^3 x 5^5 x 7^1

cuFFT needs much more memory for “arbitrary” sizes, and 75300000 contains the large prime factor 251.

“Algorithms highly optimized for input sizes that can be written in the form 2^a × 3^b × 5^c × 7^d. In general the smaller the prime factor, the better the performance, i.e., powers of two are fastest.”

http://docs.nvidia.com/cuda/cufft/#introduction

For the calculation of primes I’ve used:

http://www.calculatorsoup.com/calculators/math/prime-factors.php
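For sizes like these, plain trial division is enough to reproduce the factorizations without an online calculator. A minimal sketch; the helper names prime_factors and is_7_smooth are my own and not part of cuFFT:

```
#include <cassert>
#include <cstdint>
#include <map>

// Trial-division factorization: returns a map of prime -> exponent.
// Hypothetical helper for illustration only.
std::map<std::uint64_t, int> prime_factors(std::uint64_t n) {
    assert(n >= 1);
    std::map<std::uint64_t, int> factors;
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            ++factors[p];
            n /= p;
        }
    }
    if (n > 1) {
        ++factors[n];  // whatever remains is a single prime factor
    }
    return factors;
}

// True if n can be written as 2^a * 3^b * 5^c * 7^d,
// the form the cuFFT documentation describes as highly optimized.
bool is_7_smooth(std::uint64_t n) {
    assert(n >= 1);
    const std::uint64_t small_primes[] = {2, 3, 5, 7};
    for (std::uint64_t p : small_primes) {
        while (n % p == 0) {
            n /= p;
        }
    }
    return n == 1;
}
```

Here is_7_smooth(75600000) is true, while is_7_smooth(75300000) is false because of the factor 251.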

Best Regards

Thanks for your help!

I noticed that the cufftEstimate1d function reports the estimated work area size.

I tested nx = 75300000, and it still returns CUFFT_ALLOC_FAILED.

Comparing a few values of nx:

nx = 75600000 workSize = 600MB

nx = 75600 workSize = 600KB

nx = 75300 workSize = 2.4MB

Extrapolating from these, I guess that nx = 75300000 needs workSize = 2.4GB, while the K80 has 12GB of GPU memory.

If the required memory is less than the total memory on the GPU, why does the plan creation fail?

Or, like the limit on threads per block, does the work area size have a maximum too?

The documentation does not mention one.

I didn’t have any trouble with the following:

```
$ cat t1230.cu
#include <cufft.h>
#include <stdio.h>
int main(){
  size_t nx = 75300;
  cufftHandle p;
  cufftResult r;
  size_t ws;
  // Print the estimated work area size for nx, 10*nx, 100*nx and 1000*nx.
  for (size_t i = 1; i < 10000; i*=10){
    r = cufftEstimate1d((nx*i), CUFFT_C2C, 1, &ws);
    printf("nx: %zu, ws: %zu\n", (nx*i), ws);
  }
  // Create the plan for the size from the question: nx*1000 = 75300000.
  r = cufftPlan1d(&p, nx*1000, CUFFT_C2C, 1);
  printf("cufft status: %d\n", (int)r);
  return 0;
}
$ nvcc -o t1230 t1230.cu -lcufft
$ ./t1230
nx: 75300, ws: 2430976
nx: 753000, ws: 24301568
nx: 7530000, ws: 241864704
nx: 75300000, ws: 2415919104
cufft status: 0
$
```

I was using CUDA 8RC, RHEL 7, Tesla K20X (6GB) for this test.
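As a rough cross-check of the numbers printed above, the reported work sizes can be divided by nx to get the work area per element. This is just arithmetic on the figures in this thread; the helper names bytes_per_element and to_gib are hypothetical:

```
#include <cassert>
#include <cstdint>

// Work area bytes per transform element, from a reported estimate.
double bytes_per_element(std::uint64_t work_size_bytes, std::uint64_t nx) {
    assert(nx > 0);
    return static_cast<double>(work_size_bytes) / static_cast<double>(nx);
}

// Convert a byte count to GiB (2^30 bytes).
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For nx = 75300000, bytes_per_element(2415919104, 75300000) is about 32.1 and to_gib(2415919104) is 2.25. By comparison, the roughly 600MB reported earlier for nx = 75600000 works out to about 8 bytes per element, which is the size of one single-precision complex value, so the estimate for the size containing the factor 251 is around four times larger.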