Hi,
CUDA version : 13.1
I call cufftXtMakePlanMany as shown below:
long long n[1] = {N};
cufftErr = cufftXtMakePlanMany(plan, 1, n,
nullptr, 1, 1, CUDA_C_16F, // input: FP16 complex
nullptr, 1, 1, CUDA_C_16F, // output: FP16 complex
1, &workSize, CUDA_C_16F); // batch=1, execution type FP16
When N is 48, the call returns CUFFT_NOT_SUPPORTED (error code 16).
But when N is 247, it runs normally.
When N is a power of 2, it also runs normally.
I also found the following information in the NVIDIA cuFFT documentation:
ref 1->
1.3.1. Half-precision cuFFT Transforms
Half-precision transforms have the following limitations:
- Sizes are restricted to powers of two only.
ref 2->
3.3.8. cufftXtMakePlanMany()
For multiple GPUs and rank equal to 1, the sizes must be a power of 2.
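For reference, the power-of-two rule quoted in ref 1 can be sketched as a host-side check in plain C++. This is only an illustration: isPowerOfTwo and nextPowerOfTwo are hypothetical helper names of my own, not cuFFT functions, and the sketch says nothing about what cuFFT actually accepts.

```cpp
// Hypothetical helper (not part of cuFFT): true when n is a positive power of two.
constexpr bool isPowerOfTwo(long long n) {
    // A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper (not part of cuFFT): smallest power of two >= n,
// e.g. for zero-padding a signal to a size the documented rule allows.
// Assumes n <= 2^62 so the shift cannot overflow; returns 1 for n <= 1.
constexpr long long nextPowerOfTwo(long long n) {
    long long p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

Under this rule, isPowerOfTwo(48) and isPowerOfTwo(247) are both false, so going by the documentation I would expect both sizes to be rejected, not just 48.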
I don't know what the exact size limitation for this API is.
Can anyone explain it?
I would be very grateful.