cufftXtMakePlanMany FP16 data size limitation

Hi,

CUDA version : 13.1

I call cufftXtMakePlanMany as shown below:

long long n[1] = {N};

cufftErr = cufftXtMakePlanMany(plan, 1, n,

                                nullptr, 1, 1, CUDA_C_16F,   // input:  FP16 complex

                                nullptr, 1, 1, CUDA_C_16F,   // output: FP16 complex

                                1, &workSize, CUDA_C_16F);    // batch=1, execution type FP16

When N is 48, the call returns CUFFT_NOT_SUPPORTED (error code 16).

But when N is 247, it runs normally.

When N is a power of 2, it also runs normally.

I also found the following in the NVIDIA cuFFT documentation:
ref 1->

1.3.1. Half-precision cuFFT Transforms
Half-precision transforms have the following limitations:
- Sizes are restricted to powers of two only.

ref 2->

3.3.8. cufftXtMakePlanMany()

For multiple GPUs and rank equal to 1, the sizes must be a power of 2.
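To make the documented rule concrete, here is a minimal sketch of the check I would expect the half-precision limitation in ref 1 to imply. The helper name is_power_of_two is my own (hypothetical) and is not a cuFFT function; it only tests the size and does not query cuFFT at all.

```cpp
// Hypothetical helper, not part of cuFFT: returns true when n is a power of two.
// A positive power of two has exactly one bit set, so clearing the lowest
// set bit with n & (n - 1) must leave zero.
constexpr bool is_power_of_two(long long n) {
    return n > 0 && (n & (n - 1)) == 0;
}
```

By this check both 48 and 247 are rejected, so under the documented rule I would expect both sizes to fail, not only 48.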

I don’t know what the exact size limitation is for this API.

Can anyone explain it?

I would be very grateful.