I’m trying to perform a 2D cuFFT on a 2D array of type __half2.
I’m using cufftXtMakePlanMany and cufftXtExec, but the output contains “inf” and “nan” values, so something is wrong.
The 2D array holds radar data with dimensions Nsamples x Nchirps.
Below is my configuration for the cuFFT plan and execution.
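```cpp
cufftHandle plan;
cufftCreate(&plan);
const int rank = 2;          // compile-time constant so it can size size_arr
long long batch = 1;
size_t ws = 0;               // receives the work-area size chosen by cuFFT
// cuFFT reads size_arr in row-major order: the last entry is the contiguous dimension.
// Half-precision transforms are restricted to power-of-two sizes.
long long size_arr[rank] = {Nsamples, Nchirps};
// With inembed/onembed == NULL, cuFFT ignores the strides and distances and
// assumes a packed layout; the packed distance between batches is the full 2D size.
long long idist = (long long)Nsamples * Nchirps;
long long odist = (long long)Nsamples * Nchirps;
long long istride = 1;
long long ostride = 1;
cufftResult res = cufftXtMakePlanMany(plan, rank, size_arr,
                                      NULL, istride, idist, CUDA_C_16F,
                                      NULL, ostride, odist, CUDA_C_16F,
                                      batch, &ws, CUDA_C_16F);
if (res != CUFFT_SUCCESS)
{
    printf("cufftXtMakePlanMany Error %d\n", (int)res);
}
res = cufftXtExec(plan, devInData, devOutData, CUFFT_FORWARD);
if (res != CUFFT_SUCCESS)
{
    printf("cufftXtExec Error %d\n", (int)res);
}
cudaError_t err = cudaDeviceSynchronize();
if (err != cudaSuccess)
{
    printf("CUDA Error: %s\n", cudaGetErrorString(err));
}
cufftDestroy(plan);
```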
Half-precision transforms might not be suitable for all kinds of problems because of the limited range of half-precision floating-point arithmetic (the largest finite __half value is 65504). Please note that the first element of the FFT result (the DC bin) is the sum of all input elements, so it is likely to overflow for certain inputs.
If a similar transform setup works for you in the 32-bit case, then this may be the issue.
As you can probably imagine, the problem may be data dependent. Therefore, debugging your case may require the actual data, not just the code you have shown.
Thank you for the quick and detailed response.
I have switched to the cufftPlan2d API and am now using FP32.
It works now, so the problem was probably precision.
In any case, cufftPlan2d with FP32 is faster than cufftXtMakePlanMany with FP16, so I’ll be using that.