cuFFT problem with ORIN NANO

Platform: NVIDIA Jetson Orin Nano 8GB with JetPack 5.1.1.

My application needs to compute a real-to-complex (R2C) FFT with cuFFT.

Here are the critical code snippets:

/* 1D FFT, batch_size = 2, nfft = 2000 */
const int rank = 1; // 1-dimension
int n[rank] = {nfft};
int inembed[rank] = {nfft};
int istride = 1;
int idist = nfft;
int onembed[rank] = {nfft};
int ostride = 1;
int odist = nfft;
size_t worksize[1];
cufftResult res = cufftMakePlanMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, CUFFT_R2C, batch_size, worksize);

/* malloc device memory and copy input data from host to GPU */

res = cufftExecR2C(plan, d_input, d_output);

/*copy output data from GPU to host */
cudaMemcpyAsync(, d_output, sizeof(output_type) * n_output.size(), cudaMemcpyDeviceToHost, stream);

According to the cuFFT manual:
“Finally, R2C demands an input array (X1,X2,…,XN) of real values and returns an array (x1,x2,…,x⌊N/2⌋+1) of non-redundant complex elements.”
In my application, the input array size is 2000 and the output array size is also 2000, so I would expect the elements from output[N/2 + 2] to output[N] to be zero.
The output data confirms this assumption as long as ‘nfft’, the number of FFT points, is no larger than 2000.

However, if ‘nfft’ is larger, e.g. 3000, the elements from output[1501] to output[4501] are not zero; only the elements from output[4502] to output[5999] are zero.

Does cuFFT have any limit on the input data size?
Or is something wrong with my cufftMakePlanMany() parameter settings?
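
For reference, the index arithmetic behind this expectation can be sketched on the host, without calling cuFFT at all. The helper names `r2c_output_len` and `r2c_valid_ranges` are hypothetical and not part of cuFFT; the sketch only assumes 0-based indexing, ostride = 1, and an output distance `odist` counted in complex elements, as in the plan above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of non-redundant complex outputs of a length-nfft R2C transform.
inline std::size_t r2c_output_len(std::size_t nfft) { return nfft / 2 + 1; }

// Half-open ranges [first, last) of complex elements that carry FFT results
// for each batch, assuming ostride = 1 and odist counted in complex elements.
// Everything outside these ranges is padding that holds no transform output.
inline std::vector<std::pair<std::size_t, std::size_t>>
r2c_valid_ranges(std::size_t nfft, std::size_t odist, std::size_t batch) {
    const std::size_t bins = r2c_output_len(nfft);
    assert(odist >= bins && "odist too small: batches would overlap");
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(batch);
    for (std::size_t b = 0; b < batch; ++b) {
        const std::size_t first = b * odist;
        ranges.emplace_back(first, first + bins);
    }
    return ranges;
}
```

With nfft = 3000, odist = 3000 and batch = 2, `r2c_valid_ranges` yields [0, 1501) and [3000, 4501), so the elements in between and after are padding rather than FFT results.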


Is the result in [x1, …, x⌊N/2⌋+1] correct?

Yes, the results in X[1], …, X[N/2 + 1] are correct.

The difference is that:

  1. If batch_size = 1, the data from X[N/2+2] to X[N-1] are zero, as expected, for every nfft I tried, e.g. nfft = 8, 2000, 3000, 5000.
  2. If batch_size = 2, the data from X[N/2+2] to X[N-1] (of each batch) are NOT zero for large nfft, e.g. nfft = 3000, 5000.

Enclosed is the test program.
You can check the FFT results for different configurations by simply changing:

int cnt = 3000;  // nfft 
int batch_size = 1; //batch_size

1d_r2c_example_mplan.cpp (5.2 KB)
Makefile (2.8 KB)

I ran some tests with the FFTW library,
using similar settings (nfft = 3000, batch = 3).
For each batch, X[N/2 + 2] to X[N] are zero,
since FFTW only outputs the non-redundant results (i.e. X[1], …, X[N/2 + 1]).

Enclosed are the source code and output log file.
Is anything wrong with the cuFFT library?
Makefile (205 Bytes)

fftw_test.cpp (1.1 KB)
log (214.0 KB)


This doesn’t sound like an issue, since the output data is correct.
The contents of the rest of the buffer are not guaranteed, so it may hold leftover values if the buffer is reused.
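
One way to avoid depending on the padding is to read back only the first N/2 + 1 elements of each batch. Below is a minimal host-side sketch of that idea; `extract_valid_bins` is a hypothetical helper, not a cuFFT function, and it assumes the padded output has already been copied into a `std::vector` with ostride = 1 and `odist` counted in complex elements.

```cpp
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copy only the nfft/2 + 1 meaningful bins of each batch out of a padded
// R2C output buffer, ignoring whatever the padding happens to contain.
inline std::vector<std::complex<float>>
extract_valid_bins(const std::vector<std::complex<float>>& padded,
                   std::size_t nfft, std::size_t odist, std::size_t batch) {
    const std::size_t bins = nfft / 2 + 1;
    if (batch == 0) return {};
    if (odist < bins || padded.size() < (batch - 1) * odist + bins) {
        throw std::invalid_argument("buffer too small for requested layout");
    }
    std::vector<std::complex<float>> out;
    out.reserve(bins * batch);
    for (std::size_t b = 0; b < batch; ++b) {
        const std::complex<float>* first = padded.data() + b * odist;
        out.insert(out.end(), first, first + bins);  // skip the padding
    }
    return out;
}
```

The same effect could be had on the device side by planning with odist = nfft/2 + 1, so that the output is tightly packed and there is no padding to inspect.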

