cuFFT R2C output looks wrong when the FFT size exceeds 2000

Platform: NVIDIA Jetson Nano 8GB with JetPack 5.1.1.

My application needs to compute a real-to-complex (R2C) FFT with cuFFT.
Here are the relevant code snippets:

/* 1D FFT, batch_size = 2, nfft = 2000 */
const int rank = 1; // 1-dimension
int n[rank] = {nfft};
int inembed[rank] = {nfft};
int istride = 1;
int idist = nfft;
int onembed[rank] = {nfft};
int ostride = 1;
int odist = nfft;
size_t worksize[1];
cufftResult res = cufftMakePlanMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, CUFFT_R2C, batch_size, worksize);

/* malloc device memory and copy input data from host to GPU */

res = cufftExecR2C(plan, d_input, d_output);

/* copy output data from GPU to host */
cudaMemcpyAsync(, d_output, sizeof(output_type) * n_output.size(), cudaMemcpyDeviceToHost, stream);

According to the cuFFT manual:
“Finally, R2C demands an input array (X_1, X_2, …, X_N) of real values and returns an array (x_1, x_2, …, x_{⌊N/2⌋+1}) of non-redundant complex elements.”
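
To make the quoted size rule concrete, here is a small host-only sketch of the arithmetic. The helper names are hypothetical (they are not part of cuFFT), and the second helper assumes the batches are packed back to back, which is not necessarily what the plan above requests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a cuFFT call): number of non-redundant complex
// outputs of one length-n R2C transform, i.e. floor(n / 2) + 1.
inline std::size_t r2c_output_count(std::size_t n) {
    assert(n > 0);
    return n / 2 + 1;  // integer division gives the floor
}

// Hypothetical helper: total complex elements needed for `batch` transforms,
// ASSUMING the batches are tightly packed one after another.
inline std::size_t r2c_packed_output_elems(std::size_t n, std::size_t batch) {
    return r2c_output_count(n) * batch;
}
```

For nfft = 2000 this gives 1001 complex outputs per transform, and for nfft = 3000 it gives 1501.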

In my application, the input array size is 2000, and the output array size is also 2000. So I would expect the elements from output[N/2 + 2] to output[N] to be zero.
Inspecting the output data confirms this as long as ‘nfft’, the FFT size, is no larger than 2000.

However, if ‘nfft’ is larger, e.g. 3000, the elements from output[1501] to output[4501] are non-zero; only the elements from output[4502] to output[5999] are zero.
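
For reference, the cuFFT documentation describes the 1D advanced data layout as output[b * odist + k * ostride] for batch b and output index k. The host-only sketch below works out which indices each batch would occupy under that formula. The struct and function names are hypothetical, and it assumes indices count complex elements; it is not meant to explain the observation above, only to spell out the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type: inclusive index range, counted in complex elements.
struct Range {
    std::size_t first;
    std::size_t last;
};

// Hypothetical helper (not a cuFFT call): indices written for batch b of a
// length-n R2C transform under the 1D advanced layout
//   output[b * odist + k * ostride],  k = 0 .. n/2
inline Range r2c_batch_output_range(std::size_t n, std::size_t b,
                                    std::size_t ostride, std::size_t odist) {
    assert(n > 0 && ostride > 0);
    const std::size_t first = b * odist;                  // k = 0
    const std::size_t last = first + (n / 2) * ostride;   // k = n/2
    return Range{first, last};
}
```

With nfft = 3000, ostride = 1 and odist = 3000 as in the plan above, this gives indices 0 to 1500 for batch 0 and 3000 to 4500 for batch 1.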

Are there any limits in cuFFT on the input data size?
Or is something wrong with my cufftMakePlanMany() parameter settings?