CUDA Noob: "cufft: ERROR: CUFFT_INVALID_PLAN"

Hi all

I’m trying to help some geoscientists trial CUDA from Fortran. When I run the attached code (called from Fortran) I get these error messages:

In function ‘cu_fft_1d_r2c_’

  • plan created
  • about to execute plan
    cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu, line 115
    cufft: ERROR: CUFFT_INVALID_PLAN
  • plan executed
    cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu, line 94
    cufft: ERROR: CUFFT_INVALID_PLAN
    Leaving function ‘cu_fft_1d_r2c_’
    amplitude of bp filter at 0.5 transmitter base= (0.9961086,0.000000)
    In function ‘cu_fft_1d_c2r_’
  • about to execute plan
  • plan created
    cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu, line 115
    cufft: ERROR: CUFFT_INVALID_PLAN
  • plan executed
    cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu, line 94
    cufft: ERROR: CUFFT_INVALID_PLAN
    Leaving function ‘cu_fft_1d_c2r_’

I understand the ‘cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu’ part: my code does not live in this directory, so I figure that’s where the NVIDIA coder was working.

The ‘line 115’ error occurs when executing the plan.

The ‘line 94’ error occurs when destroying the plan.

I’ve read the CUFFT Library 2.0 PDF but I can’t spot what I’m doing wrong with the plan.

In the Fortran code, I call my functions with ‘call cu_fft_1d_r2c(pt_series,dt_series,n)’ and ‘call cu_fft_1d_c2r(pt_series,dt_series,n)’.

I have two Quadro FX 5600s in my workstation (SLI bridge attached but SLI not configured in X - I’m on a Red Hat EL 5.2 workstation).

Can someone please assist?

Thanks in advance.

CC
cudafunction.cu.txt (2.96 KB)

You are mixing host memory and GPU memory.
The code should do something like:

  1. allocate GPU memory
  2. copy from CPU to GPU
  3. call cuFFT (input and output should be arrays in GPU memory)
  4. copy result from GPU to CPU
  5. free GPU memory
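The five steps above can be sketched roughly as follows. This is only an illustration, not the code from the attachment: the function name `fft_1d_r2c_host`, the variable names, and the tiny test size in `main` are all made up for the example, and error handling is reduced to "bail out and clean up".

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper (not from the original attachment): forward
// real-to-complex 1D FFT of n host samples. h_in holds n floats,
// h_out receives n/2 + 1 complex values. Returns 0 on success.
int fft_1d_r2c_host(const float *h_in, cufftComplex *h_out, int n)
{
    cufftReal    *d_in  = NULL;
    cufftComplex *d_out = NULL;
    cufftHandle   plan;
    bool have_plan = false;
    bool ok = false;
    const size_t in_bytes  = sizeof(cufftReal) * n;
    const size_t out_bytes = sizeof(cufftComplex) * (n / 2 + 1);

    do {
        // 1. allocate GPU memory
        if (cudaMalloc((void **)&d_in, in_bytes) != cudaSuccess) break;
        if (cudaMalloc((void **)&d_out, out_bytes) != cudaSuccess) break;
        // 2. copy from CPU to GPU (destination pointer comes first)
        if (cudaMemcpy(d_in, h_in, in_bytes,
                       cudaMemcpyHostToDevice) != cudaSuccess) break;
        // 3. call cuFFT with device pointers only
        if (cufftPlan1d(&plan, n, CUFFT_R2C, 1) != CUFFT_SUCCESS) break;
        have_plan = true;
        if (cufftExecR2C(plan, d_in, d_out) != CUFFT_SUCCESS) break;
        // 4. copy result from GPU to CPU
        if (cudaMemcpy(h_out, d_out, out_bytes,
                       cudaMemcpyDeviceToHost) != cudaSuccess) break;
        ok = true;
    } while (0);

    // 5. free GPU memory (cudaFree(NULL) is a harmless no-op)
    if (have_plan) cufftDestroy(plan);
    cudaFree(d_out);
    cudaFree(d_in);
    return ok ? 0 : -1;
}

int main(void)
{
    const int n = 8;               // tiny size chosen for illustration
    float in[n];
    cufftComplex out[n / 2 + 1];
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    if (fft_1d_r2c_host(in, out, n) != 0) {
        fprintf(stderr, "FFT failed\n");
        return 1;
    }
    printf("DC term: (%f, %f)\n", out[0].x, out[0].y);
    return 0;
}
```

Build with something like `nvcc example.cu -lcufft`. The point is that the pointers handed to `cufftExecR2C` come from `cudaMalloc`, never from the Fortran arrays directly.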

Ahh, my problem is/was that the transform size was a little over 18,000,000. The CUFFT Library documentation states “1D transform sizes up to 8 million elements”. When I hardcoded NX to be 7999999, the code suddenly ran without errors.

So my question now is, how can I overcome this problem? Could someone point me at an example?

Thanks

CC

Cool, thx for the tip.

I’m now doing:

cudaMalloc((void**)&out, sizeof(cufftComplex)*NX);
cudaMemcpy(in, out, sizeof(cufftComplex)*NX, cudaMemcpyHostToDevice);

But now I’m getting ‘cudaErrorInvalidDevicePointer’ when cudaMalloc is called.

CC
cudafunction.cu.txt (3.22 KB)

No surprise, the destination is the first argument for memcpy, not the other way round as you try to do it.
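For reference, a minimal sketch of the corrected call order, assuming a host array `h_data` and a device pointer `d_data` (both names are invented here, as is the size `NX`). Checking the return value of each call separately also makes it easier to see which call actually reported the error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 1024  // assumed size, for illustration only

int main(void)
{
    static cufftComplex h_data[NX];   // host buffer (zero-initialized)
    cufftComplex *d_data = NULL;      // device buffer
    cudaError_t err;

    err = cudaMalloc((void **)&d_data, sizeof(cufftComplex) * NX);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy(dst, src, count, kind): destination first, like memcpy.
    // For cudaMemcpyHostToDevice the device pointer is therefore argument 1.
    err = cudaMemcpy(d_data, h_data, sizeof(cufftComplex) * NX,
                     cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaFree(d_data);
    printf("copy ok\n");
    return 0;
}
```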