Check for a valid cufft plan?

While writing cufft mex functions for Matlab, I’ve found that gpuDevice(1) deletes all stored cufft plans.

I have a static tracking vector to avoid creating new plans when an old one has been created, but it seems there is no way to check whether an old plan has been deleted by gpuDevice(1); when the plan has been deleted, cufft calls just hang.

Is there, or can we get, a cufft function call which checks plan valid by handle?

I think cufftGetSize:

[url]cuFFT :: CUDA Toolkit Documentation

should be a fairly innocuous form of checking if the handle is valid.

I believe it should return:

CUFFT_INVALID_PLAN

if the handle is invalid. I tested this just now with invalid plan handles, it seems to work correctly (i.e. return CUFFT_INVALID_PLAN).

Thanks for the suggestion txbob. I will try that.

I also found that if a plan has been deleted by gpuDevice(1), cufftDestroy on that plan will prevent it from being used after being re-created, so I guess I need to check validity before destroying the plan as well.

Apparently plans deleted by gpuDevice(1) do not return CUFFT_INVALID_PLAN from cufftGetSize() :(

Work around is to create a mex function which deletes plans from the tracking vector (w/o cufftDestroy) whenever gpuDevice(1) is called.

Would be nice to have a real validity checker…

I took a closer look at this.

The problem is that gpuDevice(), when specified with a parameter, calls cudaDeviceReset():

[url]http://www.mathworks.com/help/distcomp/gpudevice.html[/url]

cudaDeviceReset() will wipe out any device memory allocations, but does not necessarily clear all state from cufft. The plan may still be valid, but if you attempt to use that plan (presumably created before the device reset) with device variables (presumably created before the device reset) you will run into trouble, as those variables no longer exist.

I have developed a simple proof code to show that a plan can be still valid and usable after a cudaDeviceReset(), as long as the necessary device variables are re-created.

The reason cufftGetSize() is not returning CUFFT_INVALID_PLAN is because the plan is still valid.

The reason your code hangs after calling gpuDevice(1) is not related to invalid plans, but the loss of device variable state, which must be re-created after a cudaDeviceReset().

In short, don’t do this.

gpuDevice(1) resets the device. At that point, you must assume everything about it is invalid. However your plan may still be valid, and you cannot judge the sanity of moving forward with it, if the underlying device data is compromised.

Stated another way, I don’t believe this statement is accurate:

“I’ve found that gpuDevice(1) deletes all stored cufft plans.”

OK, it appears gpuDevice(1) makes existing plans unusable.

How did you make the existing plan usable after cudaDeviceReset/gpuDevice(1)?

Here’s a worked example:

$ cat t515.cu
#include <complex>
#include <iostream>
#include <cufft.h>
#include <cuda_runtime_api.h>

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)


typedef std::complex<double> Complex;
using namespace std;

int main(){
  int n = 10;
  double* in;
  Complex* out;
#ifdef IN_PLACE
  in = (double*) malloc(sizeof(Complex) * (n/2+1));
  out = (Complex*)in;
#else
  in = (double*) malloc(sizeof(double) * n);
  out = (Complex*) malloc(sizeof(Complex) * (n/2+1));
#endif
  for(int i=0; i<n; i++){
     in[i] = (i>1)?1:0;
  }

  cufftHandle plan;
  cufftResult res = cufftPlan1d(&plan, n, CUFFT_D2Z, 1);
  if (res != CUFFT_SUCCESS)  {cout << "cufft plan error: " << res << endl; return 1;}
  cufftDoubleReal *d_in;
  cufftDoubleComplex *d_out;
  unsigned int out_mem_size = (n/2 + 1)*sizeof(cufftDoubleComplex);
#ifdef IN_PLACE
  unsigned int in_mem_size = out_mem_size;
  cudaMalloc((void **)&d_in, in_mem_size);
  d_out = (cufftDoubleComplex *)d_in;
#else
  unsigned int in_mem_size = sizeof(cufftDoubleReal)*n;
  cudaMalloc((void **)&d_in, in_mem_size);
  cudaMalloc((void **)&d_out, out_mem_size);
#endif
  cudaCheckErrors("cuda malloc fail");
  cudaMemcpy(d_in, in, in_mem_size, cudaMemcpyHostToDevice);
  cudaCheckErrors("cuda memcpy H2D fail");
  res = cufftExecD2Z(plan,d_in, d_out);
  if (res != CUFFT_SUCCESS)  {cout << "cufft exec error: " << res << endl; return 1;}
  cudaMemcpy(out, d_out, out_mem_size, cudaMemcpyDeviceToHost);
  cudaCheckErrors("cuda memcpy D2H fail");

  for(int i=0; i<n/2; i++)
     cout << "out: " << i << " "  << out[i].real() << " " <<  out[i].imag() << endl;
  cout << "Device Reset!" << endl;
  cudaDeviceReset();
  memset(out, 0, out_mem_size);
#ifdef IN_PLACE
  cudaMalloc((void **)&d_in, in_mem_size);
#else
  cudaMalloc((void **)&d_in, in_mem_size);
  cudaMalloc((void **)&d_out, out_mem_size);
#endif
  cudaCheckErrors("cuda malloc fail");
  cudaMemcpy(d_in, in, in_mem_size, cudaMemcpyHostToDevice);
  cudaCheckErrors("cuda memcpy H2D fail");
  res = cufftExecD2Z(plan,d_in, d_out);
  if (res != CUFFT_SUCCESS)  {cout << "cufft exec error: " << res << endl; return 1;}
  cudaMemcpy(out, d_out, out_mem_size, cudaMemcpyDeviceToHost);
  cudaCheckErrors("cuda memcpy D2H fail");

  for(int i=0; i<n/2; i++)
     cout << "out: " << i << " "  << out[i].real() << " " <<  out[i].imag() << endl;
  return 0;
}
$ nvcc -arch=sm_20 -o t515 t515.cu -lcufft
$ ./t515
out: 0 8 0
out: 1 -1.80902 0.587785
out: 2 -1.30902 0.951057
out: 3 -0.690983 0.951057
out: 4 -0.190983 0.587785
Device Reset!
out: 0 8 0
out: 1 -1.80902 0.587785
out: 2 -1.30902 0.951057
out: 3 -0.690983 0.951057
out: 4 -0.190983 0.587785

Thanks for posting that, but I don’t see how you’re recovering the plan.

I am doing something similar with gpuDevice(1) in Matlab, and if I don’t create a new plan after resetting, the cufftExec call just hangs.

I’m not recovering the plan. The plan is still valid. The proof of that is

  1. if you were to test the plan, it passes the test (we’ve covered that already.)
  2. when I re-use the existing plan, with newly created device data, it works.

I’m sorry you’re not grasping this, but I doubt that you are re-creating all your device data correctly after the gpuDevice(1) call. That is the reason you are seeing the hang. It’s not due to a bad plan, it’s due to bogus underlying data.

Perhaps you should use some other method that works for you, since you’re not grasping the idea that the device data is no longer valid, and must be re-created. That is exactly what the sample code I provided does. It does not re-create the plan, because the plan is still valid.

I am doing something else which works. I clear the plan tracking vector which forces it to create a new plan.

The data is being reproduced on the device with a gpuArray(data) Matlab call, exactly the same as before the reset, whether I create a new plan or not.

No new plan created: device hangs.

New plan created: it works.

What I’m not grasping is, why it only works with a new plan if the existing plan is valid.
And why destroying the existing plan after a reset, causes a new plan creation to fail.

Unless Matlab treats the data as still valid after resetting and does not recopy it to the device…

But that does not explain why it works when a new plan is created.

In any case, it appears you’ve demonstrated the problem is with the way Matlab handles the reset, and not with CUDA.