CUFFT_INVALID_DEVICE when creating a cuFFT plan on HPC

I am testing the following code on my own local machines (both Arch Linux and Ubuntu 16.04, with NVIDIA driver 390 and CUDA 9.1) and on our HPC clusters:

#include <iostream>
#include <cmath>
#include <cufft.h>

int main(){
    // Initializing variables
    int n = 1024;
    cufftHandle plan1d;
    double2 *h_a, *d_a;

    // Allocation / definitions
    h_a = (double2 *)malloc(sizeof(double2)*n);
    for (int i = 0; i < n; ++i){
        h_a[i].x = sin(2*M_PI*i/n);
        h_a[i].y = 0;
    }

    cudaMalloc(&d_a, sizeof(double2)*n);
    cudaMemcpy(d_a, h_a, sizeof(double2)*n, cudaMemcpyHostToDevice);
    cufftResult result = cufftPlan1d(&plan1d, n, CUFFT_Z2Z, 1);

    // ignoring full error checking for readability
    if (result == CUFFT_INVALID_DEVICE){
        std::cout << "Invalid Device Error\n";
    }

    // Executing the FFT
    cufftExecZ2Z(plan1d, d_a, d_a, CUFFT_FORWARD);

    // Executing the inverse FFT
    cufftExecZ2Z(plan1d, d_a, d_a, CUFFT_INVERSE);

    // Copying back
    cudaMemcpy(h_a, d_a, sizeof(double2)*n, cudaMemcpyDeviceToHost);

    // Cleanup
    cufftDestroy(plan1d);
    cudaFree(d_a);
    free(h_a);
    return 0;
}

I compile with nvcc -lcufft
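For debugging on the clusters, I have also been wrapping every CUDA and cuFFT call so the first failing call is reported. This is just a sketch; the CHECK_CUDA / CHECK_CUFFT macro names are my own, not part of any library:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper macros for this example: abort with the
// file and line of the first failing call.
#define CHECK_CUDA(call)                                            \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",           \
                    cudaGetErrorString(err), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

#define CHECK_CUFFT(call)                                           \
    do {                                                            \
        cufftResult res = (call);                                   \
        if (res != CUFFT_SUCCESS) {                                 \
            fprintf(stderr, "cuFFT error %d at %s:%d\n",            \
                    (int)res, __FILE__, __LINE__);                  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main(){
    int n = 1024;
    double2 *d_a;
    // If the driver/runtime pairing is broken, this first runtime
    // call already fails, before cuFFT is ever involved.
    CHECK_CUDA(cudaMalloc(&d_a, sizeof(double2)*n));

    cufftHandle plan1d;
    CHECK_CUFFT(cufftPlan1d(&plan1d, n, CUFFT_Z2Z, 1));

    CHECK_CUFFT(cufftDestroy(plan1d));
    CHECK_CUDA(cudaFree(d_a));
    return 0;
}
```

If the cudaMalloc check already fails, the problem is in the CUDA installation rather than in cuFFT itself.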

On both of my local machines, the code works just fine; however, running the same code on our HPC clusters returns the CUFFT_INVALID_DEVICE error. Here is the hardware and driver configuration for those clusters.

For one cluster, we have several P100s available and are using NVIDIA driver version 384.90 with CUDA version 8.0.61.
On the second cluster, we are using K80s with NVIDIA driver version 367.44 and CUDA version 8.0.44. As a note, when running with CUDA version 7.5.18 on this hardware, the above code still returns the error, but the error does not seem to actually affect the execution of the code (as far as I can tell).

According to this, those CUDA versions should be compatible with the available driver versions; however, I saw a similar error back when my driver and CUDA installations were misconfigured on my local Ubuntu machine.
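To double-check the driver/runtime pairing on each machine, I can query both versions at runtime. A small host-only sketch (no kernel launches, so it should work even when device code fails):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main(){
    int driverVersion = 0, runtimeVersion = 0;
    // Both queries are host-side and succeed without touching a device.
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    // Versions are encoded as 1000*major + 10*minor (e.g. 8000 = CUDA 8.0).
    printf("Driver supports CUDA:  %d.%d\n",
           driverVersion/1000, (driverVersion%100)/10);
    printf("Runtime (toolkit):     %d.%d\n",
           runtimeVersion/1000, (runtimeVersion%100)/10);
    return 0;
}
```

If the runtime version printed here is newer than what the driver supports, that mismatch would explain errors like the one I am seeing.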

I am completely baffled at how to continue here and can only think of a few things:

There is some difference between the consumer hardware on my local machines (a Titan X (Pascal) and a GTX 970) and the HPC cluster hardware.
There is some driver configuration problem that I have not considered. I tried out what CUDA versions I could, and none of them worked, except for 7.5.18, which returned the same error but did not seem to affect execution.
There is some change to cuFFT after CUDA 7.5.18 that I am not aware of.
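To rule out the first possibility, I could compare what the runtime reports about the devices on each machine. A sketch using device enumeration:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main(){
    int count = 0;
    // If enumeration itself fails, no device is usable at all.
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No usable CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Compute capability shows whether the consumer and HPC
        // cards differ in a way the toolkit might care about.
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

Running this on both my local machines and the clusters would show whether the devices are even visible to the runtime there.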

As a note: this is just a minimal example. I have a larger codebase that does not run because of this error, and that is the problem I am ultimately trying to solve.

Thanks for reading and let me know if you have any ideas on how to proceed!