CUFFT_INVALID_DEVICE on cufftPlan1d in NVIDIA's Simple CUFFT example

Sorry. I tried to post under jeffguy@gmail.com, since that email address is more reliable for me. I’ve included my post below.

Subject: CUFFT_INVALID_DEVICE on cufftPlan1d in NVIDIA’s Simple CUFFT example

Body:
I went to the CUDA Samples page of the CUDA Toolkit Documentation and downloaded “Simple CUFFT”, which I’m trying to get working.

I’m using Ubuntu 14.04 and installed the driver and other packages by adding NVIDIA’s repository (/compute/cuda/repos/ubuntu1404/x86_64) to my apt sources.

I ran a basic program that copies 16 Fibonacci numbers from one host array to the GPU and then back to another host array, and verified that the result was correct. I think this means my CUDA environment is set up properly. Then I tried the CUFFT sample code I downloaded.

$ cd 7_CUDALibraries/simpleCUFFT
$ make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64     -gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_50,code=compute_50 -o simpleCUFFT.o -c simpleCUFFT.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
../../common/inc/helper_cuda.h(253): error: identifier "cudaErrorHardwareStackError" is undefined

../../common/inc/helper_cuda.h(256): error: identifier "cudaErrorIllegalInstruction" is undefined

../../common/inc/helper_cuda.h(259): error: identifier "cudaErrorMisalignedAddress" is undefined

../../common/inc/helper_cuda.h(262): error: identifier "cudaErrorInvalidAddressSpace" is undefined

../../common/inc/helper_cuda.h(265): error: identifier "cudaErrorInvalidPc" is undefined

../../common/inc/helper_cuda.h(268): error: identifier "cudaErrorIllegalAddress" is undefined

../../common/inc/helper_cuda.h(272): error: identifier "cudaErrorInvalidPtx" is undefined

../../common/inc/helper_cuda.h(275): error: identifier "cudaErrorInvalidGraphicsContext" is undefined

8 errors detected in the compilation of "/tmp/tmpxft_00000c10_00000000-21_simpleCUFFT.compute_50.cpp1.ii".
make: *** [simpleCUFFT.o] Error 2

Uh oh. Those symbols are defined in my /usr/local/cuda/include/driver_types.h. They’re also mentioned in some libraries:

$ grep -R cudaErrorHardwareStackError /usr/local/cuda
Binary file /usr/local/cuda/lib64/libcufftw.so.6.5.14 matches
Binary file /usr/local/cuda/lib64/libcufftw.so matches
Binary file /usr/local/cuda/lib64/libcufft.so.6.5 matches
Binary file /usr/local/cuda/lib64/libcufft.so matches
Binary file /usr/local/cuda/lib64/libcufft.so.6.5.14 matches
Binary file /usr/local/cuda/lib64/libcufftw.so.6.5 matches
Binary file /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.6.5.14 matches
Binary file /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so matches
Binary file /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.6.5 matches
Binary file /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so matches
Binary file /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.6.5.14 matches
Binary file /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.6.5 matches

but don’t appear to be symbols defined by those libraries:

$ readelf -a /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so|grep -i cudaErrorHardwareStackError
(No output)

I found nothing by searching around, so I just commented out the relevant lines in common/inc/helper_cuda.h. They only seem to affect error reporting, and if one of those errors is produced, I should at least see “<unknown>”.

$ make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64     -gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_50,code=compute_50 -o simpleCUFFT.o -c simpleCUFFT.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
/usr/local/cuda/bin/nvcc -ccbin g++   -m64       -gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_50,code=compute_50 -o simpleCUFFT simpleCUFFT.o  -lcufft
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
mkdir -p ../../bin/x86_64/linux/release
cp simpleCUFFT ../../bin/x86_64/linux/release

Excellent! Successful build!

$ ./simpleCUFFT
[simpleCUFFT] is starting...
GPU Device 0: "Quadro FX 880M" with compute capability 1.2

CUDA error at simpleCUFFT.cu:120 code=11(CUFFT_INVALID_DEVICE) "cufftPlan1d(&plan, new_size, CUFFT_C2C, 1)"

Ouch! I did some googling and couldn’t understand why I’m getting this error. I get the same CUFFT_INVALID_DEVICE from other sample cufft programs that I found elsewhere, though.

I’ve dumped about 6 hours into this problem, and am out of ideas. I probably should have turned to you guys sooner. I’ll be very grateful for any help you can give.

My laptop is an HP EliteBook 8540w.

Here’s some other system info:

$ uname -a
Linux jguy-EliteBook-8540w 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ lspci|grep NV
01:00.0 VGA compatible controller: NVIDIA Corporation GT216GLM [Quadro FX 880M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation GT216 HDMI Audio Controller (rev a1)

$ lsmod|grep nv
nvidia              10675249  41
drm                   302817  2 nvidia

$ /usr/local/cuda-6.5/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

$ dpkg -l|egrep 'cuda|nvidia'
ii  cuda-core-6-5                                         6.5-14                                              amd64        CUDA core tools
ii  cuda-cufft-6-5                                        6.5-14                                              amd64        CUFFT native runtime libraries
ii  cuda-cufft-dev-6-5                                    6.5-14                                              amd64        CUFFT native dev links, headers
ii  cuda-license-6-5                                      6.5-14                                              amd64        CUDA licenses
ii  cuda-misc-headers-6-5                                 6.5-14                                              amd64        CUDA misc headers
ii  cuda-repo-ubuntu1404                                  6.5-14                                              amd64        CUDA repo configuration files.
ii  libcuda1-331                                          331.38-0ubuntu7.1                                   amd64        NVIDIA CUDA runtime library
ii  libcudart5.5:amd64                                    5.5.22-3ubuntu1                                     amd64        NVIDIA CUDA runtime library
ii  nvidia-331                                            331.38-0ubuntu7.1                                   amd64        NVIDIA binary driver - version 331.38
ii  nvidia-cuda-dev                                       5.5.22-3ubuntu1                                     amd64        NVIDIA CUDA development files
ii  nvidia-libopencl1-331                                 331.38-0ubuntu7.1                                   amd64        NVIDIA OpenCL Driver and ICD Loader library
ii  nvidia-opencl-icd-331                                 331.38-0ubuntu7.1                                   amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                          0.6.2                                               amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                                       331.20-0ubuntu8                                     amd64        Tool for configuring the NVIDIA graphics driver

Your CUDA install is broken somehow. The initial trouble report, e.g.:

../../common/inc/helper_cuda.h(253): error: identifier "cudaErrorHardwareStackError" is undefined

and so on, is being issued by a compile-only command (indicated by -c):

/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  ... -o simpleCUFFT.o -c simpleCUFFT.cu

That means the problem would not occur if the proper driver_types.h were being included. Since nvcc normally takes care of this automatically (controlled partly by the nvcc.profile file), it’s not at all obvious what the problem might be. But at this stage of the trouble report, it has nothing to do with libraries or linking.
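One way to see which driver_types.h a compile actually picks up is to ask nvcc for the make-style dependency list with its -M (--generate-dependencies) option and filter it for that header. A minimal sketch, assuming a POSIX shell; which_header is a hypothetical helper name, not something shipped with the samples:

```sh
# which_header NAME: read make-style dependency output (for example from
# `nvcc -M`) on stdin and print every dependency path whose file name is NAME.
which_header() {
  # Split the rule on spaces, tabs and line-continuation backslashes so each
  # dependency lands on its own line, then keep the ones that end in /NAME.
  tr ' \t\\' '\n\n\n' | grep -E "(^|/)$1\$"
}
```

For example, run `nvcc -M -I../../common/inc simpleCUFFT.cu | which_header driver_types.h` in 7_CUDALibraries/simpleCUFFT. If it prints a path outside the CUDA 6.5 toolkit, such as /usr/include/driver_types.h, an older header is presumably shadowing the toolkit’s copy.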

Rather than going on to comment things out until it says “it compiled!” and then trying to figure out why it is not working, I would start with the initial trouble report.

I would start with paths, the directory relationships, and so forth. Did you already try to build all the samples by running make in the top-level samples directory?

When you did the grep, did it list driver_types.h? (It isn’t in the output you posted.)
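A quick way to answer that is to list every driver_types.h under the directories involved and check which ones mention one of the missing enum values. A sketch with a hypothetical helper name; find -H is used so that a starting directory which is itself a symlink is still searched:

```sh
# check_symbol SYMBOL DIR...: for every driver_types.h found under the given
# directories, print "yes" or "no" depending on whether it mentions SYMBOL.
check_symbol() {
  sym=$1
  shift
  find -H "$@" -name driver_types.h 2>/dev/null | while IFS= read -r f; do
    if grep -q -- "$sym" "$f"; then
      printf 'yes %s\n' "$f"
    else
      printf 'no  %s\n' "$f"
    fi
  done
}
```

For example, `check_symbol cudaErrorHardwareStackError /usr/include /usr/local/cuda/include`. A “no” line points at a copy of driver_types.h that predates these error codes.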

You might also try installing CUDA using the runfile method, rather than the repo method:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#runfile

Thanks, txbob. Your suggestion was correct. I found lots of filenames in common between /usr/include and /targets/x86_64-linux/include. I moved all the duplicates from /usr/include into a backup folder, reverted to NVIDIA’s original Simple CUFFT example, and it built successfully.

$ make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64     -gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_50,code=compute_50 -o simpleCUFFT.o -c simpleCUFFT.cu
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
/usr/local/cuda/bin/nvcc -ccbin g++   -m64       -gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_50,code=compute_50 -o simpleCUFFT simpleCUFFT.o  -lcufft
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
mkdir -p ../../bin/x86_64/linux/release
cp simpleCUFFT ../../bin/x86_64/linux/release

$ ./simpleCUFFT 
[simpleCUFFT] is starting...
GPU Device 0: "Quadro FX 880M" with compute capability 1.2

CUDA error at simpleCUFFT.cu:120 code=11(CUFFT_INVALID_DEVICE) "cufftPlan1d(&plan, new_size, CUFFT_C2C, 1)" 
$ # driver version was 331.38 or so.
$ # sudo apt-get update
$ # sudo apt-get upgrade
$ # reboot
$ # now, driver version is 331.113
$ ./simpleCUFFT 
[simpleCUFFT] is starting...
CUDA error at ../../common/inc/helper_cuda.h:1032 code=30(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"
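The duplicate check described at the start of this post (file names present both in /usr/include and in the toolkit’s include directory) can be scripted with a small loop. A sketch assuming a POSIX shell; common_files is a hypothetical helper, and it compares names only, not file contents:

```sh
# common_files DIR_A DIR_B: print the names of entries that exist directly
# in DIR_A and also exist in DIR_B.
common_files() {
  for f in "$1"/*; do
    # Skip the literal "*" pattern left behind when DIR_A is empty.
    [ -e "$f" ] || continue
    name=${f##*/}
    if [ -e "$2/$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `common_files /usr/include /usr/local/cuda/include`.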

To solve the linking or runtime errors, I looked for lib*.so in /usr/local/cuda-6.5, and found only libnvvm.so, libcufftw.so, and libcufft.so. These files also appear in /usr/lib/x86_64-linux-gnu, so I moved those files (and libnvvm.so.5.5.22, etc.) to a backup folder and recompiled, but saw the same error at runtime.
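To see which copy of each library the program actually resolves at run time, ldd prints the resolved path of every shared-library dependency. A sketch that filters ldd output down to CUDA-related libraries; cuda_deps is a hypothetical helper, and the name pattern (libcuda, libcudart, libcufft, libcufftw, libnvvm) is an assumption about which libraries are of interest. Depending on how the sample was linked and how the runtime loads the driver, some of these may not appear at all:

```sh
# cuda_deps: read `ldd` output on stdin and print "library resolved-path"
# for CUDA-related libraries, or "library not found" if ldd could not resolve it.
cuda_deps() {
  awk '$1 ~ /^lib(cuda|cudart|cufft|cufftw|nvvm)\.so/ && $2 == "=>" {
    print $1, ($3 == "not" ? "not found" : $3)
  }'
}
```

For example, `ldd ./simpleCUFFT | cuda_deps`.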

Perhaps I didn’t completely uninstall the NVIDIA driver, CUDA, or CUFFT? For uninstalling, the CUDA Toolkit Documentation mentions nvidia-uninstall and uninstall_cuda_6.5.pl, but I couldn’t find either on my system. Is there a good way to be sure I’ve removed all old NVIDIA and CUDA files from my system?
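One way to look for leftovers is to ask dpkg which installed package, if any, owns each suspicious file: dpkg -S searches the file lists of installed packages and finds nothing for files that were copied in by hand. A sketch with a hypothetical helper name:

```sh
# owner_of FILE...: print "<package><TAB><file>" for each file, using
# "(none)" when dpkg knows of no installed package that owns the file.
owner_of() {
  for f in "$@"; do
    # dpkg -S prints "package: /path"; multiarch packages look like
    # "package:arch: /path", so keep only the text before the first colon.
    pkg=$(dpkg -S "$f" 2>/dev/null | head -n 1 | cut -d: -f1)
    printf '%s\t%s\n' "${pkg:-(none)}" "$f"
  done
}
```

For example, `owner_of /usr/include/cuda*.h /usr/include/driver_types.h` separates headers that came from a distribution package (which could presumably be removed with apt-get rather than moved by hand) from stray copies.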

I see that the latest NVIDIA driver available for download is 340.65, whereas the one in the repos is 331.113. But since CUFFT from the repos has a dependency on the NVIDIA driver in the Ubuntu repo, I would expect them to be compatible.

$ nvidia-smi 
Fri Dec 12 17:04:21 2014       
+------------------------------------------------------+                       
| NVIDIA-SMI 331.113    Driver Version: 331.113        |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro FX 880M      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   59C  N/A     N/A /  N/A |    164MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

Any idea what I should try next?

You appear to have a very chaotic system, with stuff scattered all over the place in directories I wouldn’t expect.

I would recommend cleaning any junk off your system and re-installing CUDA using the runfile installer method.

What driver version does nvidia-smi report? (Is it still 331.113?)

Now run

nvcc --version

and report what it says.

You appear to be using CUDA 6.5 with a driver version of 331.113. That will not work.

CUDA 6.5 requires 340.29 or newer.
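That requirement can be checked with a version comparison. Plain string comparison is unreliable for driver versions (as strings, 340.100 would sort before 340.29), so this sketch relies on GNU sort’s -V version ordering. version_ge is a hypothetical helper, 340.29 is the minimum quoted above, and the nvidia-smi --query-gpu call assumes this driver’s nvidia-smi supports that query:

```sh
# version_ge A B: succeed (exit status 0) if version A >= version B.
# GNU `sort -V` orders dotted version strings field by field numerically,
# so the smaller of the two comes out first.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum driver for CUDA 6.5, as stated in this thread.
CUDA65_MIN_DRIVER=340.29

# Only query the driver when nvidia-smi is installed; use the first GPU's line.
if command -v nvidia-smi >/dev/null 2>&1; then
  drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_ge "$drv" "$CUDA65_MIN_DRIVER"; then
    echo "driver $drv is new enough for CUDA 6.5"
  else
    echo "driver $drv is older than $CUDA65_MIN_DRIVER, too old for CUDA 6.5"
  fi
fi
```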

The runfile installer method should harmonize all of this, along with careful setting of your PATH and LD_LIBRARY_PATH variables as instructed in the Linux Getting Started Guide:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#abstract
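For CUDA 6.5 those instructions amount to putting the toolkit’s bin directory at the front of PATH and its lib64 directory at the front of LD_LIBRARY_PATH. A sketch for ~/.bashrc or a similar startup file, assuming the default /usr/local/cuda-6.5 install prefix:

```sh
# Put the CUDA 6.5 tools first on PATH so this nvcc wins over any other copy.
export PATH=/usr/local/cuda-6.5/bin${PATH:+:$PATH}
# Same for the CUDA 6.5 libraries. ${VAR:+:$VAR} only adds the ":" separator
# when the variable already had a value, so no empty entry is left behind.
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```

The ${VAR:+:$VAR} form matters mostly for LD_LIBRARY_PATH, which is often unset: a trailing empty entry would be treated by the dynamic loader as the current directory.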

I started down the path you suggested and found that the CUDA Toolkit downloads page on NVIDIA Developer says “driver support for older generation GPUs with SM1.x has been deprecated.” I’m not sure what SM1.x means, but I think it means Compute Capability 1.x. My Quadro FX 880M has Compute Capability 1.2. So perhaps I shouldn’t install the latest drivers from the .run files? Perhaps using the repository versions is also a problem, and I need to find an earlier driver and CUDA version?

Similarly, the “CUDA GPUs - Compute Capability” page on NVIDIA Developer no longer lists the Quadro FX 880M under CUDA-Enabled Quadro Mobile Products. It is listed on https://developer.nvidia.com/cuda-legacy-gpus, though.

If you agree that this is the reason for the problem I’m experiencing, I’ll mark the thread as Solved. I’m also opening a new thread about finding the latest driver that supports a given Compute Capability, since this isn’t obvious to me after some web searching.

nvidia-smi reports “Driver Version: 331.113”

$ /usr/local/cuda-6.5/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

331.113 won’t work with CUDA 6.5.

If you loaded the CUDA 6.5 toolkit from the runfile installer, it should have installed 340.29 or newer. That driver will work with your GPU. Deprecated means “it’s still supported, but support is going away in the future”. In fact, CUDA 6.5 and these 340.xx driver branches are the last that will support your cc1.2 GPU.