cuDNN fails with CUDNN_STATUS_INTERNAL_ERROR on MNIST sample execution

My System:

OS: Ubuntu 16.04
GPU: GTX 1080
CUDA: 8.0.61
cuDNN: 6.0.21

I installed CUDA and cuDNN using the following routine:

# downloaded from https://developer.nvidia.com/cuda-80-ga2-download-archive
sudo dpkg -i cuda.deb
sudo apt update
sudo apt install cuda
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

# downloaded from https://developer.nvidia.com/cuda-80-ga2-download-archive
sudo dpkg -i cuda-patch.deb
sudo apt update
sudo apt upgrade

# downloaded from https://developer.nvidia.com/rdp/cudnn-download
sudo dpkg -i cudnn.deb
sudo dpkg -i cudnn-dev.deb
sudo dpkg -i cudnn-doc.deb
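A side note on the two `export` lines above: they only affect the current shell session, which can make the samples build in one terminal and fail in another. A minimal sketch of making them persistent (assuming the default `/usr/local/cuda` symlink the .deb install creates):

```shell
# Persist the CUDA environment for future shells by appending the two
# lines to ~/.bashrc (assumes the default /usr/local/cuda symlink).
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> "$HOME/.bashrc"
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' >> "$HOME/.bashrc"
```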

To test CUDA I use the following routine:

cd /usr/local/cuda/samples
sudo make clean && sudo make -j$(nproc) -Wno-deprecated-gpu-targets
cd bin/x86_64/linux/release
./deviceQuery
./bandwidthTest

Both tests result in PASS.

To test cuDNN I use the following routine from http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#verify:

cd /usr/src/cudnn_samples_v6/mnistCUDNN
sudo make clean && sudo make -j$(nproc) -Wno-deprecated-gpu-targets
./mnistCUDNN

This fails with the following output:

cudnnGetVersion() : 6021 , CUDNN_VERSION from cudnn.h : 6021 (6.0.21)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 20  Capabilities 6.1, SmClock 1797.0 Mhz, MemSize (Mb) 8107, MemClock 5005.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
CUDNN failure
Error: CUDNN_STATUS_INTERNAL_ERROR
mnistCUDNN.cpp:394
Aborting...

Can someone tell me if I did something wrong or how to fix this?

What happens if you run the mnistCUDNN test as root?
Do you still get the same error?

With root privileges the cuDNN test routine passes.

You’re not supposed to have to do this. That was just a diagnostic test.

If you’re still having trouble running it as an ordinary user, there are maybe a few things to check:

  1. https://devtalk.nvidia.com/default/topic/1024761/cuda-setup-and-installation/cudnn_status_internal_error-when-using-cudnn7-0-with-cuda-8-0/

  2. https://stackoverflow.com/questions/47060565/tensorflow-only-works-under-root-after-drivers-update

You are on the right track. I only wanted to check my cuDNN installation because I was running into the same error mentioned in the StackOverflow thread in the first place.

Could you elaborate on the accepted answer?

My cuDNN test routine still fails if I simply add

sudo usermod -a -G nvidia-persistenced $USER

to my installation routine.
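One possible explanation (my assumption, not confirmed by the thread) is that the group change has not taken effect yet, since `usermod` only applies to sessions started after a fresh login. A quick way to check what the current session actually sees:

```shell
# Group membership only takes effect after logging out and back in,
# so check whether the *current* session is in the group.
if id -nG | grep -qw nvidia-persistenced; then
    echo "current session IS in nvidia-persistenced"
else
    echo "current session is NOT in nvidia-persistenced (log out and back in)"
fi
```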

Any update on this? Unfortunately I can’t comment on the StackOverflow thread due to unmet reputation requirements.

I have the same issues…

OS: Ubuntu 16.04
GPU: GTX 1080 Ti
CUDA: Cuda compilation tools, release 9.0, V9.0.176
cuDNN: 7.1.3
driver version: NVIDIA-SMI 384.111 Driver Version: 384.111

I pass the CUDA tests:
./deviceQuery
./bandwidthTest

I need to run with sudo to pass the mnistCUDNN test.

The following command did not help:
sudo usermod -a -G nvidia-persistenced $USER

I still cannot run TensorFlow with cuDNN.

Please advise!

This bug persists for me to this day. Thus, I would also appreciate an answer after all this time.

My ‘workaround’ is just to launch the IDE or the script with root privileges. This opens up a lot more problems, but at least I can run training on the GPU.

My issue was resolved by the following steps (I don’t know which one fixed it, though):

  1. log out and cold-reboot/shutdown
  2. sudo rm -rf ~/.nv/
  3. sudo usermod -a -G nvidia-persistenced $USER
  4. log out and cold-reboot/shutdown again
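For anyone repeating the cache-removal step above: `.nv` is a per-user cache directory under `$HOME`, so a relative `rm -rf .nv/` only works when run from the home directory. A sketch of a safer spelling (assuming, as this thread suggests, that a root-owned cache from earlier sudo runs is the culprit):

```shell
# Remove the per-user NVIDIA compute cache ($HOME/.nv). If earlier runs
# under sudo left root-owned files behind, the plain rm fails and sudo
# is needed as a fallback.
rm -rf "$HOME/.nv" 2>/dev/null || sudo rm -rf "$HOME/.nv"
```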

This indeed fixed the bug for me. How did you find the solution? Would have been nice to know 6 months ago when I encountered it.

From the link NVIDIA mentioned:

You’re not supposed to have to do this. That was just a diagnostic test.

If you’re still having trouble running it as an ordinary user, there are maybe a few things to check:

  1. https://devtalk.nvidia.com/default/topic/1024761/cuda-setup-and-installation/cudnn_status_internal_error-when-using-cudnn7-0-with-cuda-8-0/

  2. https://stackoverflow.com/questions/47060565/tensorflow-only-works-under-root-after-drivers-update
Also, from searching the web, I found that logging out helped more than a hot or cold reboot.

I don’t think the reboot alone will do much good. I had this problem for about half a year and shut the PC down countless times. I think the clearing of the cache is the critical part. But that is on me. I completely missed it in the other thread. Anyway, thanks for the info.