ERROR: cudnn failure (CUDNN_STATUS_EXECUTION_FAILED) in mnistCUDNN.cpp:625

Hi, I am quite new to CUDA/cuDNN, so I apologize in advance if I am posting this in the wrong place.
I am trying to verify my cuDNN installation by following the instructions at:

https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

When I try to verify the installation on Linux by running the mnistCUDNN sample, the process starts but then aborts with an error at line 625 of mnistCUDNN.cpp, for which I have not found a documented solution anywhere. I may be missing something obvious, since I am new to CUDA/cuDNN. I am running this on Ubuntu 20.04. Any help is hugely appreciated. The sample output is below for reference:

Executing: mnistCUDNN
cudnnGetVersion() : 8005 , CUDNN_VERSION from cudnn.h : 8005 (8.0.5)
Host compiler version : GCC 9.3.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 2 Capabilities 3.0, SmClock 745.0 Mhz, MemSize (Mb) 1999, MemClock 900.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_INTERNAL_ERROR for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_INTERNAL_ERROR for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 7: -1.000000 time requiring 2057744 memory
ERROR: cudnn failure (CUDNN_STATUS_EXECUTION_FAILED) in mnistCUDNN.cpp:625
Aborting…

Additional information from nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K2000M       On   | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P0    N/A /  N/A |    194MiB /  1999MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
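
To make the mnistCUDNN output above easier to read, here is a minimal Python sketch that groups the algorithm numbers by status for each "Testing ..." phase. The function name `summarize` and the regular expressions are my own, not part of the sample. As far as I understand, cudnnGetConvolutionForwardAlgorithm_v7 only ranks algorithms heuristically, while cudnnFindConvolutionForwardAlgorithm actually runs them on the GPU, which would explain why failures only show up in the second phase.

```python
import re
from collections import defaultdict

# Status lines copied from the mnistCUDNN output in this post.
LOG = """\
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_INTERNAL_ERROR for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_INTERNAL_ERROR for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 7: -1.000000 time requiring 2057744 memory
"""


def summarize(log):
    """Group algorithm numbers by status for each 'Testing <function>' phase."""
    summary = {}
    phase = None
    for line in log.splitlines():
        header = re.match(r"Testing (cudnn\w+)", line)
        if header:
            # A new phase starts; later status lines belong to it.
            phase = header.group(1)
            summary[phase] = defaultdict(list)
            continue
        status = re.match(r"\^{4} (CUDNN_STATUS_\w+) for Algo (\d+)", line)
        if status and phase is not None:
            summary[phase][status.group(1)].append(int(status.group(2)))
    return summary


for phase, statuses in summarize(LOG).items():
    print(phase)
    for status, algos in statuses.items():
        print(f"  {status}: algos {algos}")
```

In this log, no algorithm reports CUDNN_STATUS_SUCCESS in the cudnnFindConvolutionForwardAlgorithm phase, which suggests the problem is not specific to one algorithm.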

I have a similar problem, but in my case this appears:

./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 34 Capabilities 7.5, SmClock 1665.0 Mhz, MemSize (Mb) 7979, MemClock 7001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
CUDNN failure
Error: CUDNN_STATUS_INTERNAL_ERROR
mnistCUDNN.cpp:394
Aborting…

I hope someone can help us.

Hi @luiferi ,
Can you please check if this helps

Thanks!

Hi @AakankshaS,

Thank you for the suggestion, but that solution seems to apply to PyTorch users.
I do not have PyTorch installed on my system.

I tried installing PyTorch anyway, in case it would add a missing dependency. Unfortunately, the error persists and I am getting exactly the same output.

What else could I try?

Hi @luiferi ,
Your GPU reports compute capability 3.0 ("device 0 : sms 2 Capabilities 3.0"), which is not supported by this cuDNN release. Please refer to the support matrix linked below:
https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-805/support-matrix/index.html
Thanks!
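
For anyone who lands here with the same error, the check described above can be sketched in a few lines of Python. This is only an illustration: it reads the compute capability from the "device 0 : ..." line that mnistCUDNN prints and compares it with a minimum value. The helper names and the minimum of 3.5 used here are assumptions for the example, so take the real minimum from the support matrix for your cuDNN and CUDA versions.

```python
import re

# ASSUMPTION: placeholder minimum compute capability for illustration only.
# Look up the actual value in the cuDNN support matrix for your release.
ASSUMED_MINIMUM = (3, 5)


def parse_capability(device_line):
    """Extract (major, minor) from a mnistCUDNN 'device N : ...' line."""
    match = re.search(r"Capabilities (\d+)\.(\d+)", device_line)
    if match is None:
        raise ValueError("no 'Capabilities X.Y' field found in line")
    return int(match.group(1)), int(match.group(2))


def is_supported(device_line, minimum=ASSUMED_MINIMUM):
    """Return True if the reported capability is at least the given minimum."""
    # Tuples compare element by element, so (3, 0) < (3, 5) < (7, 5).
    return parse_capability(device_line) >= minimum


# Device lines copied from the two outputs posted in this thread.
K2000M_LINE = ("device 0 : sms 2 Capabilities 3.0, SmClock 745.0 Mhz, "
               "MemSize (Mb) 1999, MemClock 900.0 Mhz, Ecc=0, boardGroupID=0")
SECOND_LINE = ("device 0 : sms 34 Capabilities 7.5, SmClock 1665.0 Mhz, "
               "MemSize (Mb) 7979, MemClock 7001.0 Mhz, Ecc=0, boardGroupID=0")

for line in (K2000M_LINE, SECOND_LINE):
    major, minor = parse_capability(line)
    print(f"compute capability {major}.{minor}: supported={is_supported(line)}")
```

Note that this only covers the compute capability question. The second output in this thread comes from a 7.5 device, so its CUDNN_STATUS_INTERNAL_ERROR presumably has a different cause.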