mnistCUDNN Test Failed

Hello,
I am trying to run CUDA Toolkit 10.0 with cuDNN 7.5.0 on an RTX 3060 GPU, and I am completely stuck at the cuDNN verification step.

When I run $ ./mnistCUDNN, I get:

cudnnGetVersion() : 7500 , CUDNN_VERSION from cudnn.h : 7500 (7.5.0)
Host compiler version : GCC 6.5.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28 Capabilities 8.6, SmClock 1852.0 Mhz, MemSize (Mb) 12045, MemClock 7501.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm …
Fastest algorithm is Algo 0
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.001024 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.009536 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.056032 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.214880 time requiring 57600 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0997550 0.0892810 0.1054468 0.1071846 0.0902156 0.1043498 0.0953814 0.0938049 0.1155469 0.0990339

Result of classification: 1 3 8

Test failed!
Prediction mismatch
mnistCUDNN.cpp:876
Aborting…
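The failure is easier to read once each Softmax row is reduced to its most probable digit. The Python sketch below copies the three rows from the log above and takes the argmax of each. The expected labels are an assumption inferred from the image file names (one, three, five), and the names SOFTMAX_ROWS, EXPECTED and classify are illustrative only.

```python
# Softmax rows copied verbatim from the mnistCUDNN log, one per test image.
SOFTMAX_ROWS = {
    "one_28x28.pgm":   [0.0000000, 0.9999399, 0.0000000, 0.0000000, 0.0000561,
                        0.0000000, 0.0000012, 0.0000017, 0.0000010, 0.0000000],
    "three_28x28.pgm": [0.0000000, 0.0000000, 0.0000000, 0.9999288, 0.0000000,
                        0.0000711, 0.0000000, 0.0000000, 0.0000000, 0.0000000],
    "five_28x28.pgm":  [0.0997550, 0.0892810, 0.1054468, 0.1071846, 0.0902156,
                        0.1043498, 0.0953814, 0.0938049, 0.1155469, 0.0990339],
}

# Assumption: the correct digit is the one named in each file name.
EXPECTED = {"one_28x28.pgm": 1, "three_28x28.pgm": 3, "five_28x28.pgm": 5}


def argmax(values):
    """Return the index of the largest value, i.e. the predicted digit."""
    return max(range(len(values)), key=lambda i: values[i])


def classify(rows):
    """Map each image name to its predicted digit."""
    return {name: argmax(probs) for name, probs in rows.items()}


predictions = classify(SOFTMAX_ROWS)
for name, digit in predictions.items():
    status = "ok" if digit == EXPECTED[name] else "MISMATCH"
    print(f"{name}: predicted {digit}, expected {EXPECTED[name]}, "
          f"max p = {max(SOFTMAX_ROWS[name]):.4f} ({status})")
```

Run as-is, it reports predictions of 1, 3 and 8, matching "Result of classification: 1 3 8", and flags only five_28x28.pgm. That row is also nearly flat: every class lies between roughly 0.089 and 0.116, close to the uniform value of 0.1, which suggests the forward pass produced close-to-meaningless output for that image rather than narrowly confusing a 5 with an 8.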


Output of $ nvidia-smi:

Thu Aug 26 23:58:30 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce … Off | 00000000:07:00.0 On | N/A |
| 0% 40C P8 18W / 170W | 371MiB / 12045MiB | 2% Default |
| | | N/A |
±------------------------------±---------------------±---------------------+

±----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1176 G /usr/lib/xorg/Xorg 26MiB |
| 0 N/A N/A 1248 G /usr/bin/gnome-shell 88MiB |
| 0 N/A N/A 1514 G /usr/lib/xorg/Xorg 146MiB |
| 0 N/A N/A 1647 G /usr/bin/gnome-shell 27MiB |
| 0 N/A N/A 2042 G …AAAAAAAAA= --shared-files 77MiB |
+-----------------------------------------------------------------------------+

And the output of $ nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
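One compatibility question these versions raise is whether the toolkit can generate native code for this GPU. The Python sketch below compares the device's compute capability (8.6, from the mnistCUDNN log) with the newest architecture a toolkit release targets. The lookup table and function names are hypothetical and abridged: the table records that CUDA 10.0 tops out at compute capability 7.5 (Turing), 11.0 adds 8.0 and 11.1 adds 8.6, and a release not listed falls back to the newest entry that is not newer than it.

```python
# Hypothetical, abridged table: newest compute capability that a CUDA
# toolkit release can compile native code for.
MAX_ARCH_BY_TOOLKIT = {
    (10, 0): (7, 5),  # Turing
    (11, 0): (8, 0),  # Ampere GA100
    (11, 1): (8, 6),  # Ampere GA10x (e.g. RTX 3060)
}


def parse_version(text):
    """Turn a string like "10.0" or "8.6" into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(toolkit, capability):
    """True if the toolkit can target the compute capability natively.

    Releases missing from the table use the newest entry not newer than
    them, so the answer is a conservative lower bound.
    """
    tk = parse_version(toolkit)
    cc = parse_version(capability)
    known = [v for v in MAX_ARCH_BY_TOOLKIT if v <= tk]
    if not known:
        raise ValueError(f"no data for CUDA {toolkit}")
    return cc <= MAX_ARCH_BY_TOOLKIT[max(known)]


# Values from the logs: nvcc "release 10.0", device "Capabilities 8.6".
print(toolkit_supports("10.0", "8.6"))
```

For these inputs it prints False, while the same call with "11.1" or "11.4" returns True. Whether an older build can still run on a newer GPU through PTX JIT compilation by the driver is a separate question this check does not answer.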

I have reinstalled everything several times and followed the official documentation. Maybe I have missed a compatibility issue, but hopefully not. Thank you in advance for any ideas that might help.

Hi @HolJak,
Can you please check whether the driver is installed properly? Use the method in the Linux installation guide, and perform all steps, including verification, before attempting anything else (such as installing cuDNN).

Thanks!

Thank you @AakankshaS for your answer. During the CUDA installation I verified it with the ./deviceQuery test, which reported Result = PASS. This morning I ran it again just to be sure. Results:

./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “NVIDIA GeForce RTX 3060”
CUDA Driver Version / Runtime Version 11.4 / 10.0
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 12046 MBytes (12630884352 bytes)
MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores/SM
(28) Multiprocessors, ( 64) CUDA Cores/MP: 1792 CUDA Cores
GPU Max Clock rate: 1852 MHz (1.85 GHz)
Memory Clock rate: 7501 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 2359296 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 7 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
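The two "MapSMtoCores for SM 8.6 is undefined" lines appear to come from the CUDA 10.0 samples' helper code, which maps each SM version to a cores-per-SM count and falls back to the last table entry when the version is unknown. The Python sketch below mimics that lookup. The table is an assumed, abridged copy of the one shipped with those samples, the function name is illustrative, and the figure of 128 FP32 cores per SM for SM 8.6 is stated as an assumption.

```python
# Assumed copy of the SM-version -> cores-per-SM table in the CUDA 10.0
# samples, keyed as 0xMm (major in the high nibble, minor in the low one).
SM_TO_CORES_CUDA10_SAMPLES = [
    (0x30, 192), (0x32, 192), (0x35, 192), (0x37, 192),
    (0x50, 128), (0x52, 128), (0x53, 128),
    (0x60, 64), (0x61, 128), (0x62, 128),
    (0x70, 64), (0x72, 64), (0x75, 64),
]


def cores_per_sm(major, minor, table):
    """Look up cores per SM, falling back to the last entry if unknown."""
    key = (major << 4) + minor
    for sm, cores in table:
        if sm == key:
            return cores
    fallback = table[-1][1]
    print(f"MapSMtoCores for SM {major}.{minor} is undefined. "
          f"Default to use {fallback} Cores/SM")
    return fallback


sms = 28  # "(28) Multiprocessors" from the deviceQuery output
reported = sms * cores_per_sm(8, 6, SM_TO_CORES_CUDA10_SAMPLES)
# Assumption: SM 8.6 (GA10x) has 128 FP32 cores per SM.
actual = sms * 128
print(reported, actual)
```

With 28 SMs the fallback gives 28 × 64 = 1792 cores, the figure deviceQuery printed, whereas 28 × 128 = 3584. The core count shown here is therefore a cosmetic artifact of the older samples and does not by itself indicate a broken driver, but it does show that the CUDA 10.0 tooling predates SM 8.6.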