./mnistCUDNN half-precision Test Failed

Hi! Today I installed CUDA and cuDNN on 2 PCs. One works well, but the other, set up with the same steps, fails the ./mnistCUDNN test. It passes the first part (single precision), but the second part, half precision (math in single precision), fails and returns nan nan nan in the resulting weights from Softmax. Can anyone help? It would be much appreciated.

Relevant output:

Testing single precision

Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!


Testing half precision (math in single precision)

Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.148320 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.150016 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.268768 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.793600 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.644000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 3.518880 time requiring 4656640 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
nan nan nan nan nan nan nan nan nan nan
Loading image data/three_28x28.pgm

Resulting weights from Softmax:
nan nan nan nan nan nan nan nan nan nan
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
nan nan nan nan nan nan nan nan nan nan

Result of classification: 0 0 0

Test failed!
ERROR: Prediction mismatch in mnistCUDNN.cpp:978
Aborting…
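
The failure signature above (nan in every Softmax weight) can be checked mechanically when comparing runs across machines. Below is a minimal Python sketch, assuming the sample's output has been captured as text. The function names softmax_rows and has_nan are hypothetical helpers, not part of the cuDNN samples.

```python
import math


def softmax_rows(output_text):
    """Collect the numeric row that follows each 'Resulting weights from Softmax:' line."""
    rows = []
    lines = output_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() != "Resulting weights from Softmax:":
            continue
        try:
            rows.append([float(token) for token in lines[i + 1].split()])
        except ValueError:
            # The following line was not a row of numbers; skip it.
            continue
    return rows


def has_nan(rows):
    """Return True if any parsed weight is NaN."""
    return any(math.isnan(value) for row in rows for value in row)
```

Feeding it the captured output of a run and calling has_nan(softmax_rows(text)) gives a quick yes/no answer on whether the half precision pass produced NaN weights.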

Hi @a552088920,
The issue might be with the driver installation on the 2nd PC.
Please check the link below.

Thanks!

Hi! Thanks for answering my question. I’ve installed CUDA and cuDNN 3 times, and I believe my CUDA and drivers are installed properly: nvcc -V returns the CUDA version, and I can see my GPU model correctly in Settings - Details.

The first time, I installed cuDNN incorrectly and it could not even pass the single precision test. This time it passes half of the test, so I think there might be another problem.

I have observed a similar issue on a GTX 1050. A few tests in OpenCV fail with the CUDNN_STATUS_NOT_SUPPORTED error. These tests used to pass with cuDNN 7. It’s far worse in this case, since cuDNN doesn’t provide even a single algorithm that works.

More information here: https://github.com/opencv/opencv/issues/17496#issuecomment-650491470

Note: these tests are still failing in cuDNN 8.0.2

So you mean the problem is related to the GPU model and cuDNN version? I installed exactly the same environment on my 2 PCs at the same time: nvidia-driver-440 + CUDA 10.2 + anaconda3 + cuDNN 8.0.2. The only difference is that the GPU of the PC with the problem is a K700 (quite an old GPU) and my laptop’s GPU is a GTX970M. Could that be the problem? I mean, could such an old GPU not support such a new version of cuDNN?

Actually, my reply is irrelevant to this post. It’s a different problem.

Hi @a552088920,
Could you provide more information on the two machines you are using?
Are the GPUs different?
Could you also provide the cuDNN API logs (see the API logging section of the developer guide) so that we can help you better?
https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#api-logging
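
The API logging section of the developer guide describes switching logging on through the CUDNN_LOGINFO_DBG and CUDNN_LOGDEST_DBG environment variables. Below is a minimal Python sketch of running the sample with them set. The log file name cudnn_api.log and both function names are assumptions made for illustration, and the sample is assumed to be in the current directory.

```python
import os
import subprocess


def cudnn_logging_env(log_path="cudnn_api.log", base_env=None):
    """Return a copy of the environment with cuDNN API logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDNN_LOGINFO_DBG"] = "1"       # turn informational API logging on
    env["CUDNN_LOGDEST_DBG"] = log_path  # destination: a file name, stdout, or stderr
    return env


def run_with_cudnn_logging(command=("./mnistCUDNN",), log_path="cudnn_api.log"):
    """Run the given command with logging enabled and return its exit code."""
    completed = subprocess.run(list(command), env=cudnn_logging_env(log_path))
    return completed.returncode
```

Calling run_with_cudnn_logging() from the sample directory should leave the API trace in the chosen log file, which can then be attached to the thread.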

Thanks!

Thanks for your reply! I may have found what the problem is. When I tested a Jupyter project, it showed that my GPU (K600) is too old, so PyTorch cannot start. I think the cuDNN test might fail for the same reason. Anyway, thanks again.
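
For anyone hitting the same symptom, here is a minimal Python sketch of checking a GPU's compute capability against a required minimum. It assumes PyTorch is installed and can still enumerate the device. The default minimum of (3, 5) is only a placeholder, not a documented requirement: the real value depends on the PyTorch build and cuDNN release in use, so check their support matrices. Both function names are hypothetical.

```python
def meets_minimum(capability, minimum):
    """Compare two (major, minor) compute-capability tuples."""
    return tuple(capability) >= tuple(minimum)


def report_gpu_support(minimum=(3, 5)):
    """Describe whether GPU 0 meets an assumed minimum compute capability."""
    # ASSUMPTION: `minimum` is a placeholder; look up the real requirement
    # for your PyTorch build and cuDNN release before relying on it.
    import torch  # imported lazily so meets_minimum works without PyTorch

    if not torch.cuda.is_available():
        return "No CUDA device is visible to PyTorch"
    name = torch.cuda.get_device_name(0)
    capability = torch.cuda.get_device_capability(0)
    verdict = "meets" if meets_minimum(capability, minimum) else "is below"
    return (
        f"{name}: compute capability {capability[0]}.{capability[1]} "
        f"{verdict} the assumed minimum {minimum[0]}.{minimum[1]}"
    )
```

Printing report_gpu_support() on each machine makes it easy to see whether the two GPUs fall on different sides of the chosen threshold.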