Hi! Today I installed CUDA and cuDNN on two PCs. One works well, but the other, set up with the same steps, fails the ./mnistCUDNN test. It passes the first part (single precision), but the second part, half precision (math in single precision), fails and returns nan for every resulting weight from Softmax. Can anyone help? Much appreciated.
Some output:
Testing single precision
…
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
…
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.148320 time requiring 2000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.150016 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.268768 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.793600 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.644000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 3.518880 time requiring 4656640 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
nan nan nan nan nan nan nan nan nan nan
Loading image data/three_28x28.pgm
…
Resulting weights from Softmax:
nan nan nan nan nan nan nan nan nan nan
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
nan nan nan nan nan nan nan nan nan nan
Result of classification: 0 0 0
Test failed!
ERROR: Prediction mismatch in mnistCUDNN.cpp:978
Aborting…
Hi! Thanks for answering my question. I’ve installed CUDA and cuDNN three times, and I believe CUDA and the drivers are installed properly: nvcc -V returns the CUDA version, and my GPU model shows up correctly in Settings - Details.
The first time, I installed cuDNN incorrectly and it could not even pass the single precision test. This time it passes half of the test, so I think there might be some other problem.
I have observed a similar issue on a GTX 1050. A few tests in OpenCV fail with the CUDNN_STATUS_NOT_SUPPORTED error. These tests used to pass with cuDNN 7. It’s far worse in this case, since cuDNN doesn’t provide even a single algorithm that could work.
So you mean the problem is related to the GPU model and the cuDNN version? I installed exactly the same environment on both PCs at the same time: nvidia-driver-440 + CUDA 10.2 + anaconda3 + cuDNN 8.0.2. The only difference is that the GPU in the PC with the problem is a K700 (quite an old GPU), while my laptop’s GPU is a GTX 970M. Could that be the problem? I mean, could such an old GPU be unsupported by such a new version of cuDNN?
Hi @a552088920,
Could you provide more information on the two machines you are using?
Are the GPUs different?
Also, please provide the cuDNN API logging output (see the cuDNN API logging documentation) so that we can help you better.
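For reference, cuDNN API logging is switched on through the environment variables CUDNN_LOGINFO_DBG and CUDNN_LOGDEST_DBG. Below is a minimal Python sketch of one way to collect the log. It assumes the sample binary is at ./mnistCUDNN in the current directory; the log file name cudnn_api.log and both helper function names are made up for illustration.

```python
import os
import subprocess


def cudnn_logging_env(log_dest="cudnn_api.log", base_env=None):
    """Return a copy of the environment with cuDNN API logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDNN_LOGINFO_DBG"] = "1"       # turn on informational API logging
    env["CUDNN_LOGDEST_DBG"] = log_dest  # "stdout", "stderr", or a file path
    return env


def run_with_cudnn_logging(cmd=("./mnistCUDNN",), log_dest="cudnn_api.log"):
    """Run the sample (path is an assumption) with logging enabled."""
    return subprocess.run(
        list(cmd),
        env=cudnn_logging_env(log_dest),
        capture_output=True,
        text=True,
    )
```

Calling run_with_cudnn_logging() from the sample directory should leave the API log in cudnn_api.log, which can then be attached to the thread together with the console output.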
Thanks for your reply! I think I may have found the problem. When I tested a Jupyter project, it reported that my GPU (K600) is too old, so PyTorch cannot start. I think that might be the same reason the cuDNN test fails. Anyway, thanks again.
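In case it helps anyone with the same symptom, here is a rough Python sketch for checking how old a GPU is in terms of compute capability. It assumes PyTorch can at least be imported. The helper names are made up, and the minimum value is a placeholder to be taken from the support matrix of whichever cuDNN or PyTorch build is installed, not a confirmed threshold.

```python
def meets_minimum(capability, minimum):
    """Compare two (major, minor) compute capability tuples."""
    return tuple(capability) >= tuple(minimum)


def describe_gpus(minimum):
    """List each visible GPU and whether it meets `minimum` (needs PyTorch)."""
    import torch  # imported lazily so meets_minimum works without PyTorch

    lines = []
    for index in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(index)
        major, minor = torch.cuda.get_device_capability(index)
        verdict = "ok" if meets_minimum((major, minor), minimum) else "too old"
        lines.append(f"{name}: compute capability {major}.{minor} ({verdict})")
    return lines
```

For example, print("\n".join(describe_gpus(minimum=(3, 5)))) would flag any card below 3.5, where (3, 5) is only an illustrative value.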