Hi,
I have a general question about using NGC containers on an arbitrary machine. I need CUDA 9.2 and PyTorch 0.4.1 on Ubuntu 16.04, running on my GeForce RTX 3080 Laptop GPU (Ampere architecture, I believe). Based on these requirements I chose
Container: nvcr.io/nvidia/pytorch:18.06-py3 (Supports Volta and Pascal architecture)
for my experiment, even though it does not support Ampere hardware.
I made the following observations, which seem odd to me:
- After building the Docker image, I found that torch was installed inside a conda environment, but with version 0.5.1 instead of 0.4.1 as listed in the Frameworks Support Matrix.
- Running PyTorch on CUDA appears to work. I confirmed this with torch.cuda.is_available(), which returns True, so PyTorch recognizes that the machine has a GPU.
- Running a convolutional network freezes at the nn.Conv2d() line and returns CUDNN_STATUS_EXECUTION_FAILED after 5 to 10 minutes.
- Running the same model on the CPU works without any problem, although it is very slow.
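The version and device details behind these observations could be collected with something like the following. This is only a minimal sketch: the helper name `collect_gpu_info` is hypothetical, and it assumes the installed PyTorch build exposes `torch.version.cuda`, `torch.backends.cudnn.version()`, and `torch.cuda.get_device_capability()`.

```python
def collect_gpu_info():
    """Gather PyTorch, CUDA, and cuDNN versions plus GPU details into a dict.

    Hypothetical helper for illustration; returns early if torch is missing.
    """
    info = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # CUDA toolkit version PyTorch was built against (None on CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer (None if cuDNN is not available).
    info["cudnn_version"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # Tuple (major, minor) giving the GPU's compute capability.
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_info().items():
        print("{}: {}".format(key, value))
```

If the reported compute capability is newer than anything the reported CUDA build was released for, that would suggest the container's libraries predate the GPU, even though `torch.cuda.is_available()` still returns True.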
I would like to know whether this problem is caused by a mismatch between cuDNN and the hardware. If so, is there a way to get an image that meets my requirements and supports the Ampere architecture?
Thanks a lot for the help,
Jeethesh