General Question Regarding the Frameworks Support Matrix of NGC Containers

Hi,

I have a general question about using NGC containers on any computer. I needed CUDA 9.2 and PyTorch 0.4.1 with Ubuntu 16.04 on my GeForce RTX 3080 laptop (based on the Ampere architecture, I believe). Based on these requirements, I chose
the container nvcr.io/nvidia/pytorch:18.06-py3 (which supports the Volta and Pascal architectures)
for my experiment, even though it does not support Ampere hardware.

I found the following observations rather fishy:

  • After building the Docker image, I found that torch was installed inside a conda environment, but with version 0.5.1 instead of the 0.4.1 listed in the Frameworks Support Matrix.

  • Running PyTorch on CUDA appears to work. I confirmed this with torch.cuda.is_available(), which returns True, so PyTorch recognizes that the machine has a GPU.

  • Running a convolutional network freezes at the nn.Conv2d() call and returns CUDNN_STATUS_EXECUTION_FAILED after 5 to 10 minutes (a minimal reproduction is sketched after this list).

  • Running the same model on the CPU works without any problem, although it is painfully slow.
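
For reference, the check I ran was along these lines (the exact layer and tensor sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Confirm that PyTorch sees the GPU at all.
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))

# A tiny convolution on the GPU: this forward pass is where the
# container hangs and eventually fails with CUDNN_STATUS_EXECUTION_FAILED.
conv = nn.Conv2d(3, 8, kernel_size=3).cuda()
x = torch.randn(1, 3, 32, 32).cuda()
y = conv(x)
print("Conv output shape:", y.shape)
```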

I would like to know whether this is caused by a cuDNN mismatch with the hardware. If so, is there a way to get an image that meets my requirements on the Ampere architecture?
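
For what it is worth, the build-versus-hardware mismatch can be inspected from inside the container with something like the following (Ampere GPUs such as the RTX 3080 report compute capability 8.6, while a CUDA 9.2 build of PyTorch only ships kernels up to Volta):

```python
import torch

# Versions the container's PyTorch build was compiled against.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())

# What the hardware reports: 8.6 on an RTX 3080, newer than any
# architecture a CUDA 9.2 / cuDNN 7.x build knows about.
print("Compute capability:", torch.cuda.get_device_capability(0))
```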

Thanks a lot for the help,
Jeethesh

In my limited experience, the documentation for the containers with regard to environment details is flat-out wrong. The PyTorch container, for example, doesn't have a conda environment, doesn't have Python 3.5, and doesn't have Python 3.8, all of which are variously claimed in different documents inside and outside the container. In truth, it has a system Python 3.10 with all the packages installed.
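
Running a few lines inside the container shows what it actually ships, independent of the documentation:

```python
import sys
import torch

# What the container actually provides, regardless of what the docs claim.
print("Interpreter:", sys.executable)   # e.g. a system path rather than a conda env
print("Python:", sys.version)
print("PyTorch:", torch.__version__)
```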