Upgrading Nvidia DGX packages did not update CUDA version

pedroDGX · February 16, 2023, 10:02am

Hello everyone,

We have a DGX box with A100 cards. We attempted to perform an upgrade on the system packages to move the DGX OS from 5.0.2 to the latest 5.4.2 version.

We followed the steps on the Nvidia documentation center to perform the upgrade via CLI:
dgx os upgrade notes

After rebooting the system, I can see that the Nvidia drivers were successfully updated from 450.80 to 450.216.04 and that the DGX OS was also updated successfully.

However, Nvidia CUDA is still on 11.0, whereas I was expecting it to move up to 11.4 as per the release documentation (unless I misunderstood it).

Could someone please help us update CUDA on our DGX system as well? Do I need to branch off to a newer Nvidia driver version (e.g., 470.*) for this to happen?

Thank you very much in advance!

ScottEllis · February 16, 2023, 5:37pm

Hi @pedroDGX ,

Do you not see any other CUDA package versions available (via apt search or apt-cache policy for example)?
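For anyone who wants to script that check, here is a minimal Python sketch that pulls the installed and candidate versions out of `apt-cache policy` output. The package name `cuda-toolkit-11-0` and the sample text are assumptions for illustration only; the version strings on a real system will differ.

```python
import re


def parse_apt_policy(text):
    """Extract the Installed and Candidate versions from `apt-cache policy <pkg>` output."""
    result = {}
    for key in ("Installed", "Candidate"):
        match = re.search(rf"^\s*{key}:\s*(\S+)", text, flags=re.MULTILINE)
        result[key.lower()] = match.group(1) if match else None
    return result


# Hypothetical sample output (package name and versions are illustrative).
SAMPLE = """\
cuda-toolkit-11-0:
  Installed: 11.0.3-1
  Candidate: 11.0.3-1
  Version table:
 *** 11.0.3-1 600
"""

print(parse_apt_policy(SAMPLE))
```

On a live system the text could come from `subprocess.run(["apt-cache", "policy", pkg], capture_output=True, text=True).stdout` instead of the sample string.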

You may also consider switching your work to using NGC containers. The NGC CUDA containers are an excellent way to have a repeatable environment, and use the forward and backward compatibility of CUDA regardless of which driver versions are installed on the host.

ScottE

pedroDGX · February 17, 2023, 9:09am

Hello @ScottEllis ,

Thank you very much for your response!

I do see other CUDA versions available when I run apt search (e.g., cuda-libraries-* all the way up to version 12), but the one flagged [installed] is version 11-0.

We do run our ML/DL jobs using docker, and I just learned about the forward compatibility feature!

I was a bit confused before. I thought the CUDA version on the host (DGX system) had to be at least as high as the one used by the DL library I wanted to run in my container (e.g., with CUDA 11.0 on the host, I could only run PyTorch built against CUDA 11.0 or an earlier version in the container).

I just grabbed a container with CUDA 11.6 and cudnn8, installed PyTorch 1.13 (built against CUDA 11.6) on top, and everything works well.
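A quick way to confirm this kind of setup from inside the container is to ask PyTorch which CUDA version it was built against and whether it can see a GPU. This is only a sketch: `describe_cuda_env` is a helper name made up for this example, and it returns a stub when PyTorch is not installed.

```python
def describe_cuda_env():
    """Return what PyTorch reports about its CUDA build and GPU visibility."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if the host driver can actually serve this CUDA runtime.
        "gpu_visible": torch.cuda.is_available(),
    }


print(describe_cuda_env())
```

If `gpu_visible` comes back False inside a GPU-enabled container, the host driver is the first thing to check, since it is the only CUDA-related component the container takes from the host.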

Following this example, I guess the forward compatibility means I can run PyTorch 1.13 (built against CUDA 11.6) on any container with CUDA 11.*, right?

Thank you very much!!

ScottEllis · February 17, 2023, 4:36pm

Exactly @pedroDGX ! The beauty of using containers in this model is that the host only needs to have the GPU driver installed. You can then run almost any version of CUDA inside a PyTorch container - the CUDA compatibility lets it work with older or newer drivers on the host.
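To make the driver side of this concrete, here is a rough Python sketch of the check behind CUDA minor version compatibility. The minimum driver versions in `MIN_DRIVER` reflect my reading of NVIDIA's CUDA compatibility documentation for Linux and should be verified there; the function names are made up for this example. The sketch does not model the separate forward-compatibility package route, and minor version compatibility itself does not cover every feature.

```python
# Assumed minimum Linux driver per CUDA major release for minor version
# compatibility; verify against NVIDIA's CUDA compatibility documentation.
MIN_DRIVER = {11: (450, 80, 2), 12: (525, 60, 13)}


def parse_version(text):
    """Turn a dotted version such as '450.216.04' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver, cuda):
    """Return True if the host driver meets the assumed minimum for this CUDA major."""
    cuda_major = parse_version(cuda)[0]
    minimum = MIN_DRIVER.get(cuda_major)
    if minimum is None:
        raise ValueError(f"no minimum driver recorded for CUDA {cuda_major}.x")
    return parse_version(driver) >= minimum


# Driver from this thread against the CUDA 11.6 container, then against CUDA 12.0.
print(driver_supports("450.216.04", "11.6"))
print(driver_supports("450.216.04", "12.0"))
```

Under these assumptions the 450.216.04 driver clears the bar for any CUDA 11.x container, which matches the CUDA 11.6 container working above, but not for CUDA 12.x without a newer driver or the compatibility package.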

Most DGX users don’t even install CUDA in the host OS - that’s not used if work is done in containers. :-)

ScottE
