CUDA forward compatibility miracle with NVIDIA container on Docker

As the title suggests, I’m able to run a higher version of CUDA in a container than the host drivers allow and test computations seem to come out fine. I want to understand if there is some undocumented support for forward compatibility with containers or if I’m just lucky it hasn’t broken yet. The details:

Host system
OS: Ubuntu 18.04.3 LTS
NVIDIA Driver: 440.33.01 (reported by nvidia-smi)
Maximum supported CUDA version: 10.2 (reported by nvidia-smi)
CUDA version actually installed: 9.1.85 (reported by nvcc --version)

Container
Base Image: nvidia/cuda:11.3-devel or nvcr.io/nvidia/pytorch:21.06-py3 (I’ve tried both and the results were the same)
NVIDIA Driver: 440.33.01 (reported by nvidia-smi)
Maximum supported CUDA version: 11.3 (reported by nvidia-smi)
CUDA version actually installed: 11.3.109 (reported by nvcc --version)
All GPUs from the host are passed to the container (5 total).
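
A minimal sketch of how to see the two numbers in question side by side, assuming a Linux container with PyTorch installed and libcuda.so.1 resolvable by the loader: the CUDA API version the driver supports versus the CUDA runtime this PyTorch build was compiled against.

```python
import ctypes

import torch

# Ask the driver API directly which CUDA version the installed driver supports.
# cuDriverGetVersion can be called without cuInit and encodes the version as
# 1000 * major + 10 * minor (e.g. 10020 for CUDA 10.2).
libcuda = ctypes.CDLL("libcuda.so.1")
driver_version = ctypes.c_int()
assert libcuda.cuDriverGetVersion(ctypes.byref(driver_version)) == 0
print("Driver supports CUDA API:", driver_version.value)

# The CUDA toolkit/runtime version this PyTorch build was compiled against.
print("PyTorch built with CUDA:", torch.version.cuda)
print("Visible GPUs:", torch.cuda.device_count())
```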

To test whether I could still use CUDA from PyTorch, I ran the usual checks (torch.cuda.is_available(), torch.cuda.device(0), torch.cuda.get_device_name(0), etc.) and the simple example at the top of this PyTorch page. All of these tests were successful.
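
For reference, those checks amount to roughly the following; the GPU computation at the end is a generic smoke test of my own, not necessarily the exact example from the linked page.

```python
import torch

# Basic visibility checks from the post above
print(torch.cuda.is_available())       # True
print(torch.cuda.device_count())       # 5 in this setup
print(torch.cuda.get_device_name(0))   # GPU model string
torch.cuda.device(0)                   # device context object

# Simple end-to-end computation on the first GPU
x = torch.rand(1000, 1000, device="cuda:0")
y = x @ x
torch.cuda.synchronize()
print(y.sum().item())
```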

Could I really be running CUDA 11.3 in the container when the host driver doesn’t support it? Is the 11.3 being misreported, or is it just luck that I haven’t hit an error yet?

It’s not undocumented. The general principles are documented here. You can find out whether your container loads the necessary compatibility libraries by studying the Dockerfile itself or the container’s documentation.
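
One way to check from inside a running container is sketched below. The /usr/local/cuda*/compat path is where recent nvidia/cuda and NGC images typically ship a forward-compatibility copy of libcuda, but treat that location as an assumption for your particular image; the /proc/self/maps check works on any Linux container.

```python
import ctypes
import glob

# Look for a forward-compatibility libcuda shipped inside the image
# (location is an assumption based on typical nvidia/cuda image layouts).
print("Compat libcuda candidates:", glob.glob("/usr/local/cuda*/compat/libcuda.so*"))

# See which libcuda the dynamic loader actually mapped into this process:
# the host driver library injected by the NVIDIA runtime, or the image's compat copy.
ctypes.CDLL("libcuda.so.1")
with open("/proc/self/maps") as maps:
    loaded = sorted({line.split()[-1] for line in maps if "libcuda.so" in line})
print("libcuda resolved to:", loaded)
```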

This article may also be of interest.
