As the title suggests, I'm able to run a higher version of CUDA in a container than the host driver supports, and test computations come out fine. I want to understand whether there is some undocumented forward-compatibility support for containers, or whether I'm just lucky it hasn't broken yet. The details:
Host system
OS: Ubuntu 18.04.3 LTS
NVIDIA Driver: 440.33.01 (acquired via nvidia-smi)
CUDA max supported version: 10.2 (acquired via nvidia-smi)
CUDA actual version installed: 9.1.85 (acquired via nvcc --version)
Container
Base Image: nvidia/cuda:11.3-devel or nvcr.io/nvidia/pytorch:21.06-py3 (I've tried both and the results were the same)
NVIDIA Driver: 440.33.01 (acquired via nvidia-smi)
CUDA max supported version: 11.3 (acquired via nvidia-smi)
CUDA actual version installed: 11.3.109 (acquired via nvcc --version)
All GPUs from the host are passed to the container (5 total); a quick visibility check is shown below.
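For context, this is roughly how I confirm from inside the container that PyTorch sees all five GPUs and which CUDA version it was built against (a minimal sketch; it assumes torch is already installed in the image, as it is in the nvcr.io/nvidia/pytorch image):

```python
import torch

# CUDA version PyTorch was built against (not necessarily the host driver's max).
print("torch.version.cuda:", torch.version.cuda)

# Enumerate the GPUs visible inside the container.
print("device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```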
To test whether I could still use CUDA from PyTorch, I ran the usual PyTorch CUDA commands (torch.cuda.is_available(), torch.cuda.device(0), torch.cuda.get_device_name(0), etc.) and the simple example at the top of this PyTorch page. All of these tests passed.
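Roughly, the test looked like this (a minimal sketch along the lines of the checks above, not the exact example from that page):

```python
import torch

# Basic availability checks.
assert torch.cuda.is_available()
print(torch.cuda.get_device_name(0))

# Simple computation on the GPU, then bring the result back to the CPU to inspect it.
device = torch.device("cuda:0")
x = torch.ones(3, 3, device=device)
y = x + x
print(y.cpu())  # expect a 3x3 tensor of 2s
```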
Could I really be running CUDA 11.3 in the container when the host driver doesn’t support it? Could the 11.3 be misreported or is it just luck that I haven’t encountered an error yet?