I'm trying to deploy a Singularity image (think: Docker, but lighter and without requiring root) on our HPC cluster, which uses CUDA. The compute nodes have CUDA 9.2 with driver 396.37 installed, but I'd like to use CUDA 10.x.
If I simply use the official CUDA Docker image (`nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04`) as a base and try to run a program compiled inside the container, I get "CUDA driver version is insufficient for CUDA runtime version".
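For completeness, this is roughly how I build and run the image (Singularity 3.x; the program name is just a placeholder):

```
# Build a Singularity image from the official CUDA Docker image
singularity pull cuda-10.1.sif docker://nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04

# Run a program compiled with the container's toolchain; --nv binds the
# host's NVIDIA driver libraries and devices into the container
singularity exec --nv cuda-10.1.sif ./my_cuda_app
```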
This is expected, since CUDA 10.1 requires driver >= 418.39. But the CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation describes that one can update all the user-mode CUDA components (runtime + driver library) without touching the kernel-mode driver. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#flexible-upgrade-path mentions this too, but does not explain HOW to actually do that.
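If I understand those docs correctly, the compat package is supposed to provide a newer user-mode `libcuda` inside the container while the kernel-mode driver on the host stays at 396.37, i.e. I would expect something like this (illustrative commands, not output from my system):

```
# Newer user-mode driver library shipped by the compat package (inside the container)
ls /usr/local/cuda-10.0/compat/

# Kernel-mode driver on the host, which should stay at 396.37
cat /proc/driver/nvidia/version
```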
From the first link I tried `sudo apt-get install cuda-compat-10.0` in the container and added `export LD_LIBRARY_PATH="/usr/local/cuda-10.0/compat:$LD_LIBRARY_PATH"` (I checked the path for correctness). But this still does not work: I get the same error when running a program, and `nvidia-smi` still reports the old driver version.
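In case it matters, the recipe boils down to something like this (Singularity definition file, sketched from memory; paths as above):

```
Bootstrap: docker
From: nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04

%post
    apt-get update && apt-get install -y cuda-compat-10.0

%environment
    # Prefer the newer user-mode driver from the compat package
    export LD_LIBRARY_PATH="/usr/local/cuda-10.0/compat:$LD_LIBRARY_PATH"
```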
If I check my application with `ldd` I don't see any mention of `libcuda.so*`, so the runtime seems to pick up the driver library in some other way.
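I guess something like this would show which `libcuda.so` actually gets loaded at run time (`my_cuda_app` is again a placeholder):

```
# Let the glibc loader log which libraries it resolves at run time
LD_DEBUG=libs ./my_cuda_app 2>&1 | grep -i libcuda

# Or, for a running process, check its memory map
grep libcuda /proc/$(pgrep -n my_cuda_app)/maps
```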
What is required to use the newer user-mode driver version without updating the kernel-mode driver?