I’m trying to deploy a Singularity image (think: Docker, but lighter and without requiring root) on our HPC cluster, and the application uses CUDA. The compute nodes have CUDA 9.2 with driver 396.37 installed, but I’d like to use CUDA 10.x.
If I simply use the official CUDA Docker image (nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04) as a base and try to run a program compiled inside the container, I get “CUDA driver version is insufficient for CUDA runtime version”.
This is expected, since CUDA 10.1 requires driver >= 418.39. However, https://docs.nvidia.com/deploy/cuda-compatibility/index.html describes that one can update all the user-mode CUDA components (runtime + driver libraries) without updating the kernel-mode driver. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#flexible-upgrade-path mentions this too, but does not explain HOW to actually do it.
Following the first link, I ran
sudo apt-get install cuda-compat-10.0 in the container and added
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/compat:$LD_LIBRARY_PATH" (I checked the path for correctness). But this still does not work: running a program gives the same error, and
nvidia-smi still reports the old driver version.
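For reference, the container setup is roughly this (a sketch of my definition file; the compat package name/version suffix is taken from the first link and may need adjusting to match the NVIDIA repo):

```
Bootstrap: docker
From: nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04

%post
    # forward-compatibility package per the NVIDIA compatibility docs
    # (exact package name/version assumed)
    apt-get update && apt-get install -y cuda-compat-10.0

%environment
    # put the compat libcuda ahead of whatever the host injects via --nv
    export LD_LIBRARY_PATH="/usr/local/cuda-10.0/compat:$LD_LIBRARY_PATH"
```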
If I check my application with
ldd I don’t see any mention of
libcuda.so*, so the runtime seems to locate the driver library some other way.
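This would be consistent with the CUDA runtime being linked statically (nvcc’s default) and loading the driver library via dlopen rather than as a normal ELF dependency, which would explain why ldd shows nothing. One way to see what the loader actually resolves at run time is the dynamic loader’s debug output (using /bin/true here as a stand-in for the application binary):

```shell
# ldd only lists the libraries recorded in the ELF dynamic section at
# link time; libraries pulled in via dlopen() -- which is presumably how
# the statically linked CUDA runtime loads libcuda.so.1 -- never appear
# there. LD_DEBUG=libs makes the loader print every library it resolves.
LD_DEBUG=libs /bin/true 2>&1 | grep "find library"
```

With the real application in place of /bin/true, grepping the output for libcuda should reveal which copy (host driver vs. compat directory) actually gets picked up.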
What is required to use the newer user-mode driver version without updating the kernel-mode driver?