Some context: I have been developing simulations of spiking neural networks using CUDA/C++, and now I am trying to port them to my university's supercomputer cluster via Docker. I compile and run these simulations on my host machine running Windows 10 Pro (update 21H2) with an RTX 3060 Ti, CUDA version 11.7 and driver version 516.40.
I have followed this tutorial to completion (although it says it requires Windows 11, it seems to work with the latest version of Windows 10, 21H2). I can run the example Docker images (the nbody sample), but I run into trouble when I try to compile code using nvcc inside a Docker container (11.7.0-devel-ubuntu20.04). I can compile the .cu file with nvcc, but when I try to run the executable I get the error:
CUDA driver version is insufficient for CUDA runtime version
nvidia-smi is unavailable in the container, but running nvcc --version gives me the following information:
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
After much googling I am still at a loss, so any help would be appreciated.
I also still don’t understand which numbers in the above refer to the driver version and which refer to the runtime version.
Here is some explanation about all those version numbers :):
CUDA 11.7 is the version of the CUDA API. As an analogy, it is like OpenGL 4.5 or DirectX 12: it indicates the CUDA feature set that you have access to.
The driver version, 516.40 in your case, is the Display Driver version. The CUDA Driver is part of the Display Driver, so you do have a CUDA Driver installed on your system.
Each CUDA Driver is backward compatible and supports up to a certain feature set. For instance, if a CUDA Driver advertises 11.7, it will support everything from CUDA 1.0 to CUDA 11.7 features.
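As a small illustration of how these version numbers usually show up in code: the CUDA version query functions (cudaDriverGetVersion, cudaRuntimeGetVersion) report a version as a single integer, 1000 * major + 10 * minor, so 11.7 becomes 11070. The sketch below only decodes that integer and does not call CUDA itself. The names CudaVersion, decode_cuda_version and format_cuda_version are made up for this example and are not part of any CUDA API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type for this example (not part of the CUDA API).
struct CudaVersion {
    int major;
    int minor;
};

// Decode the integer form reported by the CUDA version queries:
// value = 1000 * major + 10 * minor, e.g. 11070 -> 11.7.
inline CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);  // a valid report is never negative
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Format as "major.minor" for printing or logging.
inline std::string format_cuda_version(const CudaVersion& v) {
    return std::to_string(v.major) + "." + std::to_string(v.minor);
}
```

For example, format_cuda_version(decode_cuda_version(11070)) gives the string "11.7", which is the number a driver advertising CUDA 11.7 would report.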
Now for the Runtime:
The CUDA Runtime is not part of the display driver. It comes with the toolchain (CUDA Toolkit) that you download, and it provides a set of features that are built on top of the CUDA Driver.
Unlike the driver, the Runtime is a redistributable component that gets shipped with your application.
Each CUDA Runtime requires a driver with a minimum feature set to run, and this is checked at startup.
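The startup check can be pictured roughly like this. It is a simplified sketch, not the actual logic inside the CUDA runtime: it assumes the two versions are already available as the integers that cudaDriverGetVersion and cudaRuntimeGetVersion report (1000 * major + 10 * minor), and the function name driver_is_sufficient is hypothetical. One relevant detail: cudaDriverGetVersion reports 0 when no driver is installed or visible, which is presumably what a process sees inside a container that has not been given access to the GPU driver.

```cpp
#include <cassert>

// Hypothetical helper (not a CUDA API): returns true when a driver that
// advertises `driver_version` can serve a runtime built for `runtime_version`.
// Both values use the encoding 1000 * major + 10 * minor (11.7 -> 11070).
// Simplification: the real rules are more nuanced, e.g. CUDA 11 allows some
// minor-version compatibility within the same major release.
inline bool driver_is_sufficient(int driver_version, int runtime_version) {
    assert(driver_version >= 0 && runtime_version > 0);
    if (driver_version == 0) {
        return false;  // no driver visible at all
    }
    return driver_version >= runtime_version;
}
```

With the numbers from this thread, driver_is_sufficient(11070, 11070) is true on the host, while a process that cannot see the driver at all would evaluate driver_is_sufficient(0, 11070) and get false, even though the host driver itself is new enough.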
Finally, what might be going on in your particular case:
Considering you are able to run CUDA apps, your setup and driver install are likely right.
Most likely some driver or library in the container you are using to build is wrong, or is taking over the one we provide.
To help more we would need the following:
** On bare metal, the output of the nvidia-smi command within WSL (use nvidia-smi, not nvidia-smi.exe)
** The exact docker command line and the container used to do those builds
Brilliant, thank you, that was the issue. I didn’t have the container toolkit installed. For anyone else who runs into this issue:
First (not 100% sure if this is necessary, but I did it): go into the Docker settings and enable integration with your ubuntu-20.04 distro.
Then open Ubuntu in the WSL terminal and follow the instructions linked above to enable the Docker repository and install the toolkit.