Some context: I have been developing simulations of spiking neural networks using CUDA/C++, and I am now trying to port them to my university's supercomputer cluster via Docker. I compile and run these simulations on my host machine running Windows 10 Pro (21H2) with an RTX 3060 Ti, CUDA version 11.7, and driver version 516.40.
I have followed this tutorial to completion (despite it requiring Windows 11, it seems to work with the latest version of Windows 10, 21H2). I can run the example Docker images (the n-body problem), but I run into trouble when I try to compile code using nvcc inside a Docker container (11.7.0-devel-ubuntu20.04). I can compile the .cu file with nvcc, but when I try to run the executable I get the error:
CUDA driver version is insufficient for CUDA runtime version
nvidia-smi is unavailable in the container, but running nvcc --version gives me the following information:
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
After much googling I am still at a loss, so any help would be appreciated.
I also still don’t understand which of the numbers above refer to the driver version and which refer to the runtime version.
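In case it is useful, even a trivial test program (a hypothetical minimal sketch below, not my actual simulation code) is enough to hit the same error, since the driver/runtime compatibility check happens on the first CUDA runtime call:

// minimal_check.cu -- hypothetical minimal reproducer, not the real simulation code
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy() {}

int main() {
    // Even this no-op call initialises the CUDA runtime, which is when
    // the driver/runtime compatibility check is performed.
    cudaError_t err = cudaFree(0);
    if (err == cudaSuccess) {
        dummy<<<1, 1>>>();
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        // When no sufficiently new CUDA driver is visible, this prints the same
        // "CUDA driver version is insufficient for CUDA runtime version" message.
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Kernel launched and completed successfully\n");
    return 0;
}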
Hello,
Here is some explanation of all those version numbers :)
- CUDA 11.7 is the version of the CUDA API. To give you an analogy, it is like OpenGL 4.5 or DirectX 12. It indicates the CUDA feature set that you have access to.
- The driver version, in your case 516.40, is the Display Driver version. The CUDA Driver is part of the Display Driver, so you already have a CUDA Driver installed on your system.
- Each CUDA Driver is backward compatible and supports features up to a certain level: for instance, a CUDA Driver that advertises 11.7 will support everything from CUDA 1.0 to CUDA 11.7.
Now for the Runtime:
- The CUDA Runtime is not part of the display driver; it comes with the toolchain (CUDA Toolkit) that you download, and it provides a set of features built on top of the CUDA Driver.
- Unlike the driver, the Runtime is a redistributable component that gets shipped with your application.
- Each CUDA Runtime requires a driver with a minimum feature set to run, and this is checked at startup (see the small sketch below).
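If you want to see both numbers from inside a program, a minimal sketch along these lines (the file name is just an example) asks the runtime for each of them:

// version_check.cu -- example name; queries the two versions separately
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    // Highest CUDA version supported by the installed CUDA Driver
    // (the one that ships with the display driver).
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA Runtime library this binary is using.
    cudaRuntimeGetVersion(&runtimeVersion);
    // Values are encoded as 1000*major + 10*minor, e.g. 11070 == 11.7.
    printf("CUDA Driver version:  %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
    printf("CUDA Runtime version: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    return 0;
}

Build and run it with nvcc version_check.cu -o version_check && ./version_check. On a correctly set up system both numbers should read 11.7 in your case; inside a container that cannot see the CUDA Driver, the driver number will typically come back as 0.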
Finally, what might be going on in your particular case:
- Considering you are able to run CUDA apps, your setup and driver install are likely fine.
- Most likely, some driver or library in the container you are using to build is wrong or is taking over the one we provide.
- To help more, we would need the following:
** On bare metal, the output of the nvidia-smi command within WSL (use nvidia-smi, not nvidia-smi.exe)
** The exact docker command line and the container used to do those builds
Thanks!
Hi,
Thank you for taking the time to provide this detailed reply.
Running nvidia-smi within WSL gives me:
Mon Aug 1 18:54:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07 Driver Version: 516.40 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| 0% 52C P8 26W / 200W | 1942MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
To set up the container and run the code:
docker pull nvidia/cuda:11.7.0-devel-ubuntu20.04
docker run --name cuda_test -it nvidia/cuda:11.7.0-devel-ubuntu20.04
docker cp .\kernel.cu cuda_test:/usr/kernel.cu
nvcc kernel.cu -o ker
./ker
Using Docker version 20.10.17, build 100c701
Container version: nvidia/cuda:11.7.0-devel-ubuntu20.04
Also, it might not matter for this container, but if you are trying to run kernels in that container, make sure to have nvidia-container-toolkit installed and try running with --gpus all:
https://docs.nvidia.com/ai-enterprise/deployment-guide/dg-docker.html#enabling-the-docker-repository-and-installing-the-nvidia-container-toolkit
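For example, after installing the toolkit in your WSL distro, the run step from your post would become something like:

docker run --gpus all --name cuda_test -it nvidia/cuda:11.7.0-devel-ubuntu20.04

Without --gpus all the container never gets access to the WSL CUDA Driver, which is exactly the situation where the runtime reports an insufficient driver version.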
Thank you. I had come across this guide before, but I was confused because it says “for your Linux distribution.”
Do I run those commands within the docker container or within WSL?
Brilliant, thank you, that was the issue: I didn’t have the container toolkit installed. For anyone else who runs into this issue:
First (not 100% sure this is necessary, but I did it): go into the Docker settings and enable integration with your ubuntu-20.04 distro.
Open up Ubuntu in the WSL terminal and follow the instructions linked above for enabling the Docker repository and installing the toolkit.