Hello,
I’m trying to run the DIGITS 4.0 Docker image on an EC2 machine using nvidia-docker.
My EC2 machine has the 361.42 NVIDIA driver up and running, and nvidia-docker connects to it fine. Using the nvidia/cuda image I was able to verify with nvidia-smi that a GPU is detected and that the driver version is indeed 361.42:
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   36C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
But when running nvidia/digits, the log shows the following error:
cudaRuntimeGetVersion() failed with error #35
which seems to mean my driver version is too old for the CUDA runtime (if I understand correctly).
But 361.42 is a pretty recent release, isn’t it?
DIGITS 4 uses CUDA 7.5.18, according to its /usr/local/cuda/version.txt.
Any suggestions?
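For what it’s worth, here is a small sketch of the compatibility check I believe error #35 is about. The `driver_ok` helper is a hypothetical convenience of mine, and 352.39 is the Linux driver minimum I believe the CUDA 7.5 release notes state — treat both as assumptions, not something the error message itself reports:

```shell
# Hypothetical check: does the installed driver meet the minimum the
# CUDA runtime requires? (352.39 is the Linux minimum I believe the
# CUDA 7.5 release notes document; adjust if yours say otherwise.)
driver_ok() {
  installed="$1"; required="$2"
  # `sort -V` orders version strings numerically; if the lowest of the
  # two is the required version, the installed driver is new enough.
  [ "$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)" = "$required" ]
}

driver_ok 361.42 352.39 && echo "361.42 should satisfy CUDA 7.5"
```

By that comparison 361.42 is comfortably above the minimum, which is why the error is so confusing to me.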
What OS are you using on that instance?
How did you install the 361.42 driver?
It’s Ubuntu 15.10 (GNU/Linux 4.2.0-42-generic x86_64). This is what I did from the beginning:
$ sudo apt-get update
$ sudo apt-get install --no-install-recommends -y gcc make libc-dev
$ wget -P /tmp http://us.download.nvidia.com/XFree86/Linux-x86_64/361.42/NVIDIA-Linux-x86_64-361.42.run
$ sudo sh /tmp/NVIDIA-Linux-x86_64-361.42.run --silent
$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
$ sudo apt-get install dkms build-essential linux-headers-generic
$ sudo nano /etc/modprobe.d/blacklist-nouveau.conf
adding the following lines:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
then save and quit.
$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
$ sudo update-initramfs -u
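As an aside, the interactive nano step can be scripted. This is just a sketch that writes the same blacklist to a staging path — the /tmp location is illustrative (for review before copying), the real file belongs in /etc/modprobe.d/:

```shell
# Write the nouveau blacklist non-interactively. The /tmp path is only a
# staging location; copy the file to /etc/modprobe.d/ afterwards.
conf=/tmp/blacklist-nouveau.conf
tee "$conf" >/dev/null <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
grep -c '^blacklist' "$conf"   # → 2
```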
I may have had to re-run the NVIDIA installer at this stage (exactly the same two lines as before).
And finally:
$ sudo usermod -aG docker ubuntu
$ sudo service nvidia-docker start
and made sure both the docker and nvidia-docker-plugin services are up:
$ service nvidia-docker status
$ service docker status
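Beyond `service … status`, one way I sometimes probe the plugin is its REST endpoint on port 3476. The `wait_for` retry helper below is a hypothetical convenience, and the /docker/cli path is from my memory of the nvidia-docker 1.0 plugin, so treat both as assumptions:

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
wait_for() {
  tries="$1"; shift
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# If the nvidia-docker-plugin is healthy, this should eventually print the
# --device/--volume flags it injects into `docker run`:
# wait_for 10 curl -sf http://localhost:3476/docker/cli
```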
And as mentioned above, the nvidia/cuda container is able to run nvidia-smi, and the GPU and driver versions show up as expected…
Also, this might be related:
trying to nvidia-docker build a Dockerfile based on nvidia/cuda:7.0-cudnn4-devel-ubuntu14.04, which clones the master branch of Caffe and compiles it with cuDNN enabled, fails at the beginning of testing with the following error:
Cuda number of devices: 0
Setting to use device 0
Current device id: 0
Current device name:
Note: Randomizing tests' orders with a seed of 21847 .
[==========] Running 2081 tests from 277 test cases.
[----------] Global test environment set-up.
[----------] 50 tests from NeuronLayerTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN ] NeuronLayerTest/3.TestSigmoidGradient
E0905 10:18:15.161348 263 common.cpp:113] Cannot create Cublas handle. Cublas won't be available.
E0905 10:18:15.162796 263 common.cpp:120] Cannot create Curand generator. Curand won't be available.
F0905 10:18:15.162914 263 syncedmem.hpp:18] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
But running
nvidia-docker run -d -p 8080:8080 -v /home/ubuntu/data:/data beniz/deepdetect_gpu
does seem to work… It uses nvidia/cuda:7.5-cudnn4-devel as its base image…
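Notably, the failing build above reports “Cuda number of devices: 0”. Here is a sketch of a guard I would put in front of CUDA-dependent test steps — the `have_gpu` helper, and the assumption that the /dev/nvidia* device nodes are simply absent at image-build time, are both mine, not anything the Caffe build itself does:

```shell
# Hypothetical guard: only run GPU tests when NVIDIA device nodes are
# visible. The directory argument defaults to /dev and exists mainly so
# the helper can be exercised against a scratch directory.
have_gpu() {
  dir="${1:-/dev}"
  ls "$dir"/nvidia[0-9]* >/dev/null 2>&1
}

if have_gpu; then
  echo "GPU device nodes visible: safe to run CUDA tests"
else
  echo "no GPU device nodes (e.g. at image-build time): skipping CUDA tests"
fi
```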
What is the output of:
sudo dmesg | grep NVRM
in the base OS (i.e. not in a docker container)?
Also, I’m not sure, but installing docker before completing the driver install steps (e.g. the nouveau blacklist) is something that caught my eye.
[ 4.221525] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 361.42 Tue Mar 22 18:10:58 PDT 2016
Installing docker before completing the driver install steps did cause a problem getting the nvidia-docker service to start, which is why I had to start it after the driver installation.
You might have missed it, since I only recently edited my post above with new information:
nvidia-docker run -d -p 8080:8080 -v /home/ubuntu/data:/data beniz/deepdetect_gpu
nvidia-docker exec -ti 3b091aba4bd7 bash -c "export PATH=$PATH:/opt/deepdetect/build/caffe_dd/src/caffe_dd/.build_release/tools && cd /data && caffe train -solver SOLVER.prototxt -weights my-start.caffemodel"
with SOLVER.prototxt having solver_mode: GPU in it,
does seem to recognize the GPU and execute on it:
INFO - 21:01:01 - Using GPUs 0
INFO - 21:01:01 - GPU 0: GRID K520
This docker image uses nvidia/cuda:7.5-cudnn4-devel as its base…
I’m at a loss as to what’s wrong with the DIGITS docker image, or with the one I wrote… I’d really like to get them working, since the deepdetect image lacks Python bindings or a decent interface, forcing me to fall back to using Caffe from the CLI…