Ubuntu 16.04 problem with CUDA 9.1 + 390.30 driver!

Hi everyone!

So as you can see from my terminal output, I have the 390.30 driver and the CUDA 9.1 toolkit,
and yet my PyTorch package complains that my driver is too old.
AFAIK, cudaDriverGetVersion returns 5000, even though my actual driver version is 390.30.

The only version of PyTorch I can try right now is the one built for CUDA 9.1. I don’t see CUDA 8 or 9.0 still available on NVIDIA’s website :P

What should I do in order to fix this?

Here’s my terminal output in case it helps:

P.S. Thanks in advance! :)

yoni@yoni-Lenovo-Z710:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
yoni@yoni-Lenovo-Z710:~$ nvidia-smi
Mon Feb 19 19:03:17 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 840M        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8    N/A /  N/A |    294MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1607      G   /usr/lib/xorg/Xorg                           134MiB |
|    0      2820      G   compiz                                        54MiB |
|    0      3347      G   …-token=5C0640A411AB5E45B1719A458ACCAC5D     101MiB |
+-----------------------------------------------------------------------------+
yoni@yoni-Lenovo-Z710:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'0.3.1'
>>> torch.randperm(5).cuda()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/torch/_utils.py", line 69, in cuda
    return new_type(self.size()).copy_(self, async)
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 384, in _lazy_new
    _lazy_init()
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 141, in _lazy_init
    _check_driver()
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 71, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError:
The NVIDIA driver on your system is too old (found version 5000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: http://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
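
For reference, CUDA reports versions as a single integer of the form 1000 * major + 10 * minor. Read that way, the 5000 in the message above would mean CUDA 5.0, while a driver that supports CUDA 9.1 would be expected to report 9010. Below is a minimal sketch of that decoding; decode_cuda_version is a hypothetical helper name, not part of PyTorch or CUDA.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major, remainder = divmod(encoded, 1000)
    minor = remainder // 10
    return major, minor


# The value PyTorch printed in the assertion above.
print(decode_cuda_version(5000))  # (5, 0)
# What a driver supporting CUDA 9.1 would be expected to report.
print(decode_cuda_version(9010))  # (9, 1)
```

This only interprets the number. It does not explain why the driver reported it.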

Have you validated the CUDA install?

Sorry, I’m not sure what you mean by validate, so I’d love it if you could explain :)
Do you mean running the deviceQuery program?

Yes, the validation instructions are in the Linux install guide:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Steps 1 and 2 of the verification seem OK, but deviceQuery fails for some reason. The output makes no sense to me, because 390 should be the latest driver version, but you will probably know what’s going on:

yoni@yoni-Lenovo-Z710:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 Wed Jan 31 22:08:49 PST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
yoni@yoni-Lenovo-Z710:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

yoni@yoni-Lenovo-Z710:~/NVIDIA_CUDA-9.1_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
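
The first verification step above reads the kernel module version from /proc/driver/nvidia/version. As a rough sketch, that string can be parsed and compared against whatever minimum the toolkit release notes list. The helper name parse_kernel_module_version and the threshold used below are illustrative assumptions, not values taken from NVIDIA's documentation.

```python
import re
from typing import Optional, Tuple


def parse_kernel_module_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the driver version after 'Kernel Module' as a tuple of ints."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Line copied from the terminal output in this thread.
sample = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 "
          "Wed Jan 31 22:08:49 PST 2018")

found = parse_kernel_module_version(sample)

# Hypothetical threshold for illustration only; look up the real minimum
# for your toolkit in the CUDA release notes.
required = (387, 0)

print(found)
print(found is not None and found >= required)
```

Comparing tuples rather than floats keeps versions such as 384.111 and 384.90 in the right order. Note that a matching kernel module version alone does not show that the user-space driver libraries are installed correctly.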

So you have a broken driver install. PyTorch is essentially just telling you the same thing.

nvidia-smi is insufficient for full verification of a proper GPU driver install for CUDA.
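
One way to look past nvidia-smi is to ask the user-space driver library directly which CUDA version it supports. The sketch below calls cuDriverGetVersion from the CUDA driver API through ctypes. It assumes the library is named libcuda.so.1 on the system, and query_driver_cuda_version is a made-up helper name. It is not a substitute for running deviceQuery.

```python
import ctypes
from typing import Optional


def query_driver_cuda_version(lib_name: str = "libcuda.so.1") -> Optional[int]:
    """Return the CUDA version integer reported by libcuda, or None on failure."""
    try:
        libcuda = ctypes.CDLL(lib_name)
        get_version = libcuda.cuDriverGetVersion
    except (OSError, AttributeError):
        # Library not found, or it does not export the expected symbol.
        return None

    version = ctypes.c_int(0)
    status = get_version(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        return None
    return version.value


result = query_driver_cuda_version()
if result is None:
    print("Could not query libcuda; the user-space driver may be missing or broken")
else:
    print("libcuda reports CUDA version integer:", result)
```

If this returns None, or a value lower than the installed toolkit expects, while nvidia-smi still looks healthy, that would be consistent with a partially installed or mismatched driver.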

Get your CUDA installers from http://www.nvidia.com/getcuda

Regarding your comment “I don’t see cuda 8 or 9.0 still available at Nvidia’s website”: study that page carefully. You will find a link to the CUDA legacy toolkits (archive) page.

If you need to install a specific driver, get the installer from: http://www.nvidia.com/drivers
Using drivers from your Linux distribution, or e.g. a PPA, may not get you all the components you need to run CUDA. For example, you can get the NVIDIA-approved 390.30 driver for Ubuntu 16.04 here:

http://www.nvidia.com/download/driverResults.aspx/131160/en-us

Follow the instructions in the Linux install guide carefully:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Actually, just so you know, the first thing I tried was a driver installation from the official website.

I chose my GPU model, OS and so on; the file name is NVIDIA-Linux-x86_64-390.25.run.
Pressed Ctrl+Alt+F1, stopped the X service, ran the installation.
It broke my OS completely. Black screen, literally.
Had to use GRUB and some guides I found to get my system back.

The second thing I tried was the driver that is included in the CUDA runfile.
That broke my OS such that I couldn’t get past the login screen.
Had to revert again (every iteration like that takes time).

Only after that experience did I try getting my driver from apt-get.

I got the CUDA installer from the official website.

OK, now I will install the driver from the link you provided and report the results.

It seems to be the same version number as the one I have, but note my GPU model: GeForce 840M.

If that doesn’t work, do you recommend that I try a legacy installation of CUDA?

Anyway, I will edit this message either way.

Update 1: Tried to install the driver and installed the CUDA toolkit without uninstalling first.
Didn’t work. Now I’ll try to uninstall CUDA, install it again, and see what happens…

Update 2: Uninstalled using the script in the bin folder. This time I also removed every single CUDA entry from the apt-get list, and followed every single step in the huge installation guide. Same result :(

Update 3: Tried to install another driver from the official website (384.111), compatible with my OS and GPU model.
It broke the OS such that I couldn’t get past the login screen.
In case I’m not using the right procedure to install the drivers: I’m following the same steps as suggested here https://askubuntu.com/questions/66328/how-do-i-install-the-latest-nvidia-drivers-from-the-run-file

Update 4: Got the 387 driver using apt-get, since that’s the one the CUDA installer wanted to install (but failed). Everything is working for me now.
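
A quick way to confirm a working setup from Python is to repeat the call that failed earlier in this thread. This is only a rough sketch: cuda_smoke_test is a hypothetical helper name, and it assumes that a failure surfaces as an AssertionError or RuntimeError, as it did in the traceback above.

```python
def cuda_smoke_test():
    """Return True if a tiny tensor can be moved to the GPU, False if not,
    or None when PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None

    if not torch.cuda.is_available():
        return False

    try:
        # Same operation that raised the driver assertion earlier.
        torch.randperm(5).cuda()
    except (AssertionError, RuntimeError):
        return False
    return True


print(cuda_smoke_test())
```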

Summary: It seems to me that there might be some issues with the drivers provided by the website and/or backward-compatibility issues between the drivers and CUDA.
Either way, it would be awesome if you could make the installation of both the drivers and CUDA easier :)

Somehow the CUDA install breaks the kernel at startup.
The fix below worked for me:

https://www.eriksmistad.no/fix-nvidia-driver-for-linux-kernel-4-13-on-ubuntu/

Same problem for me as well:

CUDA doesn’t support GCC 5.4.0.
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

I think I need to downgrade from GCC 5.4.0 to GCC 5.3.1 on Ubuntu 16.04.
I haven’t solved the issue yet :(

$ ./mnistCUDNN

cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : (7005) (7.0.5)
Cuda failurer version : GCC 5.4.0
Error : unknown error
error_util.h:93
Aborting…

Ubuntu 16.04.9 : Ubuntu 5.4.0-6ubuntu1 : GCC 5.4.0
CUDA Toolkit 9.1
Python 3.6.4