I am installing a GTX 1080 GPU on Ubuntu 20.04 and trying to set up GPU acceleration for my deep learning application. Let me first give the error messages I am seeing, then my system configuration, and finally the installation steps I followed. Right after the initial installation the versions matched, but after a reboot libcuda had been upgraded, creating a version mismatch. I did not initiate the libcuda upgrade.
Failed to initialize NVML: Driver/library version mismatch
From within tensorflow (I know this is not a tensorflow problem but it helps diagnose the problem):
libcuda reported version is: 455.23.5
kernel reported version is: 450.80.2
kernel version 450.80.2 does not match DSO version 455.23.5 – cannot find working devices in this configuration
It is clear that it is a version mismatch:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.80.02 Wed Sep 23 01:13:39 UTC 2020
GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
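To make the mismatch explicit, the kernel module version and the libcuda version can be compared programmatically. The following is only a minimal sketch: it assumes libcuda is installed under /usr/lib/x86_64-linux-gnu (the usual Ubuntu location, but an assumption here), and the helper names are my own. Versions are compared as integer tuples so that 450.80.02 and 450.80.2 count as the same version.

```python
import re
from pathlib import Path

PROC_VERSION = Path("/proc/driver/nvidia/version")
LIB_DIR = Path("/usr/lib/x86_64-linux-gnu")  # assumed location of libcuda on Ubuntu


def to_tuple(version):
    """Turn '450.80.02' into (450, 80, 2) so zero padding does not matter."""
    return tuple(int(part) for part in version.split("."))


def kernel_module_version(proc_text):
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return to_tuple(match.group(1)) if match else None


def libcuda_version(filename):
    """Extract the version from a file name like 'libcuda.so.455.23.05'."""
    match = re.fullmatch(r"libcuda\.so\.(\d+(?:\.\d+)+)", filename)
    return to_tuple(match.group(1)) if match else None


def installed_libcuda_versions(lib_dir=LIB_DIR):
    """Collect the versions of all fully versioned libcuda files in lib_dir."""
    versions = set()
    for path in lib_dir.glob("libcuda.so.*"):
        version = libcuda_version(path.name)
        if version is not None:  # names like libcuda.so.1 carry no full version
            versions.add(version)
    return versions


if __name__ == "__main__":
    if not PROC_VERSION.exists():
        print("NVIDIA kernel module is not loaded")
    else:
        kernel = kernel_module_version(PROC_VERSION.read_text())
        user = installed_libcuda_versions()
        print("kernel module:", kernel)
        print("libcuda:", sorted(user))
        print("match" if user == {kernel} else "MISMATCH")
```

If the script reports a mismatch, the user-space library on disk is newer (or older) than the kernel module that is currently loaded, which is the situation the error messages above describe.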
My system specifications:
GPU: GeForce GTX 1080
Linux kernel: 5.4.0-48-generic
gcc: Ubuntu 9.3.0-17ubuntu1~20.04
nvcc: 10.1, V10.1.243
To enable GPU support, I followed the instructions on this page: Install TensorFlow with pip. Note that I followed the Ubuntu 18.04 (CUDA 10.1) instructions even though I was installing CUDA 10.1 on Ubuntu 20.04.
To reiterate, the versions were compatible when I first installed the driver and CUDA. After rebooting, however, libcuda had a different version, which caused the mismatch.
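Since I did not start the upgrade myself, one way to see what did is to scan the apt history for entries that touch NVIDIA packages. This is a rough sketch under stated assumptions: it assumes the log is at /var/log/apt/history.log and uses the usual blank-line-separated blocks with Start-Date, Commandline, Install and Upgrade fields; the function name is my own, and rotated (compressed) logs are not read.

```python
import re
from pathlib import Path

APT_HISTORY = Path("/var/log/apt/history.log")  # assumed default location on Ubuntu


def nvidia_package_changes(history_text):
    """Return one dict per NVIDIA/libcuda package installed or upgraded by apt."""
    changes = []
    # Entries in the apt history are separated by blank lines.
    for block in re.split(r"\n\s*\n", history_text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value.strip()
        for action in ("Install", "Upgrade"):
            # Each package is listed as 'name:arch (old version, new version)'.
            for name, versions in re.findall(r"(\S+) \(([^)]*)\)", fields.get(action, "")):
                if "nvidia" in name or "libcuda" in name:
                    changes.append({
                        "date": fields.get("Start-Date"),
                        "command": fields.get("Commandline"),
                        "action": action,
                        "package": name,
                        "versions": versions,
                    })
    return changes


if __name__ == "__main__":
    if APT_HISTORY.exists():
        for change in nvidia_package_changes(APT_HISTORY.read_text()):
            print(change)
    else:
        print("no apt history found at", APT_HISTORY)
```

The "command" field shows which command line triggered each change, which should tell a manual apt run apart from an automatic one.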
My goal is to be able to use CUDA 10.1 with TensorFlow on Ubuntu 20.04. Any suggestions on how to achieve this would be extremely helpful.
Thanks for your time!