Install CUDA 10.0 on Ubuntu 16.04 (for DGX-1)

herbol87 · September 22, 2019, 6:47am

Hi All,

I am trying to install CUDA 10.0 on Ubuntu 16.04 running on a DGX-1 server.
I followed the “runfile installation” instructions at https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html#runfile.

After step 4.2.6 (“Reboot the system to reload the graphical interface”), I checked the CUDA version as follows:

nvcc --version

which returns:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

However, when I run:

nvidia-smi

it returns:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I went to step 4.4 (“Device Node Verification”) and found that the device files /dev/nvidia* don’t exist.
I tried to create them manually, but running:

sudo /sbin/modprobe nvidia

returns:

modprobe: ERROR: could not insert 'nvidia': Exec format error

Please help me solve this problem. Thanks!

Other details:

lspci | grep -i nvidia
06:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
07:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
85:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
89:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
8a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)

uname -m && cat /etc/*release
x86_64
DGX_NAME="DGX Server"
DGX_PRETTY_NAME="NVIDIA DGX Server"
DGX_SWBUILD_DATE="2018-03-20"
DGX_SWBUILD_VERSION="3.1.6"
DGX_COMMIT_ID="1b0f58ecbf989820ce745a9e4836e1de5eea6cfd"
DGX_SERIAL_NUMBER=QTFCOU8280021
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

gcc --version
gcc (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

uname -r
4.4.0-142-generic

cat /proc/version
Linux version 4.4.0-142-generic (buildd@lgw01-amd64-033) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) ) #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019

dpkg -l | grep nvidia
ii  dgx-peer-mem-loader                             1.1-10                                        amd64        Ensure nvidia is loaded before nv_peer_mem

Robert_Crovella · September 24, 2019, 1:52am

DGX-1 software is mostly installed and maintained through the package manager. You can use a runfile installer, but you’ll need to be aware of the inherent conflicts between the two methods. These conflicts are documented in the CUDA Linux installation guide, in the section “handle conflicting install methods”.
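
As a rough illustration of what "conflicting install methods" can look like on disk, the sketch below reports which methods appear to have left traces. It assumes the default locations used by the runfile installers (`/usr/bin/nvidia-uninstall` for the driver and `/usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl` for the 10.0 toolkit); the function name `detect_install_methods` and its optional root argument are hypothetical, added only so the check can be pointed at a test directory.

```shell
#!/bin/sh
# Report which CUDA/driver install methods appear to have left traces.
# The optional root argument is a hypothetical convenience for testing; default is "/".
detect_install_methods() {
    root="${1:-/}"
    root="${root%/}"    # drop a trailing slash so the paths below join cleanly
    # A runfile driver install normally leaves an nvidia-uninstall helper (assumed default path).
    if [ -x "$root/usr/bin/nvidia-uninstall" ]; then
        echo "runfile-driver"
    fi
    # A runfile CUDA 10.0 toolkit install ships a Perl uninstaller (assumed default path).
    if [ -e "$root/usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl" ]; then
        echo "runfile-toolkit"
    fi
}

detect_install_methods /
# Package-manager installs show up in dpkg instead.
dpkg -l 2>/dev/null | grep -i -E 'nvidia|cuda' || echo "no nvidia/cuda packages found via dpkg"
```

If both a runfile marker and dpkg packages show up, the two methods are probably overlapping, which is the situation the guide's section describes.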

In short, the CUDA 10 toolkit is installed, but your driver install is broken. To rectify this, you’ll need to clean up and remove everything left behind by the previous installations.
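
One way to check whether the driver module is the broken piece is to compare the module's `vermagic` string with the running kernel. This is a sketch only: a kernel mismatch is a common cause of `Exec format error` from `modprobe`, but the thread does not confirm it is the cause here, and the helper name `vermagic_matches_kernel` is hypothetical.

```shell
#!/bin/sh
# Return 0 when a module's vermagic string names the given kernel release.
# vermagic looks like "4.4.0-142-generic SMP mod_unload modversions";
# the first space-separated field is the kernel release the module was built for.
vermagic_matches_kernel() {
    vermagic="$1"
    kernel="$2"
    built_for="${vermagic%% *}"   # keep only the first field
    [ "$built_for" = "$kernel" ]
}

running="$(uname -r)"
# modinfo -F prints a single field; it fails if no nvidia module exists for this kernel.
if magic="$(modinfo -F vermagic nvidia 2>/dev/null)"; then
    if vermagic_matches_kernel "$magic" "$running"; then
        echo "nvidia module was built for the running kernel ($running)"
    else
        echo "mismatch: module built for '${magic%% *}', running '$running'"
    fi
else
    echo "no nvidia module found for kernel $running"
fi
# The kernel log often records why a module load was rejected.
dmesg 2>/dev/null | grep -i -E 'nvidia|version magic' | tail -n 5 || true
```

If this reports a mismatch or a missing module, that is consistent with a broken driver install. For cleanup, the installation guide lists `sudo /usr/bin/nvidia-uninstall` for a runfile driver and `sudo apt-get --purge remove <package_name>` for packages.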
