Tesla P100 SXM2 GPU on POWER8: `nvidia-smi -q` reports no devices found

ezhao963 · August 10, 2018, 10:39am

Hi,

I have an IBM POWER8 server with Tesla P100 GPUs installed.

The OS is RHEL 7.4 ppc64le GA:

cat /etc/os-release

NAME="Red Hat Enterprise Linux Server"
VERSION="7.4 (Maipo)"

uname -r

3.10.0-693.el7.ppc64le

However, I cannot get either CUDA 9.0 or CUDA 9.2 working on it.

rpm -aq |grep dkms

dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch ==> I can only install cuda-9.0 successfully with dkms-2.2.0. Other versions, such as 2.5.0 or 2.6.6, fail with an error like /var/lib/dkms/nvidia/384.81/build/common/inc/nv-mm.h: error get_user_pages

rpm -aq |grep cuda-driver

cuda-driver-dev-9-0-9.0.176-1.ppc64le
cuda-drivers-384.81-1.ppc64le ==> The driver is 384.81

But there are NVRM errors in dmesg:

dmesg | grep NVRM

[ 2.801928] NVRM: loading NVIDIA UNIX ppc64le Kernel Module 384.81 Sat Sep 2 00:45:52 PDT 2017 (using threaded interrupts)
[ 65.466508] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 65.466685] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 65.824117] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 65.824236] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 66.217278] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 66.217354] NVRM: rm_init_adapter failed for device bearing minor number 2

No GPUs are found:

nvidia-smi -q

No devices were found

lspci |grep NVIDIA

0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
0003:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
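
The dmesg and lspci output above can be cross-checked with a short script. This is a minimal sketch rather than anything from the original post: the function names `failed_minors` and `count_nvidia_devices` are hypothetical helpers, and the pattern assumes the NVRM message format is exactly the one shown in the dmesg output above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any NVIDIA tool) for summarizing the
# diagnostics shown above. Both read text from stdin.

# Print the distinct device minor numbers for which rm_init_adapter failed,
# one per line, in ascending order.
# Assumes the kernel log line format "NVRM: rm_init_adapter failed for
# device bearing minor number N".
failed_minors() {
    sed -n 's/.*NVRM: rm_init_adapter failed for device bearing minor number \([0-9][0-9]*\).*/\1/p' \
        | sort -n -u
}

# Count the lines of lspci-style output that mention NVIDIA.
count_nvidia_devices() {
    grep -c 'NVIDIA'
}
```

Running `dmesg | failed_minors` and `lspci | count_nvidia_devices` on the affected machine would show whether every GPU visible on the PCI bus fails adapter initialization, or only some of them.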

Are there any tips on what I can do next? Thanks a lot!

ezhao963 · August 16, 2018, 7:04am

Any tips? This problem has blocked me for weeks. Can anybody help? Thanks a lot!

Robert_Crovella · August 18, 2018, 2:40pm

I would start with a clean load of the OS, get your CUDA and driver installers here:

http://www.nvidia.com/getcuda

and follow the instructions here:

Installation Guide Linux :: CUDA Toolkit Documentation

carefully. I suggest reading the whole document first. Note that there are power9 specific steps.
