Hi,
I have an IBM POWER8 server with Tesla P100 GPUs installed.
The OS is RHEL 7.4 (ppc64le, GA):
cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.4 (Maipo)"
uname -r
3.10.0-693.el7.ppc64le
However, I cannot get either CUDA 9.0 or CUDA 9.2 to work on it.
rpm -aq |grep dkms
dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch ==> This is the only DKMS version with which I can successfully install the CUDA 9.0 driver. Other versions, such as 2.5.0 or 2.6.6, fail with an error like: /var/lib/dkms/nvidia/384.81/build/common/inc/nv-mm.h: error get_user_pages
rpm -aq |grep cuda-driver
cuda-driver-dev-9-0-9.0.176-1.ppc64le
cuda-drivers-384.81-1.ppc64le ==> The driver is 384.81
However, dmesg shows NVRM errors:
dmesg | grep NVRM
[ 2.801928] NVRM: loading NVIDIA UNIX ppc64le Kernel Module 384.81 Sat Sep 2 00:45:52 PDT 2017 (using threaded interrupts)
[ 65.466508] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 65.466685] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 65.824117] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 65.824236] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 66.217278] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 66.217354] NVRM: rm_init_adapter failed for device bearing minor number 2
As a result, no GPU is detected:
nvidia-smi -q
No devices were found
The three GPUs are still visible on the PCI bus, though:
lspci |grep NVIDIA
0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
0003:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
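To double-check that the number of failing devices in dmesg matches the number of NVIDIA devices lspci reports, the pasted output can be cross-checked with a small script. This is only a sketch: it parses the text exactly as pasted above (assuming those message formats), it does not query the driver or the live system, and the function names are hypothetical helpers made up for this example.

```python
import re
from typing import List

# Output pasted in this post (sample text, not read from a live system).
DMESG = """\
[ 2.801928] NVRM: loading NVIDIA UNIX ppc64le Kernel Module 384.81 Sat Sep 2 00:45:52 PDT 2017 (using threaded interrupts)
[ 65.466508] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 65.466685] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 65.824117] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 65.824236] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 66.217278] NVRM: RmInitAdapter failed! (0x25:0x5b:1080)
[ 66.217354] NVRM: rm_init_adapter failed for device bearing minor number 2
"""

LSPCI = """\
0002:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
0003:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
0006:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
"""


def failed_minors(dmesg: str) -> List[int]:
    """Minor numbers reported by 'rm_init_adapter failed' lines (hypothetical helper)."""
    pattern = r"rm_init_adapter failed for device bearing minor number (\d+)"
    return [int(m) for m in re.findall(pattern, dmesg)]


def rm_init_codes(dmesg: str) -> List[str]:
    """Error triples from 'RmInitAdapter failed!' lines, e.g. 0x25:0x5b:1080 (hypothetical helper)."""
    pattern = r"RmInitAdapter failed! \((0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:\d+)\)"
    return re.findall(pattern, dmesg)


def nvidia_pci_addresses(lspci: str) -> List[str]:
    """PCI addresses (first column) of lspci lines that mention NVIDIA (hypothetical helper)."""
    return [line.split()[0] for line in lspci.splitlines() if "NVIDIA" in line]


if __name__ == "__main__":
    gpus = nvidia_pci_addresses(LSPCI)
    minors = failed_minors(DMESG)
    codes = sorted(set(rm_init_codes(DMESG)))
    print(f"NVIDIA PCI devices: {len(gpus)} {gpus}")
    print(f"Failed adapter inits: {len(minors)} (minor numbers {minors})")
    print(f"Distinct RmInitAdapter codes: {codes}")
    # Only the counts are compared; this does not map minor numbers to PCI addresses.
    if len(minors) == len(gpus):
        print("Failure count matches the number of NVIDIA PCI devices.")
```

With the output above, it reports three NVIDIA PCI devices and three failed adapter inits, all with the same code 0x25:0x5b:1080, so the failure is not limited to a single card.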
Are there any tips on what I can try next? Thanks a lot!