Tesla P100 hardware incompatibility?

keithhbova · July 19, 2023, 8:14pm

Hello, I hope someone can help me. I have spent several days on this issue with no luck. I am trying to install the CUDA toolkit for a Tesla P100 GPU. My hardware is as follows:

Motherboard: Tyan S8225 (S8225AGM4NRF)

CPU: AMD Opteron 4284 (x2)

RAM: 128 GB (16 x 8 GB) ECC DDR3 1333 MHz

The motherboard works great in Ubuntu with GeForce cards (I’ve tested a 2060, 3060 Ti, 3060 12 GB, and 2080 Ti). I’ve also tested the 2080 Ti in CentOS, and it works fine there as well. I’ve never had an issue installing the CUDA toolkit or using TensorFlow, etc. But for some reason, I absolutely cannot get the P100 to show up in nvidia-smi.

I’ve tried CentOS Stream 9, CentOS Stream 8, and now I’m on CentOS 7.

I feel like I must be doing something wrong or missing something. I am using the CentOS 7 “Workstation” install with all the optional dependencies. Secure Boot is disabled, I have verified that gcc 4 is installed, and SSH is enabled.
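Whether Secure Boot is really off matters here, because a kernel enforcing Secure Boot will generally refuse to load an unsigned nvidia.ko. A minimal sketch of checking it from the running system, assuming the `mokutil` tool is installed; on a legacy-BIOS board without UEFI it typically reports that EFI variables are not supported. The `secure_boot_off` helper is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if `mokutil --sb-state` output on stdin says Secure Boot is off
secure_boot_off() {
    grep -q 'SecureBoot disabled'
}

if command -v mokutil >/dev/null 2>&1; then
    state=$(mokutil --sb-state 2>&1)
    echo "$state"
    if printf '%s\n' "$state" | secure_boot_off; then
        echo "Secure Boot is disabled"
    fi
else
    echo "mokutil is not installed"
fi
```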

lspci | grep -i nvidia returns:

06:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
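Seeing the card in `lspci` only shows that it enumerates on the PCI bus. `lspci -k` also reports which kernel driver, if any, has claimed it. A minimal sketch, filtering on NVIDIA's PCI vendor ID `10de`; `driver_in_use` is a hypothetical helper that only pulls out the bound driver name.

```shell
#!/bin/bash
# Hypothetical helper: print the "Kernel driver in use" value from `lspci -k` output on stdin
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

if command -v lspci >/dev/null 2>&1; then
    # -d 10de: limits output to NVIDIA devices; -nn adds numeric IDs; -k adds driver info
    lspci -nnk -d 10de:
    echo "driver bound: $(lspci -k -d 10de: | driver_in_use)"
else
    echo "lspci not found (it is provided by the pciutils package)"
fi
```

If the bound driver is `nouveau`, the blacklist has not taken effect; if there is no "Kernel driver in use" line at all, nvidia.ko never attached to the card.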

After I boot up the system, I blacklist nouveau with:

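#!/bin/bash
# Must run as root: writes to /etc/modprobe.d and /boot
if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root."
    exit 1
fi

# Keep nouveau from claiming the GPU before the NVIDIA driver loads
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
# Back up the current initramfs, then rebuild it in place so the blacklist applies at boot
cp "/boot/initramfs-$(uname -r).img" "/boot/initramfs-$(uname -r).img.bak"
dracut -f -v "/boot/initramfs-$(uname -r).img" "$(uname -r)"
reboot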
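After the reboot, it is worth confirming that nouveau is actually gone before installing the NVIDIA driver. A minimal sketch that inspects `lsmod`; `module_loaded` is a hypothetical helper written for this check, not part of any package.

```shell
#!/bin/bash
# Hypothetical helper: reads `lsmod` output on stdin and succeeds if module $1 is listed
module_loaded() {
    # Skip the header row; the first column of each remaining row is the module name
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

if lsmod 2>/dev/null | module_loaded nouveau; then
    echo "nouveau is still loaded: the blacklist did not take effect"
else
    echo "nouveau is not loaded"
fi
```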

I then switch to runlevel 3:

sudo init 3

Then I install CUDA with:

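#!/bin/bash
sudo yum update -y
# If this update installed a newer kernel, reboot into it before continuing:
# kernel-devel must match the kernel that DKMS builds nvidia.ko for
sudo yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
# EPEL provides dkms, which the DKMS driver package depends on
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# Add NVIDIA's CUDA network repository for RHEL/CentOS 7 (yum-config-manager is in yum-utils)
sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
sudo yum clean all
sudo yum install -y nvidia-driver-latest-dkms.x86_64
sudo yum install -y cuda
sudo yum install -y cuda-drivers
sudo reboot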
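Because DKMS can only build nvidia.ko for a kernel whose kernel-devel package is installed, one thing to check after the reboot is whether the running kernel has a matching kernel-devel. A minimal sketch, assuming an RPM-based system; `has_matching_devel` is a hypothetical helper, and `dkms status` lists the kernels the module has been built for.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if kernel release $1 appears as a whole line on stdin
has_matching_devel() {
    grep -qxF -- "$1"
}

running=$(uname -r)
if command -v rpm >/dev/null 2>&1; then
    # Print each installed kernel-devel as VERSION-RELEASE.ARCH, the same form uname -r uses
    if rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel 2>/dev/null | has_matching_devel "$running"; then
        echo "kernel-devel matches the running kernel ($running)"
    else
        echo "no kernel-devel installed for the running kernel ($running)"
    fi
fi

# List the kernels DKMS has built the nvidia module for
if command -v dkms >/dev/null 2>&1; then
    dkms status
fi
```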

I verify that the install worked by reading /var/log/nvidia-installer.log and running

nvcc --version

but for some reason,

nvidia-smi

returns: No devices were found.
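Since `nvcc` belongs to the toolkit and runs fine without a working driver, a more direct check is whether nvidia.ko is loaded and what it logged while probing the GPU. A minimal sketch; `driver_version` and `nvrm_lines` are hypothetical helpers, and `dmesg` may need root on systems that restrict the kernel log.

```shell
#!/bin/bash
# Hypothetical helper: print the driver version from /proc/driver/nvidia/version text on stdin
driver_version() {
    sed -n 's/.*Kernel Module[[:space:]]\{1,\}\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: keep only NVIDIA driver (NVRM) lines from kernel log text on stdin
nvrm_lines() {
    grep 'NVRM'
}

if [ -r /proc/driver/nvidia/version ]; then
    echo "nvidia.ko is loaded, version $(driver_version < /proc/driver/nvidia/version)"
else
    echo "nvidia.ko is not loaded (/proc/driver/nvidia/version is missing)"
fi

# Messages the driver logged while initializing the GPU
dmesg 2>/dev/null | nvrm_lines || echo "no NVRM messages in the kernel log"
```

If the module is loaded but `nvidia-smi` still finds no devices, the NVRM lines usually explain why the card failed to initialize.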

Does anyone have any advice? I’m using the network install because the runfile gives me an error during the NVIDIA driver install: missing kernel module “nvidia.ko”. I’ve searched elsewhere, but I cannot find much documentation about the P100.
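Both the runfile and DKMS compile nvidia.ko against the build tree of the running kernel, which kernel-devel provides under `/lib/modules/<release>/build`, so a missing build tree would explain a missing module. A minimal sketch of checking both; `has_build_tree` is a hypothetical helper whose optional second argument (an alternate root directory) exists only so it can be tested.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a build tree for kernel release $1 exists
# ($2 is an optional alternate root directory, used only for testing)
has_build_tree() {
    [ -d "${2:-}/lib/modules/$1/build" ]
}

kernel=$(uname -r)
if has_build_tree "$kernel"; then
    echo "build tree present for $kernel"
else
    echo "no build tree for $kernel: install kernel-devel-$kernel"
fi

# modinfo searches /lib/modules/$(uname -r) for an installed nvidia module
if modinfo -n nvidia >/dev/null 2>&1; then
    echo "nvidia module installed at $(modinfo -n nvidia)"
else
    echo "no nvidia module installed for $kernel"
fi
```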

I’m trying to get these cards working to train deep learning models for medical research.

Thanks in advance
