NVIDIA P40; Dell Workstation; Ubuntu 20.04; Drivers don't work, Kernel errors, possible PCIe configuration problem

a.t.bolsh · April 14, 2022, 4:26pm

Hello all! The TLDR is that I’m trying to set up a personal rig with a Tesla P40 I was able to buy cheaply, for hobby AI projects (I was recently a grad student doing this research, but I chose to leave and downgrade my AI involvement to a hobby). I bought a DELL Optiplex 7020 Minitower, installed Ubuntu on it, and was able to see the card using lspci ; however, no matter what I do, I cannot get the drivers to run, and I’m getting kernel errors. Reading up on the topic, this seems to be a common occurrence with high-grade GPUs and Dell machines, specifically having to do with PCIe configurations, but the standard fixes don’t seem to work for me.

I basically have 2 questions:

If there is a way to make the two play together (and how do I do it)?
If the two have incompatible PCIe expectations, what machines are known to WORK with the Tesla P40? How can I buy a box that won’t have these issues?

More details:

Here are the specs for the Dell box: Amazon.com

I installed Ubuntu 20.04, and I see the P40 when I run lspci .

From there, I followed the Linux CUDA installation guide. CUDA Installation Guide for Linux — Installation Guide for Linux 13.2 documentation

The final tests don’t work, and nvidia-smi says it can’t detect a device. I know where the problem is: the drivers I install won’t run, no matter what. I’ve tried installing several different available nvidia drivers, that I saw by running

ubuntu-drivers devices

(The reason I know the drivers aren’t running is that the folder /proc/driver/nvidia doesn’t exist, and everything I’ve read has suggested that )

Specifically, the bug report below was generated when I had nvidia-driver-510-server, but I tried the non-server option, the server option, I tried 495, 470, everything.

nvidia-bug-report.log.gz (1.3 MB)

Looking into logs (specifically /var/log/syslog ) I see a lot of the following kernel errors:

4 Apr 13 17:11:33 penguins-army kernel: [    6.737243] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
    5 Apr 13 17:11:33 penguins-army kernel: [    6.737249] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
    6 Apr 13 17:11:33 penguins-army kernel: [    6.737249] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:04:00.0)
    7 Apr 13 17:11:33 penguins-army kernel: [    6.738488] nvidia: probe of 0000:04:00.0 failed with error -1
    8 Apr 13 17:11:33 penguins-army kernel: [    6.738508] NVRM: The NVIDIA probe routine failed for 1 device(s).
    9 Apr 13 17:11:33 penguins-army kernel: [    6.738509] NVRM: None of the NVIDIA devices were initialized.
   10 Apr 13 17:11:33 penguins-army kernel: [    6.738670] nvidia-nvlink: Unregistered the Nvlink Core, major device number 234

Googling this error, I see the following NVIDIA developer forum posts, which suggest this is a PCIe configuration problem, though it’s talking about a different Dell box and a different GPU:

However, I can’t seem to find those settings in the BIOS for my machine. Not sure why.

I also found this forum, which suggests that either NVIDIA or Dell frustratingly tried to protect users from themselves, and deliverately made datacenter GPUs incompatible with workstations:

If that’s the case (and I’m not sure it is; the replies to that post seem to deal with overheating, which is different from what I’m dealing with, since I’m not loading the GPU at all), I don’t mind buying a server, but I don’t want to be disappointed again (I’ve killed so much time digging through these issues outside my usual competency). What servers are known to play well with this card? What PCIe settings can I google to check that the two would be compatible?

Thank you very much for your time.

rs277 · April 14, 2022, 7:30pm

Table 7 of this document lists P40 supported servers:

I doubt there is any deliberate intent. More a case of the card, being designed to fit efficiently into an environment that is not normally catered to by run of the mill PC hardware.

In addition to larger than normal PCIe BAR requirements, this card has no cooling fan fitted as it’s intended to be installed in a chassis designed to ensure that air is forced through it at the required rate. This can be done outside of the approved systems, but requires a degree of hands on and research.

a.t.bolsh · April 14, 2022, 10:47pm

Thanks a lot for the link; while I’m still trying to fix my system, I’ll look through it, find the cheapest servers.

As for the fan, I bought a CPU fan and added a 3d-printed air funnel :)

Do you know of any older versions of the BIOS for this machine that I can flash onto the system? Maybe there’s an alternative BIOS where these lower-level functions are exposed. I just have no idea how to search for these things.

rs277 · April 14, 2022, 11:11pm

Sorry, no.

Did similar with a Tesla K20X, but used a fairly beefy blower style fan. Am not sure a CPU fan will move enough air, but you’ll get an idea from monitoring temps.

For some reason the Product Brief for the P40 make no mention of it, but the P100 one lists airflow requirements, also uses 250W, and is of similar construction.

a.t.bolsh · April 15, 2022, 1:30pm

Yeah . . . I’d love for the fan to be my primary problem, especially since I know to look for it and it has solutions. Thanks again.

generix · April 17, 2022, 2:53pm

If those are reallly all bios settings
https://www.dell.com/support/manuals/de-de/optiplex-7020-desktop/opt7020sffompub-v1/system-setup-optionen?guid=guid-212377c4-3cc4-4633-86c6-53ab926b50fc&lang=en-us
then it’s really limited for a “Workstation”. Just to make sure, you’re using EFI boot and have legacy boot disabled, virtualization enabled? Please uninstall the nvidia driver and provide a dmesg output right after boot.

a.t.bolsh · April 17, 2022, 6:31pm

Hey, I returned the workstation and bought the cheapest server from rs277 's Table 7 (Lenovo System x3650 M5), which was about the same price anyway (the market for these is weird; you can get steals sometimes). But yes, it was UEFI boot, virtualization enabled, and I think legacy boot had been disabled, though I’m not sure.

There had been other issues - for instance, the power supplies were separate, so I turned on the power supply for the GPU (and its fan) before turning on the workstation; I read that this shouldn’t be an issue, though it might have been. I’ll try again with an officially supported configuration, since it’s not much more expensive.

Topic		Replies	Views
Tesla P40 in Dell Percision 7910 rack CUDA Programming and Performance	10	3205	February 16, 2024
Dell Precision 7820 + Linux Mint 21.3 (6.8.* Kernel): Tesla P40 vs Quadro P6000 Linux linux	1	408	August 16, 2025
Issues on installing drivers for Tesla P40 CUDA Programming and Performance	5	3130	August 26, 2021
I just cant get P40 working on Ubuntu Server Linux	0	735	July 6, 2024
Installing Tesla P100 on Ubuntu 16.04 Server with a 1060GTX CUDA Setup and Installation	5	3351	April 20, 2017
P40 not detected by bios CUDA Setup and Installation	2	1018	July 21, 2022
Dell workstation one P2000 and the other is Telsa M40 Linux	6	1109	April 21, 2019
Computer boot loops when Nvidia Tesla P100 is installed CUDA Setup and Installation	1	1125	October 2, 2023
CUDA driver behaves differently on ubuntu 18.04 desktop and server CUDA Setup and Installation	3	791	April 14, 2019
Tesla K40m not recognized, but driver works find for Quadro K420 CUDA Setup and Installation	3	1458	June 22, 2016

NVIDIA P40; Dell Workstation; Ubuntu 20.04; Drivers don't work, Kernel errors, possible PCIe configuration problem

Related topics