P100 not showing up in nvidia-smi

Hi,

I was using an NVIDIA Titan X card in a computer and it worked fine, but after I replaced it with an NVIDIA Tesla P100, the card no longer shows up in nvidia-smi. I have updated the driver to 375.51.

Output of lspci | grep NVIDIA

0f:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
0f:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
42:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)

Output of dmesg | grep NVRM

[   22.298530] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[   22.298530] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:42:00.0)
[   22.298532] NVRM: The system BIOS may have misconfigured your GPU.
[   22.298565] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   22.298567] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.51  Wed Mar 22 10:26:12 PDT 2017 (using threaded interrupts)
[   27.468189] NVRM: Your system is not currently configured to drive a VGA console
[   27.468192] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[   27.468193] NVRM: requires the use of a text-mode VGA console. Use of other console
[   27.468194] NVRM: drivers including, but not limited to, vesafb, may result in
[   27.468195] NVRM: corruption and stability problems, and is not supported.

There is your problem: the driver reports that BAR1 was assigned no address space (0M @ 0x0). Check your BIOS setup to see whether you can dial in a BAR1 aperture of the required size. You may need to install the latest system BIOS for your platform for this to work, or your system BIOS may not support this at all.
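To confirm what the kernel actually assigned, you can also read the device's resource file in sysfs. The Python sketch below assumes a Linux system, the bus address 0000:42:00.0 from the lspci output above, and that the driver's BAR numbering matches the line index in that file (the first six lines are BAR0 to BAR5, each holding start, end and flags in hex). parse_resource and report are hypothetical helper names, not part of any existing tool.

```python
"""Sketch: report the size of each PCI BAR of one device from Linux sysfs."""
import sys
from typing import List, Tuple

# Assumption: standard Linux sysfs layout; address taken from the lspci output.
DEFAULT_DEVICE = "0000:42:00.0"


def parse_resource(text: str) -> List[Tuple[int, int]]:
    """Parse a sysfs 'resource' file into (start, size) pairs, one per line.

    Each line holds three hex numbers: start, end, flags. Unassigned
    regions are reported as all zeros, which we map to size 0.
    """
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[0], 16), int(fields[1], 16)
        size = end - start + 1 if end else 0
        regions.append((start, size))
    return regions


def report(regions: List[Tuple[int, int]]) -> None:
    """Print the first six regions, which correspond to BAR0..BAR5."""
    for index, (start, size) in enumerate(regions[:6]):
        if size == 0:
            print(f"BAR{index}: unassigned")
        else:
            print(f"BAR{index}: {size} bytes ({size >> 20} MiB) @ {start:#x}")


def main() -> None:
    device = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DEVICE
    path = f"/sys/bus/pci/devices/{device}/resource"
    try:
        with open(path) as f:
            regions = parse_resource(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    report(regions)
    if len(regions) > 1 and regions[1][1] == 0:
        print("BAR1 has no address assigned; check the BIOS aperture settings.")


if __name__ == "__main__":
    main()
```

Run it as python3 check_bars.py 0000:42:00.0 (the script name is arbitrary). On a system showing the dmesg messages above, BAR1 would presumably be listed as unassigned.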

What is your host system (maker, model)? Is this a server enclosure that can provide sufficient forced air flow to cool the passively cooled Tesla P100? If not, the GPU will overheat quickly and shut itself down to prevent permanent damage. Or is this an actively cooled Quadro GP100 by any chance?

Are there any specific or minimum hardware requirements to make it work?

With modern Tesla GPUs, the “safe” approach, and as far as I can tell the one NVIDIA intends, is for customers to buy them already integrated into a system from an integrator that has partnered with NVIDIA. These integrators are aware of all the technical issues that using Tesla GPUs entails. NVIDIA provides a handy list of integrators here: http://www.nvidia.com/object/where-to-buy-tesla.html

If you build your own home-brew system with a Tesla GPU, you are pretty much on your own. Numerous posts in these forums demonstrate that people run into problems doing that. I would assume most of them do not have a background in building and configuring HPC systems.

While I am tangentially familiar with some of the more common issues that arise when adding a Tesla GPU to a system (such as the requirement for a large BAR1 aperture), I am not familiar with the P100, nor do I know your motherboard or server system. You can poke around in your system BIOS setup to see what options it provides for setting up the aperture. And make sure you provide proper cooling for the P100.
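Why the aperture size matters can be shown with a little arithmetic. PCI memory BARs are powers of two and must be aligned to their own size, and a BAR mapped with 32-bit addressing has to sit below 4 GiB. The Python sketch below is a simplification (it ignores the other devices sharing the window, such as the Quadro 2000 here); the 3 GiB window base and the 16 GiB BAR size are illustrative assumptions, since the thread does not state the P100's actual BAR1 size. A BAR that cannot fit has to be mapped above 4 GiB, which the BIOS must support, often through an option labelled something like "Above 4G Decoding".

```python
"""Sketch: can a naturally aligned BAR fit in the 32-bit MMIO window?"""

FOUR_GIB = 1 << 32


def fits_below_4g(bar_size: int, window_base: int) -> bool:
    """Return True if a BAR of bar_size bytes fits in [window_base, 4 GiB).

    The lowest legal start address is window_base rounded up to a
    multiple of bar_size, because a BAR must be aligned to its own size.
    """
    if bar_size <= 0 or bar_size & (bar_size - 1):
        raise ValueError("BAR size must be a positive power of two")
    aligned_start = -(-window_base // bar_size) * bar_size  # ceiling division
    return aligned_start + bar_size <= FOUR_GIB


if __name__ == "__main__":
    # Hypothetical window: firmware reserving 3 GiB..4 GiB for 32-bit MMIO.
    base = 3 << 30
    for label, size in [("256 MiB", 256 << 20), ("16 GiB", 16 << 30)]:
        print(f"{label} BAR below 4 GiB: {fits_below_4g(size, base)}")
```

With that assumed window, a 256 MiB BAR fits (True) while a 16 GiB BAR cannot be placed below 4 GiB at all (False).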

Thanks for the help. I have systems with the following three motherboard/CPU combinations:

  • Asus MAXIMUS VIII HERO with Intel(R) Core(TM) i7-6700K CPU
  • Asus X99-E WS with Intel(R) Xeon(R) CPU E5-2620 v3
  • HP 0AECh with Intel(R) Xeon(R) CPU X5690

Will it work with any of the above configurations?

I have encountered the same problem. How did you solve it?

Order the P100 in an OEM server, from an OEM that has designed the system to support the P100.