Plugging Tesla K80 results in PCI resource allocation error

UberSycko · March 6, 2015, 9:57pm

Hi,

I bought a Tesla K80 card and tried to integrate it into a workstation PC (of course with sufficient ventilation). I already tried the card in combination with 3 different motherboards but no combination worked. The error I get from my Asus Rampage IV is:

D4: PCI resource allocation error

The system does not start up with the K80 plugged in. After unplugging it, everything runs fine. I already removed every other PCI card, but that did not make any difference.
On another motherboard (I do not have the exact model yet) the system started, but after installing the driver in Windows, Device Manager states: Insufficient resources available. Unplug another component to make it work. (free translation from German).
The third motherboard did not even start up; the power-good signal did not seem to arrive.

I tried to find some answers in these forums, unfortunately without success.

Any help appreciated!

Robert_Crovella · March 6, 2015, 10:15pm

This is to be expected on most workstation or consumer motherboards.

K80 (and K40) have large PCI BAR regions that need to be mapped into system PCI space. Many system BIOSes choke on this process, and it is one of the reasons that the normal way to deploy Tesla products is in an OEM-qualified server.

You might have better luck with a specific ASUS workstation class motherboard or a specific Asrock workstation class motherboard, but I can’t really give you any guarantees or certifications.

Here’s an example of a thread on this board that has related information:

https://devtalk.nvidia.com/default/topic/790759/cannot-install-driver-for-nvidia-tesla-k40-cards-on-fedora-20/

njuffa · March 6, 2015, 10:22pm

I assume you are fully aware that what you are attempting to do is completely unsupported and you are proceeding entirely at your own risk. My take on the situation is that you are risking permanent damage to a very expensive device. I would strongly advise looking for a ready-made server solution that comes with a properly integrated K80 GPU. Note that GPUs including the K80 plug into PCIe connectors, which are different from PCI connectors.

The following comments are provided for entertainment value only. I have not actually used a K80. Your description hints at two unrelated problems.

The fact that the system won’t start with the K80 plugged in would seem to indicate either that the GPU is not correctly supplied with power (are all relevant PCIe power connectors plugged in?), that you are overloading the system’s power supply (maybe just on a particular rail), or that the GPU is overheating. Other explanations are possible, such as creating an electrical short somewhere.

The second issue seems to be a lack of system resources, possibly insufficient size of a memory aperture needed by the system to communicate with the GPU, or conflict on an interrupt line, etc. This may be a question of your BIOS configuration, or your system infrastructure may be too old to provide the necessary resources. You would probably want to use a Haswell-based system or at least an IvyBridge-based system to try this with.

UberSycko · March 6, 2015, 10:33pm

Damn, that is kind of interesting, and a little bit scary, too. Unfortunately there is not much public information about the Teslas available, so I did not even know about this >4G problem. So I understand your advice would be to search (at least) for a certified motherboard, right?
njuffa, could you give me a hint what you mean by “Note that GPUs including the K80 plug into PCIe connectors, which are different from PCI connectors.”? Do you refer to the power connectors? I noticed these were different from the old GPU connectors, but seem to be the same as modern CPU connectors.

njuffa · March 6, 2015, 10:40pm

I wasn’t referring to the PCIe power connectors (6-pin rated at 75W each, or 8-pin rated at 150W each). I was referring to the edge connectors carrying the signals between the system and the device. These are mechanically and electrically different between PCI and PCIe. Your post only referred to PCI, which seemed odd and, together with the other contents of your post, raised a red flag in my mind.

UberSycko · March 6, 2015, 10:43pm

Ah, okay, sorry for being imprecise. Of course I meant PCIe. Thanks for your explanations.

Gert-Jan · March 9, 2015, 9:04am

Buyers of the Xeon Phi 31S1P (currently on sale by Intel) have the same kind of problems. Finding a matching motherboard is challenging; see for example “Will your motherboard work with Intel Xeon Phi?” and “Xeon Phi 31s1p Cooling and Motherboards”. One board known to work with the Xeon Phi is the Asus P9X79 WS. Also the Asus X99-A has the K40C listed in its Qualified Vendors List, so it might work with a K80 as well. As always YMMV, but hey, YOLO.

SergeCell · March 10, 2015, 7:36am

Most modern motherboards have an “Above 4G decoding” option in the BIOS (or something similarly named), and it should be enabled for a Tesla. It is not enabled by default. It can be difficult to find; it is usually buried in the advanced PCI settings.
We have an ASUS X99-S motherboard and managed to get a Tesla K80 running (it was not easy, especially the cooling).
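
For readers on Linux, one way to check whether the firmware actually placed a card's regions above the 4 GiB boundary is to read the device's sysfs `resource` file, which lists one region per line as three hexadecimal fields (start, end, flags). The following is only a minimal sketch: the sample values are made up for illustration and are not taken from a real K80, and the device address in the usage note is hypothetical.

```python
FOUR_GIB = 1 << 32


def parse_pci_resources(text):
    """Parse the text of a Linux sysfs PCI 'resource' file.

    Each line holds three hex fields: start, end, flags.
    Unused entries are all zeros and are skipped.
    """
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # entry not in use
        regions.append({
            "index": index,
            "start": start,
            "size": end - start + 1,
            "above_4g": start >= FOUR_GIB,
        })
    return regions


# Made-up sample: one 16 MiB region below 4 GiB, one 16 GiB region above it.
SAMPLE = """0x00000000f6000000 0x00000000f6ffffff 0x0000000000040200
0x0000003800000000 0x0000003bffffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

if __name__ == "__main__":
    for region in parse_pci_resources(SAMPLE):
        where = "above 4G" if region["above_4g"] else "below 4G"
        print(f"region {region['index']}: {region['size'] / 2**20:.0f} MiB, {where}")
```

On a real system the text would come from a path such as `/sys/bus/pci/devices/0000:03:00.0/resource`, where the address shown is a placeholder for the card's own.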

cbuchner1 · March 10, 2015, 3:10pm

If anyone else has information about affordable motherboards with “large BAR” or “Above 4G Decoding” support, I am all ears. I own one actively cooled Intel Xeon Phi board with 6GB RAM but haven’t had the opportunity to get it running yet.

Christian

vtrandal · April 22, 2015, 6:48pm

Hey txbob, I’m benefiting from your comments on this topic. I had to look up BAR. In this context it probably means Base Address Register which makes sense. These GPU cards need a large contiguous block of memory pointed to by an address in a Base Address Register. My memory was jogged by looking up BAR here http://www.geocities.jp/technoart_jp/Abbreviation/abbrev_e_COM.html

Robert_Crovella · April 22, 2015, 7:10pm

Yes, BAR is Base Address Register

A google of “PCI BAR” will take you to the wikipedia page (first hit) which adequately defines PCI BAR. Each BAR corresponds to a Memory or I/O resource (i.e. range) that the device needs mapped into the system memory space. The size of the BAR (i.e. the size of the region required to be mapped) is easily discoverable by the PCI BIOS via a simple write/read sequence, and when properly mapped, the BAR should contain (i.e. the PCI BIOS should set) the base address of the mapped region.
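
The write/read sequence mentioned above can be sketched roughly as follows: firmware writes all 1s to the BAR, reads it back, masks off the low flag bits, then inverts the result and adds one to get the region size. This is a minimal illustration for memory BARs only, not firmware code; the function name and the example readback values are assumptions chosen for illustration.

```python
def bar_size(readback, is_64bit=False):
    """Decode a memory BAR's size from the value read back
    after all 1s were written to it."""
    width = 64 if is_64bit else 32
    mask = (1 << width) - 1
    # The low 4 bits of a memory BAR are flags (type, prefetchable),
    # not address bits, so they are cleared before decoding.
    addr_bits = readback & mask & ~0xF
    if addr_bits == 0:
        return 0  # BAR not implemented
    # The device hard-wires the low address bits to 0; inverting and
    # adding one turns that pattern into the region size in bytes.
    return (~addr_bits & mask) + 1


if __name__ == "__main__":
    # Hypothetical readback values, not measured from a real card.
    print(bar_size(0xFF000000) // 2**20, "MiB")
    print(bar_size(0xFFFFFFFC0000000C, is_64bit=True) // 2**30, "GiB")
```

A region of many gigabytes only fits in a 64-bit BAR mapped above the 4 GiB boundary, which is why firmware that cannot place regions there may run out of address space.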

DBarnett · June 19, 2015, 2:47pm

SergeCell, could you give some indication about what you had to do to sufficiently cool your K80 and how you are monitoring the card temps? I installed one into a system with an ASUS P9X79-E-WS mobo and got it to solve an ANSYS test simulation but do not have a good way to monitor it yet. The first system that I tried is almost identical but has a P9X79-WS mobo and it failed to boot after I enabled Above 4G Decoding. I still want to use this system and I am waiting to try a newer version of the BIOS after our supplier gives it to us.

mjarpaio · August 31, 2020, 12:36pm

Hello @DBarnett and @SergeCell, can you please share the procedure and configuration used to get a Tesla K80 working in a workstation? I am currently using a workstation-class motherboard, an AsRock Extreme 4 (X99), with a Xeon processor and 128 GB of DDR4 RAM. Thanks!

supahstaahh · March 16, 2021, 3:04am

The Tesla K80 requires two 8-pin power connectors linked into a single 8-pin connector. An adapter can be found here: https://www.amazon.com/dp/B07SHT579T?ref=ppx_pop_mob_ap_share

Each 8-pin provides around 125 watts. The K80 is a 300-watt card: 75 watts from the PCIe slot and 225 from the external PCIe 8-pins. Originally these cards would come with those adapters.

I had the same issue with my K80. The system would only start with the card unplugged, until I connected the card to the correct power connectors.
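
The power arithmetic can be checked with a small sketch. It uses the connector ratings quoted earlier in the thread (75 W for a 6-pin and 150 W for an 8-pin PCIe connector) rather than the approximate per-connector figure above, plus 75 W from the slot. The function name is hypothetical, and the sketch says nothing about what a particular power supply rail or adapter cable can actually deliver.

```python
SLOT_WATTS = 75  # power available from the PCIe slot itself
CONNECTOR_WATTS = {"6-pin": 75, "8-pin": 150}  # ratings quoted in the thread


def power_headroom(board_watts, connectors):
    """Return rated supply minus board power; negative means a shortfall."""
    available = SLOT_WATTS + sum(CONNECTOR_WATTS[c] for c in connectors)
    return available - board_watts


if __name__ == "__main__":
    # A 300 W card fed by two 8-pin PCIe connectors through an adapter.
    print(power_headroom(300, ["8-pin", "8-pin"]))
    # The same card with only a single 6-pin connector attached.
    print(power_headroom(300, ["6-pin"]))
```

A negative result indicates that the rated supply falls short of the card's board power.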
