Tesla K80 one GPU visible

Hello,

I recently acquired several Tesla K80s. I have a system capable of running these cards, so I set up some benchmarking tools to see if they are working properly. Most of the cards, when installed, show up as 2 devices in Device Manager, and 2 GPUs are visible in nvidia-smi. Benchmarks on these cards work without a hitch. Some of the cards, however, show up only once in Device Manager and nvidia-smi, and I am only able to benchmark the single visible GPU. Are those cards faulty, or is there something I am missing?

Thank you

If you’ve provided sufficient power and cooling, and you have not attempted to put too many devices in the system, then a K80 should show up as two logical devices (e.g. two devices enumerated in lspci). It’s impossible to say whether your GPUs are faulty from the information provided.

I only test one card at a time to avoid any conflicts. When I put in a ‘good’ card, everything runs as it should. Temps are good as well. Both GPUs are visible and I am able to benchmark both GPUs separately and simultaneously. When I swap in a ‘bad’ K80 by itself, only 1 GPU is visible and I can only benchmark the visible GPU. Is there any other information that could be helpful for you?

Thank you

In the failing case, what is the output of the following commands?

lspci |grep -i nvidia

dmesg |grep NVRM
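
As a rough sketch of how the lspci check above could be automated across several cards, the snippet below applies the same filter as `grep -i nvidia` and compares the count against the two logical devices a healthy K80 should expose. The function names are hypothetical, and it assumes a Linux host with exactly one K80 installed and no other NVIDIA devices present.

```python
import subprocess

# A healthy K80 should enumerate as two logical devices (per the reply above).
EXPECTED_GPUS_PER_K80 = 2


def nvidia_lines(lspci_text):
    """Return the lspci lines mentioning NVIDIA (same filter as `grep -i nvidia`)."""
    return [line for line in lspci_text.splitlines() if "nvidia" in line.lower()]


def check_single_k80(lspci_text):
    """Return (devices_found, looks_ok) for a system holding one K80.

    Assumption: no other NVIDIA device (e.g. a display GPU) is installed,
    otherwise the count is not comparable.
    """
    found = len(nvidia_lines(lspci_text))
    return found, found == EXPECTED_GPUS_PER_K80


def read_lspci():
    """Run lspci on a Linux host and return its text output."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the machine under test, `check_single_k80(read_lspci())` would return the number of NVIDIA devices found and whether it matches the expected two. A count of one only confirms the symptom; it does not say why the second GPU is missing, which is where the dmesg output comes in.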

In what condition did you buy them: new or used? And from whom: a reputable dealer, or some random person on an internet flea market?

Clearly we should not jump to conclusions, but based on the information given so far (testing one K80 at a time, with some cards showing only one active GPU), it seems possible that those cards are defective. I would consider that more likely if you bought the K80s used and/or from some fly-by-night operation.

Were the cards delivered in antistatic bags, and did you take precautions against electrostatic discharge when installing them?