One of the 4 GPUs (GeForce RTX 2080 Ti) does not show up on nvidia-smi

Berriel · February 25, 2019, 4:03pm

A new server just arrived, and I installed Ubuntu 16.04 and CUDA + cuDNN as usual. After installation, one of the GPUs is missing from the nvidia-smi output. All 4 appear in lspci:

19:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
1a:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
67:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev ff)
68:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
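
In the listing above, the device at 67:00.0 reports (rev ff) while the others report (rev a1). A revision of ff often means that reads of the device's PCI configuration space return all ones, i.e. the device is not responding, although a genuine ff revision is possible in principle. A minimal sketch for flagging such lines, assuming the default lspci line format shown above (the function name `suspicious_devices` is hypothetical):

```python
import re

# Matches default lspci lines, with or without a leading PCI domain, e.g.
# "67:00.0 VGA compatible controller: NVIDIA Corporation TU102 [...] (rev ff)"
LSPCI_LINE = re.compile(
    r"^(?P<addr>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s.*"
    r"\(rev (?P<rev>[0-9a-f]{2})\)\s*$"
)


def suspicious_devices(lspci_output):
    """Return the PCI addresses whose revision field reads 'ff'."""
    flagged = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and match.group("rev") == "ff":
            flagged.append(match.group("addr"))
    return flagged


# The four lines quoted in the post.
listing = """\
19:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
1a:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
67:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev ff)
68:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
"""

print(suspicious_devices(listing))  # ['67:00.0']
```

This is only a heuristic: it points at a device worth reseating or testing elsewhere, not at a confirmed hardware fault.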

When I run nvidia-smi, these messages appear in dmesg:

[    8.672022] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
[    8.672138] caller os_map_kernel_space.part.7+0xd8/0x120 [nvidia] mapping multiple BARs
[   11.860957] NVRM: RmInitAdapter failed! (0x26:0xffff:1125)
[   11.860979] NVRM: rm_init_adapter failed for device bearing minor number 2
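
The last NVRM line names the failing device by its minor number. A small sketch for pulling those minor numbers out of a dmesg dump, assuming the message wording stays exactly as quoted above (the helper name `failed_minors` is hypothetical):

```python
import re

# Wording copied from the dmesg line quoted in the post.
MINOR = re.compile(
    r"NVRM: rm_init_adapter failed for device bearing minor number (\d+)"
)


def failed_minors(dmesg_text):
    """Return the sorted, de-duplicated minor numbers that failed to initialize."""
    return sorted({int(m.group(1)) for m in MINOR.finditer(dmesg_text)})


log = """\
[   11.860957] NVRM: RmInitAdapter failed! (0x26:0xffff:1125)
[   11.860979] NVRM: rm_init_adapter failed for device bearing minor number 2
"""

print(failed_minors(log))  # [2]
```

If minor numbers follow PCI enumeration order (an assumption, not something the logs here confirm), minor 2 would be the third card in the lspci listing, 67:00.0.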

If I leave the server running for some time, nvidia-smi then fails with:

Unable to determine the device handle for GPU 0000:67:00.0: Unknown Error
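
While nvidia-smi still runs, one way to name the missing card is to compare the lspci addresses with the bus IDs the driver reports, for example from `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader`. A rough sketch, where the function names are hypothetical and the driver-side values are illustrative samples in that command's output format, not data from this server:

```python
def normalize(bus_id):
    """Reduce '00000000:67:00.0' or '0000:67:00.0' to 'bus:device.function'.

    The PCI domain is dropped, so this only suits single-domain machines.
    """
    return ":".join(bus_id.strip().lower().split(":")[-2:])


def missing_from_driver(lspci_addresses, driver_bus_ids):
    """Return lspci addresses that have no matching driver bus ID."""
    seen = {normalize(bus_id) for bus_id in driver_bus_ids}
    return [addr for addr in lspci_addresses if normalize(addr) not in seen]


# Addresses taken from the lspci listing in the post.
lspci_addresses = ["19:00.0", "1a:00.0", "67:00.0", "68:00.0"]

# Hypothetical sample of what the driver might report for three working GPUs.
driver_bus_ids = ["00000000:19:00.0", "00000000:1A:00.0", "00000000:68:00.0"]

print(missing_from_driver(lspci_addresses, driver_bus_ids))  # ['67:00.0']
```

Once nvidia-smi starts failing outright, as in the error above, the address in the error message itself (0000:67:00.0) identifies the card.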

I attached two bug reports: one from before nvidia-smi “breaks” and one from after. Any help is welcome.
nvidia-bug-report-after.log.gz (2.06 MB)
nvidia-bug-report-before.log.gz (1.87 MB)

generix · February 25, 2019, 4:55pm

This looks like some kind of hardware failure; maybe the card is just improperly seated, or a power connector is missing or improperly connected. If reseating the card and checking the power connectors doesn’t help, test the card in another system to check for a general hardware failure.

Berriel · February 27, 2019, 3:08pm

Indeed, after many attempts, I found that one of the GPUs is faulty, but the fault only shows up once the NVIDIA driver is loaded. On Windows, the system would fall back to the generic driver, so the faulty GPU could still be used when the display was connected to it; otherwise, the same thing happened: it reported that something was wrong with one of the GPUs. Let’s see if we manage to RMA it.
