Please help resolve the issue where only one of 8 GPUs supports present mode in Vulkan API

We have 8 L40S GPUs in this server.
When using the Vulkan API, only one of the 8 GPUs exposes a queue family that supports presentation, so only that GPU works with the Vulkan API. The other 7 GPUs have no queue family with present support, which prevents us from allocating devices on them through the Vulkan API.

All 8 L40S GPUs are physically installed properly, so it is puzzling that only one GPU is usable while the other 7 have no queue family capable of presenting and therefore cannot be used with the graphics API.

Is there a way to resolve this issue? If so, I would like to know how.

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

Thanks for replying!

My computer is acting up right now and I can't upload files. Is there a particular part of the log that I could copy and paste to help figure out what's wrong? If you let me know which part, I'll paste it. If that is too much, I can provide the full log after I fix my computer.

Thank you!

-+-[0000:e0]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[e1]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.2-[e2]----00.0  Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller [1b4b:9230]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[e3]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        +-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 |                        \-00.4  Advanced Micro Devices, Inc. [AMD] Device [1022:14c9]
 +-[0000:c0]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[c1]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.1-[c2]----00.0  Broadcom / LSI MegaRAID SAS-3 3108 [Invader] [1000:005d]
 |           +-03.2-[c3]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
 |           +-03.3-[c4]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[c5]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        \-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 +-[0000:a0]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[a1]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[a2]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        \-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 +-[0000:80]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[81]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.1-[82-83]--+-00.0  Intel Corporation Ethernet Controller X710 for 10GBASE-T [8086:15ff]
 |           |               \-00.1  Intel Corporation Ethernet Controller X710 for 10GBASE-T [8086:15ff]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[84]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        +-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 |                        +-00.4  Advanced Micro Devices, Inc. [AMD] Device [1022:14c9]
 |                        \-00.5  Advanced Micro Devices, Inc. [AMD] Genoa CCP/PSP 4.0 Device [1022:14ca]
 +-[0000:60]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[61]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.2-[62-63]----00.0-[63]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[64]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        +-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 |                        \-00.4  Advanced Micro Devices, Inc. [AMD] Device [1022:14c9]
 +-[0000:40]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[41]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[42]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        \-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 +-[0000:20]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-01.1-[21]----00.0  NVIDIA Corporation Device [10de:26b9]
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-03.1-[22]--+-00.0  Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]
 |           |            \-00.1  Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]
 |           +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
 |           \-07.1-[23]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
 |                        \-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
 \-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14a4]
             +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:149e]
             +-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14a6]
             +-01.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
             +-01.1-[01]----00.0  NVIDIA Corporation Device [10de:26b9]
             +-02.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
             +-03.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
             +-04.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
             +-05.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
             +-05.1-[02]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
             +-07.0  Advanced Micro Devices, Inc. [AMD] Device [1022:149f]
             +-07.1-[03]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ac]
             |            +-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14dc]
             |            +-00.4  Advanced Micro Devices, Inc. [AMD] Device [1022:14c9]
             |            \-00.5  Advanced Micro Devices, Inc. [AMD] Genoa CCP/PSP 4.0 Device [1022:14ca]
             +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b]
             +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e]
             +-18.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ad]
             +-18.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14ae]
             +-18.2  Advanced Micro Devices, Inc. [AMD] Device [1022:14af]
             +-18.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14b0]
             +-18.4  Advanced Micro Devices, Inc. [AMD] Device [1022:14b1]
             +-18.5  Advanced Micro Devices, Inc. [AMD] Device [1022:14b2]
             +-18.6  Advanced Micro Devices, Inc. [AMD] Device [1022:14b3]
             +-18.7  Advanced Micro Devices, Inc. [AMD] Device [1022:14b4]
             +-19.0  Advanced Micro Devices, Inc. [AMD] Device [1022:14ad]
             +-19.1  Advanced Micro Devices, Inc. [AMD] Device [1022:14ae]
             +-19.2  Advanced Micro Devices, Inc. [AMD] Device [1022:14af]
             +-19.3  Advanced Micro Devices, Inc. [AMD] Device [1022:14b0]
             +-19.4  Advanced Micro Devices, Inc. [AMD] Device [1022:14b1]
             +-19.5  Advanced Micro Devices, Inc. [AMD] Device [1022:14b2]
             +-19.6  Advanced Micro Devices, Inc. [AMD] Device [1022:14b3]
             \-19.7  Advanced Micro Devices, Inc. [AMD] Device [1022:14b4]
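
For anyone who wants to check a listing like this without reading it line by line, here is a minimal Python sketch that pulls the NVIDIA entries out of the tree text. It assumes the listing came from something like `lspci -tvnn` (the thread does not say which command was used), and the names `NVIDIA_LINE` and `find_nvidia_devices` are made up for this example. Run over the listing above, it should find eight entries with device ID 26b9, one under each root bus.

```python
import re

# Matches a bridge entry whose downstream device is an NVIDIA function, e.g.
# "+-01.1-[e1]----00.0  NVIDIA Corporation Device [10de:26b9]".
# 10de is NVIDIA's PCI vendor ID; the pattern itself is an illustrative assumption.
NVIDIA_LINE = re.compile(
    r"-\[(?P<bus>[0-9a-f]{2})\]-+(?P<devfn>[0-9a-f]{2}\.[0-7])\s+"
    r"NVIDIA Corporation.*\[10de:(?P<device_id>[0-9a-f]{4})\]"
)


def find_nvidia_devices(tree_text):
    """Return (bus address, device ID) pairs for NVIDIA entries in lspci tree text."""
    found = []
    for line in tree_text.splitlines():
        match = NVIDIA_LINE.search(line)
        if match:
            address = f"{match['bus']}:{match['devfn']}"
            found.append((address, match["device_id"]))
    return found
```

Calling `find_nvidia_devices(text)` on the pasted tree gives the bus addresses to compare against the `Bus Id` fields that `nvidia-smi -q` reports.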

For a start, please post the output of nvidia-smi -q

Thanks for replying.
Below is what you requested.

Please help us resolve this issue.

This is the output for one of our GPUs; the others look the same.

If you need any other data, please let us know.

Timestamp : Thu May 16 16:54:22 2024
Driver Version : 550.78
CUDA Version : 12.4

Attached GPUs : 8
GPU 00000000:01:00.0
Product Name : NVIDIA L40S
Product Brand : NVIDIA
Product Architecture : Ada Lovelace
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
Addressing Mode : None
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1323923001935
GPU UUID : GPU-dc56434f-c623-7283-002d-dfd931c1ae2b
Minor Number : 2
VBIOS Version : 95.02.66.00.02
MultiGPU Board : No
Board ID : 0x100
Board Part Number : 900-2G133-0080-000
GPU Part Number : 26B9-896-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G133.0242.00.03
OEM Object : 2.1
ECC Object : 6.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
vGPU Heterogeneous Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
GSP Firmware Version : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Base Classcode : 0x3
Sub Classcode : 0x2
Device Id : 0x26B910DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x185110DE
GPU Link Info
PCIe Generation
Max : 4
Current : 1
Device Current : 1
Device Max : 4
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Event Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
Sparse Operation Mode : N/A
FB Memory Usage
Total : 46068 MiB
Reserved : 479 MiB
Used : 263 MiB
Free : 45327 MiB
BAR1 Memory Usage
Total : 65536 MiB
Used : 2 MiB
Free : 65534 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable Parity : 0
SRAM Uncorrectable SEC-DED : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable Parity : 0
SRAM Uncorrectable SEC-DED : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
SRAM Threshold Exceeded : No
Aggregate Uncorrectable SRAM Sources
SRAM L2 : 0
SRAM SM : 0
SRAM Microcontroller : 0
SRAM PCIE : 0
SRAM Other : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 192 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 36 C
GPU T.Limit Temp : 52 C
GPU Shutdown T.Limit Temp : -5 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating T.Limit Temp : N/A
GPU Power Readings
Power Draw : 23.62 W
Current Power Limit : 350.00 W
Requested Power Limit : 350.00 W
Default Power Limit : 350.00 W
Min Power Limit : 100.00 W
Max Power Limit : 350.00 W
GPU Memory Power Readings
Power Draw : N/A
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 1185 MHz
Applications Clocks
Graphics : 2520 MHz
Memory : 9001 MHz
Default Applications Clocks
Graphics : 2520 MHz
Memory : 9001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2520 MHz
SM : 2520 MHz
Memory : 9001 MHz
Video : 1965 MHz
Max Customer Boost Clocks
Graphics : 2520 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 870.000 mV
Fabric
State : N/A
Status : N/A
CliqueId : N/A
ClusterUUID : N/A
Health
Bandwidth : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 7213
Type : G
Name : /usr/libexec/Xorg
Used GPU Memory : 108 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 7759
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 141 MiB

This is only one GPU. I really need the nvidia-bug-report.log to see what’s going on with the other 7.

Thanks for your reply.

It seems that I can’t upload the file because the .gz file is over 3 MB and the .log file is over 12 MB.

Could you suggest an alternative method for uploading the files?

I guess a zipped dmesg output right after a reboot would work as well. Otherwise, maybe use google drive sharing.
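
If it helps, the dmesg capture could be scripted roughly like this. This is only a sketch under a few assumptions: Python 3 is available on the server, `dmesg` is on the PATH and readable (usually as root), and the function names and the output file name are invented for the example. The optional filter keeps only lines mentioning `NVRM` or `nvidia`, which is a guess at what is relevant; it can drop useful context, so the full compressed output is the safer thing to share.

```python
import gzip
import subprocess


def filter_driver_lines(log_text, keywords=("NVRM", "nvidia")):
    """Keep only log lines that mention one of the keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in log_text.splitlines()
            if any(k in line.lower() for k in lowered)]


def compress_log(text):
    """Return the gzip-compressed bytes of a log given as text."""
    return gzip.compress(text.encode("utf-8"))


def save_dmesg(path):
    """Capture the kernel log with dmesg and write it gzip-compressed to path.

    Returns the size of the compressed file in bytes.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    data = compress_log(result.stdout)
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Running `save_dmesg("dmesg-after-reboot.log.gz")` right after a reboot writes the compressed log and returns its size, so you can see whether it fits under the upload limit before trying to attach it.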