After launching the 8-GPU TDX VM, NVSwitch raises an error

zjeff · January 6, 2026, 6:36pm

After the VM boots, nvidia-fabricmanager raises the following error:

[Jan 06 2026 18:33:42] [INFO] [tid 1792] Fabric Manager version 580.95.05 is running with the following configuration options
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Logging level = 4
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Logging file name/path = /var/log/fabricmanager.log
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Append to log file = 1
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Max Log file size = 1024 (MBs)
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Use Syslog file = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Fabric Manager communication ports = 16000
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Fabric Mode = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Fabric Mode Restart = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] FM Library communication bind interface = 127.0.0.1
[Jan 06 2026 18:33:42] [INFO] [tid 1792] FM Library communication unix domain socket = 
[Jan 06 2026 18:33:42] [INFO] [tid 1792] FM Library communication port number = 6666
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Continue to run when facing failures = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Option when facing GPU to NVSwitch NVLink failure = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Option when facing NVSwitch to NVSwitch NVLink failure = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Option when facing NVSwitch failure = 0
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Abort CUDA jobs when FM exits = 1
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Fabric Manager - Subnet Manager IPC socket = unix:/var/run/nvidia-fabricmanager/fm_sm_ipc.socket
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Fabric Manager - Subnet Manager management port GUID = 
[Jan 06 2026 18:33:42] [INFO] [tid 1792] Disabling RPC mode for single node configuration.
[Jan 06 2026 18:33:43] [ERROR] [tid 1792] request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
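
To pick out only the failing lines from a long Fabric Manager log, a small filter can help. This is a minimal sketch that assumes every line follows the `[timestamp] [LEVEL] [tid N] message` layout seen in the log above; `error_lines` and `LINE_RE` are hypothetical names, not part of any NVIDIA tool.

```python
import re

# Assumed layout, taken from the log excerpt above:
# [Mon DD YYYY HH:MM:SS] [LEVEL] [tid N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[tid (?P<tid>\d+)\] ?(?P<msg>.*)$"
)

def error_lines(lines):
    """Return (timestamp, message) pairs for every ERROR-level line."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            found.append((match.group("ts"), match.group("msg")))
    return found

# Two lines copied from the log above.
sample = [
    "[Jan 06 2026 18:33:42] [INFO] [tid 1792] Disabling RPC mode for single node configuration.",
    "[Jan 06 2026 18:33:43] [ERROR] [tid 1792] request to query NVSwitch device information "
    "from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]",
]

for ts, msg in error_lines(sample):
    print(ts, msg)
```

On a real system the input would be the lines of /var/log/fabricmanager.log, the path reported in the configuration dump above.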

dmesg shows the following errors:

[   20.049438] nvidia-nvswitch: Probing device 0000:09:00.0, Vendor Id = 0x10de, Device Id = 0x22a3, Class = 0x68000 
[   20.087400] nvidia-nvswitch0: Failed to initialize device : -8
[   20.090064] nvidia-nvswitch0: Failed to initialize device : -19
[   20.092193] nvidia-nvswitch: Probing device 0000:0a:00.0, Vendor Id = 0x10de, Device Id = 0x22a3, Class = 0x68000 
[   20.127424] nvidia-nvswitch0: Failed to initialize device : -8
[   20.130935] nvidia-nvswitch0: Failed to initialize device : -19
[   20.137067] nvidia-nvswitch: Probing device 0000:0b:00.0, Vendor Id = 0x10de, Device Id = 0x22a3, Class = 0x68000 
[   20.171460] nvidia-nvswitch0: Failed to initialize device : -8
[   20.175686] nvidia-nvswitch0: Failed to initialize device : -19
[   20.182631] nvidia-nvswitch: Probing device 0000:0c:00.0, Vendor Id = 0x10de, Device Id = 0x22a3, Class = 0x68000 
[   20.215454] nvidia-nvswitch0: Failed to initialize device : -8
[   20.219718] nvidia-nvswitch0: Failed to initialize device : -19
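
The failure lines above all name the instance `nvidia-nvswitch0`, so they do not say which PCI device failed. A rough way to summarize the output is to attach each failure code to the most recently probed device. This is a sketch under that assumption (it only holds if the driver logs probe and failure lines in order); `failures_by_device` is a hypothetical helper, and the return codes are reported as-is without interpreting them.

```python
import re

# Patterns derived from the dmesg excerpt above.
PROBE_RE = re.compile(
    r"nvidia-nvswitch: Probing device "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)
FAIL_RE = re.compile(
    r"nvidia-nvswitch\d+: Failed to initialize device : (?P<code>-?\d+)"
)

def failures_by_device(lines):
    """Map each probed PCI address to the failure codes logged after it.

    Assumption: a failure line belongs to the device probed most recently.
    """
    result = {}
    current = None
    for line in lines:
        probe = PROBE_RE.search(line)
        if probe:
            current = probe.group("bdf")
            result[current] = []
            continue
        fail = FAIL_RE.search(line)
        if fail and current is not None:
            result[current].append(int(fail.group("code")))
    return result

# Lines copied from the dmesg output above (first two switches only).
sample = [
    "[   20.049438] nvidia-nvswitch: Probing device 0000:09:00.0, Vendor Id = 0x10de, Device Id = 0x22a3, Class = 0x68000",
    "[   20.087400] nvidia-nvswitch0: Failed to initialize device : -8",
    "[   20.090064] nvidia-nvswitch0: Failed to initialize device : -19",
    "[   20.092193] nvidia-nvswitch: Probing device 0000:0a:00.0, Vendor Id = 0x10de, Device Id = 0x22a3, Class = 0x68000",
    "[   20.127424] nvidia-nvswitch0: Failed to initialize device : -8",
    "[   20.130935] nvidia-nvswitch0: Failed to initialize device : -19",
]

for bdf, codes in failures_by_device(sample).items():
    print(bdf, codes)
```

Run against the full output, this makes it easy to confirm that all four switches fail with the same pair of codes.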

I have already:

configured the 8 GPUs + 4 NVSwitches with ppcie=on and cc=off
tried a hard reboot several times
switched between several servers from different vendors

but the issue is still not fixed.

sbellock · January 7, 2026, 12:47am

@zjeff is this Hopper PPCIe or Blackwell MPT? Fabric Manager is used differently in the host and guest depending on the GPU generation.
