I am new to this, and I am getting a CUDA initialization error while setting up my first 8x A100 GPU HGX server (running RHEL 7.9). DCGM also can't find any NvSwitches. Could you please advise how I can troubleshoot and fix the problem? Thank you so much!
./deviceQuery
./deviceQuery Starting…
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 3
→ initialization error
Result = FAIL
nvidia-smi
Sat Apr 22 10:18:26 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:07:00.0 Off | 0 |
| N/A 31C P0 61W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:0A:00.0 Off | 0 |
| N/A 29C P0 61W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:44:00.0 Off | 0 |
| N/A 29C P0 59W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:4A:00.0 Off | 0 |
| N/A 32C P0 60W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:84:00.0 Off | 0 |
| N/A 31C P0 51W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-80GB On | 00000000:8A:00.0 Off | 0 |
| N/A 29C P0 61W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-80GB On | 00000000:C0:00.0 Off | 0 |
| N/A 29C P0 61W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-80GB On | 00000000:C3:00.0 Off | 0 |
| N/A 32C P0 62W / 400W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
dcgmi diag -r 3
Successfully ran diagnostic for group.
+---------------------------+------------------------------------------------+
| Diagnostic | Result |
+===========================+================================================+
|----- Metadata ----------+------------------------------------------------|
| DCGM Version | 3.1.7 |
| Driver Version Detected | 530.30.02 |
| GPU Device IDs Detected | 20b2,20b2,20b2,20b2,20b2,20b2,20b2,20b2 |
|----- Deployment --------+------------------------------------------------|
| Denylist | Pass |
| NVML Library | Pass |
| CUDA Main Library | Pass |
| Permissions and OS Blocks | Pass |
| Persistence Mode | Pass |
| Environment Variables | Pass |
| Page Retirement/Row Remap | Pass |
| Graphics Processes | Pass |
| Inforom | Pass |
+----- Integration -------+------------------------------------------------+
| PCIe | Fail - All |
| Warning | GPU 0 Error using CUDA API cudaDeviceGetByPCI |
| | BusId ‘initialization error’ for GPU 0, bus I |
| | D = 00000000:07:00.0 |
+----- Hardware ----------+------------------------------------------------+
| GPU Memory | Fail - All |
| Warning | GPU 0 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 0 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 0 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 0 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 0 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 0 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 0 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 1 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 1 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 1 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 1 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 1 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 1 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 1 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 2 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 2 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 2 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 2 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 2 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 2 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 2 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 3 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 3 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 3 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 3 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 3 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 3 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 3 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 4 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 4 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 4 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 4 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 4 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 4 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 4 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 5 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 5 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 5 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 5 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 5 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 5 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 5 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 6 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 6 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 6 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 6 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 6 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 6 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 6 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Warning | GPU 7 Error using CUDA API cuInit Unable to i |
| | nitialize CUDA library: 'initialization error |
| | ‘.; verify that the fabric-manager has been s |
| | tarted if applicable, GPU 7 Error using CUDA |
| | API cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fab |
| | ric-manager has been started if applicable, G |
| | PU 7 Error using CUDA API cuInit Unable to in |
| | itialize CUDA library: ‘initialization error’ |
| | .; verify that the fabric-manager has been st |
| | arted if applicable, GPU 7 Error using CUDA A |
| | PI cuInit Unable to initialize CUDA library: |
| | ‘initialization error’.; verify that the fabr |
| | ic-manager has been started if applicable, GP |
| | U 7 Error using CUDA API cuInit Unable to ini |
| | tialize CUDA library: ‘initialization error’. |
| | ; verify that the fabric-manager has been sta |
| | rted if applicable, GPU 7 Error using CUDA AP |
| | I cuInit Unable to initialize CUDA library: ’ |
| | initialization error’.; verify that the fabri |
| | c-manager has been started if applicable, GPU |
| | 7 Error using CUDA API cuInit Unable to init |
| | ialize CUDA library: 'initializat |
| Diagnostic | Fail - All |
| Warning | GPU 0 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 0 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 0 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 0 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 0 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 0 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 0 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 0 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 0 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’, GPU 0 Error using CUDA API cud |
| | aDeviceGetByPCIBusId ‘initialization error’ f |
| | or GPU 0, bus ID = 00000000:07:00.0 |
| Warning | GPU 1 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 1 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 1 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 1 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 1 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 1 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 1 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 1 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 1 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’ |
| Warning | GPU 2 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 2 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 2 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 2 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 2 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 2 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 2 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 2 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 2 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’ |
| Warning | GPU 3 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 3 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 3 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 3 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 3 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 3 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 3 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 3 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 3 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’ |
| Warning | GPU 4 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 4 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 4 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 4 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 4 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 4 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 4 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 4 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 4 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’, GPU 4 Clocks are being throttl |
| | ed for GPU 4 because of clock throttling star |
| | ting 8.2 seconds into the test. clocks_thrott |
| | le_reason_hw_slowdown: either the temperature |
| | is too high or there is a power supply probl |
| | em (the power brake assertion has been trippe |
| | d). |
| Warning | GPU 5 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 5 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 5 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 5 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 5 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 5 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 5 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 5 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 5 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’ |
| Warning | GPU 6 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 6 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 6 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 6 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 6 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 6 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 6 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 6 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 6 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’ |
| Warning | GPU 7 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 0: ‘initialization error’, GPU 7 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 1: ‘initialization error’, GPU 7 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 2: ‘initi |
| | alization error’, GPU 7 API call cudaDeviceGe |
| | tByPCIBusId failed for GPU 3: ‘initialization |
| | error’, GPU 7 API call cudaDeviceGetByPCIBus |
| | Id failed for GPU 4: ‘initialization error’, |
| | GPU 7 API call cudaDeviceGetByPCIBusId failed |
| | for GPU 5: ‘initialization error’, GPU 7 API |
| | call cudaDeviceGetByPCIBusId failed for GPU |
| | 6: ‘initialization error’, GPU 7 API call cud |
| | aDeviceGetByPCIBusId failed for GPU 7: ‘initi |
| | alization error’, GPU 7 There was an internal |
| | error during the test: ‘Failed to initialize |
| | the plugin.’ |
+----- Stress ------------+------------------------------------------------+
| Memory Bandwidth | Fail - All |
| Warning | GPU 0 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| Warning | GPU 1 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| Warning | GPU 2 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| Warning | GPU 3 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| Warning | GPU 4 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’, GPU 4 Cl |
| | ocks are being throttled for GPU 4 because of |
| | clock throttling starting 8.2 seconds into t |
| | he test. clocks_throttle_reason_hw_slowdown: |
| | either the temperature is too high or there i |
| | s a power supply problem (the power brake ass |
| | ertion has been tripped). |
| Warning | GPU 5 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| Warning | GPU 6 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| Warning | GPU 7 API call cuInit failed for GPU 0: ‘init |
| | ialization error; verify that the fabric-mana |
| | ger has been started if applicable’ |
| EUD Test | Skip - All |
+---------------------------+------------------------------------------------+
dcgmi discovery -l
8 GPUs found.
+--------+----------------------------------------------------------------------+
| GPU ID | Device Information |
+--------+----------------------------------------------------------------------+
| 0 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:07:00.0 |
| | Device UUID: GPU-9c245d1a-2c6f-a7d6-b91e-e18f6ba6476e |
+--------+----------------------------------------------------------------------+
| 1 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:0A:00.0 |
| | Device UUID: GPU-e33addb3-e24d-e616-cbd4-309f29023f5e |
+--------+----------------------------------------------------------------------+
| 2 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:44:00.0 |
| | Device UUID: GPU-0eb39c07-6f34-99f2-d9b8-a45ff0d18205 |
+--------+----------------------------------------------------------------------+
| 3 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:4A:00.0 |
| | Device UUID: GPU-96afb7d3-7126-4335-2142-dc31b3c6c300 |
+--------+----------------------------------------------------------------------+
| 4 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:84:00.0 |
| | Device UUID: GPU-61f669d9-b2ca-6bb4-b89e-b705e7f697a9 |
+--------+----------------------------------------------------------------------+
| 5 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:8A:00.0 |
| | Device UUID: GPU-9887433e-1b65-69bc-7cfa-ffa18a6a614b |
+--------+----------------------------------------------------------------------+
| 6 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:C0:00.0 |
| | Device UUID: GPU-ff993969-54b7-7ebd-aaf3-648657faab95 |
+--------+----------------------------------------------------------------------+
| 7 | Name: NVIDIA A100-SXM4-80GB |
| | PCI Bus ID: 00000000:C3:00.0 |
| | Device UUID: GPU-c6f5c562-fd22-3ccd-333e-4d5f1e4d8828 |
+--------+----------------------------------------------------------------------+
0 NvSwitches found.
+-----------+
| Switch ID |
+-----------+
+-----------+
dcgmi nvlink -s
+----------------------+
| NvLink Link Status |
+----------------------+
GPUs:
gpuId 0:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 1:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 2:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 3:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 4:
D D D D D D D D D D D D _ _ _ _ _ _
gpuId 5:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 6:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 7:
U U U U U U U U U U U U _ _ _ _ _ _
NvSwitches:
No NvSwitches found.
Key: Up=U, Down=D, Disabled=X, Not Supported=_
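To make the link status easier to scan, the per-GPU rows above can be summarized with a small script. This is only a minimal sketch, assuming the text layout shown in the `dcgmi nvlink -s` output above; `gpus_with_down_links` and `SAMPLE` are hypothetical names for illustration and are not part of DCGM.

```python
import re

# Trimmed sample copied from the `dcgmi nvlink -s` output above.
SAMPLE = """\
GPUs:
gpuId 0:
U U U U U U U U U U U U _ _ _ _ _ _
gpuId 4:
D D D D D D D D D D D D _ _ _ _ _ _
NvSwitches:
No NvSwitches found.
"""

STATES = {"U", "D", "X", "_"}  # Up, Down, Disabled, Not Supported


def gpus_with_down_links(report):
    """Map each GPU id to the number of links reported as Down ('D')."""
    down = {}
    current = None  # GPU id whose status row is expected next
    for raw in report.splitlines():
        line = raw.strip()
        match = re.fullmatch(r"gpuId (\d+):", line)
        if match:
            current = int(match.group(1))
            continue
        if line.startswith("NvSwitches"):
            current = None  # the switch section follows; stop reading GPU rows
            continue
        tokens = line.split()
        if current is not None and tokens and set(tokens) <= STATES:
            count = tokens.count("D")
            if count:
                down[current] = count
            current = None
    return down


print(gpus_with_down_links(SAMPLE))  # prints {4: 12}
```

On the output above, only GPU 4 is flagged, with all 12 supported links down. GPU 4 is also the only GPU for which the diagnostic reports clock throttling (clocks_throttle_reason_hw_slowdown).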