Hey everyone,
I have an A100-SXM4-40GB connected to my computer through an SXM4-to-PCIe adapter, alongside six other GPUs. It shows up in nvidia-smi, but it is not accessible from CUDA. nvidia-smi also reports ERR! instead of a reading for its power usage:
nvidia-smi
Wed Feb 5 23:53:54 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P8    20W / 380W |      1MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 31%   25C    P8    16W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:42:00.0 Off |                  N/A |
|  0%   43C    P8    20W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  On   | 00000000:43:00.0 Off |                    0 |
| N/A   23C    P0   ERR! / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:44:00.0 Off |                  N/A |
| 34%   28C    P8    15W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  On   | 00000000:45:00.0 Off |                  N/A |
|  0%   33C    P8    21W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  On   | 00000000:46:00.0 Off |                  N/A |
| 30%   26C    P8    26W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
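One way to narrow this down is to check which of the cards listed above the CUDA driver actually enumerates, and to flag any GPU whose power reading is not a number. Below is a minimal sketch, assuming Python 3 on Linux with the driver's `libcuda.so.1` on the loader path. It calls the CUDA driver API functions `cuInit`, `cuDeviceGetCount`, `cuDeviceGet` and `cuDeviceGetPCIBusId` through `ctypes`, and queries the `nvidia-smi --query-gpu` fields `index`, `pci.bus_id`, `name` and `power.draw`; the helper names (`parse_smi_csv`, `cuda_bus_ids`, `missing_from_cuda`, `bad_power`) are hypothetical. GPUs are matched by PCI bus ID rather than by index, since CUDA's device numbering need not match nvidia-smi's, and the matching assumes every GPU sits in PCI domain 0.

```python
"""Cross-check GPUs listed by nvidia-smi against those the CUDA driver enumerates."""
import ctypes
import shutil
import subprocess

# nvidia-smi query: one CSV line per GPU, units stripped.
SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,pci.bus_id,name,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_smi_csv(text):
    """Parse 'index, bus_id, name, power.draw' lines into dicts.

    power_w is None when the field is not a number (e.g. an error marker).
    """
    gpus = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        try:
            power_w = float(fields[-1])
        except ValueError:
            power_w = None
        gpus.append({
            "index": fields[0],
            "bus_id": fields[1],
            "name": ", ".join(fields[2:-1]),  # tolerate commas inside the name
            "power_w": power_w,
        })
    return gpus


def normalize_bus_id(bus_id):
    """Keep only 'bus:device.function' (assumes every GPU is in PCI domain 0)."""
    return ":".join(bus_id.strip().lower().split(":")[-2:])


def cuda_bus_ids(libname="libcuda.so.1"):
    """Return the PCI bus ID of every device the CUDA driver API enumerates."""
    cuda = ctypes.CDLL(libname)

    def check(rc, what):
        if rc != 0:  # 0 == CUDA_SUCCESS
            raise RuntimeError(f"{what} failed with CUresult {rc}")

    check(cuda.cuInit(0), "cuInit")
    count = ctypes.c_int()
    check(cuda.cuDeviceGetCount(ctypes.byref(count)), "cuDeviceGetCount")
    ids = []
    for ordinal in range(count.value):
        dev = ctypes.c_int()  # CUdevice is an int handle
        check(cuda.cuDeviceGet(ctypes.byref(dev), ordinal), "cuDeviceGet")
        buf = ctypes.create_string_buffer(64)
        check(cuda.cuDeviceGetPCIBusId(buf, len(buf), dev), "cuDeviceGetPCIBusId")
        ids.append(buf.value.decode())
    return ids


def missing_from_cuda(smi_gpus, cuda_ids):
    """GPUs that nvidia-smi lists but the CUDA driver does not enumerate."""
    seen = {normalize_bus_id(b) for b in cuda_ids}
    return [g for g in smi_gpus if normalize_bus_id(g["bus_id"]) not in seen]


def bad_power(smi_gpus):
    """GPUs whose power.draw reading could not be parsed as a number."""
    return [g for g in smi_gpus if g["power_w"] is None]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on the machine with the GPUs")
    else:
        out = subprocess.run(SMI_QUERY, capture_output=True, text=True, check=True).stdout
        gpus = parse_smi_csv(out)
        for g in missing_from_cuda(gpus, cuda_bus_ids()):
            print(f"GPU {g['index']} {g['bus_id']} ({g['name']}): not enumerated by CUDA")
        for g in bad_power(gpus):
            print(f"GPU {g['index']} {g['bus_id']} ({g['name']}): power.draw not numeric")
```

`CUDA_VISIBLE_DEVICES` also limits which devices CUDA enumerates, so it should be unset when running a check like this.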
Attached is the output of nvidia-bug-report.sh:
nvidia-bug-report.log.gz (4.5 MB)