The GPU IDs that I can pass to nsys profile --gpu-metrics-device=... are in a different order from those given by nvidia-smi.
Why is that so? It is quite easy to make a mistake here. For instance, in the output below I may want to profile a process running with CUDA_VISIBLE_DEVICES=0, i.e. on the GPU with bus ID 01:00.0, but to sample the same GPU with nsys I need to pass --gpu-metrics-device=2.
$ nsys profile --gpu-metrics-device=help
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Possible --gpu-metrics-device values are:
0: NVIDIA GeForce RTX 3090 PCI[0000:61:00.0]
1: NVIDIA GeForce RTX 3090 PCI[0000:41:00.0]
2: NVIDIA GeForce RTX 3090 PCI[0000:01:00.0]
3: NVIDIA GeForce RTX 3090 PCI[0000:25:00.0]
4: NVIDIA GeForce RTX 3090 PCI[0000:e1:00.0]
5: NVIDIA GeForce RTX 3090 PCI[0000:c1:00.0]
6: NVIDIA GeForce RTX 3090 PCI[0000:81:00.0]
7: NVIDIA GeForce RTX 3090 PCI[0000:a1:00.0]
all: Select all supported GPUs
none: Disable GPU Metrics [Default]
$ nvidia-smi
Thu Jun 23 15:07:28 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 30%   44C    P2   110W / 350W |  23008MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:25:00.0 Off |                  N/A |
| 30%   30C    P8    25W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 30%   28C    P8    28W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:61:00.0 Off |                  N/A |
| 30%   27C    P8    28W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:81:00.0 Off |                  N/A |
| 30%   26C    P8    28W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  On   | 00000000:A1:00.0 Off |                  N/A |
| 30%   27C    P8    22W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  On   | 00000000:C1:00.0 Off |                  N/A |
| 30%   26C    P8    23W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  On   | 00000000:E1:00.0 Off |                  N/A |
| 30%   26C    P8    29W / 350W |      3MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
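
Until the ordering is explained, one way I could avoid picking the wrong device is to match the two listings on the PCI bus ID instead of on the index. Below is a minimal Python sketch of that idea, not an official tool. The nsys lines are copied from the output above. The nvidia-smi lines are what I would expect from nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader, filled in here from the table above rather than captured from a real run. The helper names (normalize_bus_id, parse_nsys_help, parse_smi_csv, smi_to_nsys) are made up for illustration.

```python
import re

# nsys listing, copied from `nsys profile --gpu-metrics-device=help` above.
NSYS_HELP = """\
0: NVIDIA GeForce RTX 3090 PCI[0000:61:00.0]
1: NVIDIA GeForce RTX 3090 PCI[0000:41:00.0]
2: NVIDIA GeForce RTX 3090 PCI[0000:01:00.0]
3: NVIDIA GeForce RTX 3090 PCI[0000:25:00.0]
4: NVIDIA GeForce RTX 3090 PCI[0000:e1:00.0]
5: NVIDIA GeForce RTX 3090 PCI[0000:c1:00.0]
6: NVIDIA GeForce RTX 3090 PCI[0000:81:00.0]
7: NVIDIA GeForce RTX 3090 PCI[0000:a1:00.0]
"""

# Assumed output of
#   nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader
# with the values taken from the nvidia-smi table above.
SMI_CSV = """\
0, 00000000:01:00.0
1, 00000000:25:00.0
2, 00000000:41:00.0
3, 00000000:61:00.0
4, 00000000:81:00.0
5, 00000000:A1:00.0
6, 00000000:C1:00.0
7, 00000000:E1:00.0
"""


def normalize_bus_id(bus_id: str) -> str:
    """Lower-case the ID and shorten the domain to 4 hex digits.

    nsys prints 0000:a1:00.0 while nvidia-smi prints 00000000:A1:00.0.
    """
    domain, bus, dev_fn = bus_id.strip().lower().split(":")
    return f"{int(domain, 16):04x}:{bus}:{dev_fn}"


def parse_nsys_help(text: str) -> dict:
    """Map normalized bus ID -> nsys --gpu-metrics-device index."""
    pattern = re.compile(r"^\s*(\d+):.*PCI\[([0-9A-Fa-f:.]+)\]", re.MULTILINE)
    return {normalize_bus_id(bus): int(idx) for idx, bus in pattern.findall(text)}


def parse_smi_csv(text: str) -> dict:
    """Map nvidia-smi index -> normalized bus ID."""
    result = {}
    for line in text.strip().splitlines():
        idx, bus = line.split(",")
        result[int(idx)] = normalize_bus_id(bus)
    return result


def smi_to_nsys(nsys_text: str, smi_text: str) -> dict:
    """Map nvidia-smi index -> nsys index by joining on the bus ID."""
    by_bus = parse_nsys_help(nsys_text)
    return {idx: by_bus[bus] for idx, bus in parse_smi_csv(smi_text).items()}


if __name__ == "__main__":
    for smi_idx, nsys_idx in sorted(smi_to_nsys(NSYS_HELP, SMI_CSV).items()):
        print(f"nvidia-smi GPU {smi_idx} -> --gpu-metrics-device={nsys_idx}")
```

For the machine above this maps nvidia-smi GPU 0 to --gpu-metrics-device=2, which matches what I found by hand. One caveat: as far as I know, CUDA_VISIBLE_DEVICES indices are only guaranteed to follow the nvidia-smi (PCI bus) order when CUDA_DEVICE_ORDER=PCI_BUS_ID is set, so it is worth setting that as well when relying on such a mapping.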