How can I get the GPU processor usage using the CUDA API? I want to get the processor usage of each GPU in a cluster and assign the job to the GPU with the lowest usage.
Please help.
The nvidia-smi tool (NVIDIA System Management Interface) might help. There's a command-line interface as well as an API called NVML (https://developer.nvidia.com/nvidia-management-library-nvml).
With Tesla hardware it should give you a utilization percentage. I only have access to consumer-level GTX cards, and with those it just gives you temperature, fan speed, and memory utilization. However, even that would be useful because, unless you have an amazing cooling system, the fan speed and temperature are good approximations of processor utilization.
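To make the "pick the least-used GPU" idea concrete, here is a minimal sketch in Python. It assumes the installed nvidia-smi supports the `--query-gpu` option with the `index` and `utilization.gpu` fields and CSV output (a newer driver than the one shown in this thread may be needed). The function names are made up for illustration. Cards that do not report utilization are treated as "unknown" rather than as idle.

```python
import subprocess

# Assumed command line; requires an nvidia-smi that supports --query-gpu.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse 'index, utilization' lines into (index, percent-or-None) pairs."""
    readings = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index_str, util_str = [field.strip() for field in line.split(",")]
        try:
            util = int(util_str)
        except ValueError:
            util = None  # e.g. the card reports the value as not supported
        readings.append((int(index_str), util))
    return readings


def least_utilized(readings):
    """Return the index of the GPU with the lowest utilization.

    Returns None when no GPU reports a utilization value.
    """
    known = [(util, index) for index, util in readings if util is not None]
    return min(known)[1] if known else None


if __name__ == "__main__":
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    print(least_utilized(parse_utilization(output)))
```

On a cluster you would run something like this on each node and collect the results yourself; nvidia-smi only sees the GPUs in the machine it runs on.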
For example, see the two outputs of "nvidia-smi", the command-line tool. The first is from while I had a job running on both GTX 680s in this machine. Notice the fans at 70-72% and the temps at 87-88C. The second is from one minute after killing the job, when the GPUs were idling. The fans quickly dropped to 36-37% and the temperatures to 59-62C.
Busy:
+------------------------------------------------------+
| NVIDIA-SMI 4.310.14   Driver Version: 310.14         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680          | 0000:01:00.0     N/A |                  N/A |
| 72%   88C  N/A     N/A /  N/A |  28%  570MB / 2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 680          | 0000:02:00.0     N/A |                  N/A |
| 70%   87C  N/A     N/A /  N/A |   3%   52MB / 2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
+-----------------------------------------------------------------------------+
Idle:
+------------------------------------------------------+
| NVIDIA-SMI 4.310.14   Driver Version: 310.14         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680          | 0000:01:00.0     N/A |                  N/A |
| 37%   62C  N/A     N/A /  N/A |  26%  526MB / 2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 680          | 0000:02:00.0     N/A |                  N/A |
| 36%   59C  N/A     N/A /  N/A |   0%    7MB / 2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
+-----------------------------------------------------------------------------+
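If your cards report N/A for GPU-Util like mine do, the temperature and fan readings can stand in as a rough ranking. The sketch below is only a heuristic built on that assumption: it picks the GPU with the lowest temperature and breaks ties by fan speed. The `Reading` type and `coolest_gpu` function are hypothetical names, and the numbers are copied from the two outputs above. Keep in mind that temperature lags behind the actual load, so a GPU that just finished a job will still look busy for a while.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    index: int        # GPU index as shown by nvidia-smi
    fan_percent: int  # fan speed in percent
    temp_c: int       # temperature in degrees Celsius


def coolest_gpu(readings):
    """Heuristic: lowest temperature wins, fan speed breaks ties."""
    best = min(readings, key=lambda r: (r.temp_c, r.fan_percent))
    return best.index


# Values taken from the "Busy" and "Idle" outputs in this post.
busy = [Reading(0, 72, 88), Reading(1, 70, 87)]
idle = [Reading(0, 37, 62), Reading(1, 36, 59)]

print(coolest_gpu(busy))  # prints 1 (87C vs 88C)
print(coolest_gpu(idle))  # prints 1 (59C vs 62C)
```

With both cards under similar load the difference is only a degree or two, so in practice this is more useful for telling a busy GPU from an idle one than for ranking two busy ones.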