I don’t think DCGM is reporting stats correctly: Max GPU Memory Used is always 0. See below.
dcgmi stats --jstart demojob -g 486
dcgmi stats --jstop demojob -g 486
dcgmi stats --job demojob -g 486
Successfully retrieved statistics for job: demojob.
+------------------------------------------------------------------------------+
| Summary                                                                      |
+====================================+=========================================+
|----- Execution Stats --------------+-----------------------------------------|
| Start Time                         | Mon Jan 20 17:04:19 2020                |
| End Time                           | Mon Jan 20 17:04:37 2020                |
| Total Execution Time (sec)         | 17.6                                    |
| No. of Processes                   | 0                                       |
+----- Performance Stats ------------+-----------------------------------------+
| Energy Consumed (Joules)           | 379                                     |
| Power Usage (Watts)                | Avg: 23.7039, Max: N/A, Min: N/A        |
| Max GPU Memory Used (bytes)        | 0                                       |
| Clocks and PCIe Performance        | Available per GPU in verbose mode       |
+----- Event Stats ------------------+-----------------------------------------+
| Single Bit ECC Errors              | Not Specified                           |
| Double Bit ECC Errors              | Not Specified                           |
| PCIe Replay Warnings               | Not Specified                           |
| Critical XID Errors                | 0                                       |
+----- Slowdown Stats ---------------+-----------------------------------------+
| Due to - Power (%)                 | 0                                       |
|        - Thermal (%)               | 0                                       |
|        - Reliability (%)           | Not Supported                           |
|        - Board Limit (%)           | Not Supported                           |
|        - Low Utilization (%)       | Not Supported                           |
|        - Sync Boost (%)            | 0                                       |
+----- Overall Health ---------------+-----------------------------------------+
| Overall Health                     | Healthy                                 |
+------------------------------------+-----------------------------------------+
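To check these fields in a script rather than by eye, here is a minimal sketch, assuming the summary keeps the pipe-separated `| label | value |` rows shown above. `parse_dcgmi_summary` is a hypothetical helper, not part of DCGM. It turns each two-cell row into a dictionary entry so that fields like `Max GPU Memory Used (bytes)` and `No. of Processes` (0 in this run) can be read side by side.

```python
from typing import Dict


def parse_dcgmi_summary(text: str) -> Dict[str, str]:
    """Map each '| label | value |' row of a dcgmi stats summary to its value.

    Border lines, section headers and one-column rows such as '| Summary |'
    are skipped because they do not split into exactly two non-empty cells.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] and cells[1]:
            fields[cells[0]] = cells[1]
    return fields
```

For example, the output of `dcgmi stats --job demojob -g 486` could be captured with `subprocess.run(..., capture_output=True, text=True).stdout` and passed in as `text`.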
The card is a Titan V:
nvidia-smi
Mon Jan 20 17:06:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:65:00.0 Off |                  N/A |
| 28%   26C    P8    23W / 250W |      0MiB / 12064MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
I’m not sure this is the best way to test, so if anyone has another method to verify that DCGM is reporting correctly, please let me know. Thanks!
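One way to test this more directly, sketched under assumptions: start the job, run something that actually allocates GPU memory, and poll `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` while it runs as an independent cross-check, then stop the job and fetch the report. The workload command is a placeholder (for example a small CUDA program that holds a buffer for a few seconds), the function names are hypothetical, and the dcgmi flags are only the ones used above. Depending on the setup, stats watches may also need to be enabled for the group before `--jstart` (I believe `dcgmi stats -g 486 -e` does this; check `dcgmi stats --help`).

```python
import subprocess
import time
from typing import Sequence, Tuple


def peak_memory_mib(query_cmd: Sequence[str]) -> int:
    """Run a memory query once and return the largest per-GPU value printed.

    With nvidia-smi and 'nounits', each output line is one GPU's usage in MiB.
    """
    out = subprocess.run(list(query_cmd), capture_output=True,
                         text=True, check=True).stdout
    values = [int(line.strip()) for line in out.splitlines() if line.strip()]
    return max(values, default=0)


def run_dcgm_job(
    workload: Sequence[str],
    job: str = "demojob",
    group: int = 486,
    dcgmi: str = "dcgmi",
    query_cmd: Sequence[str] = ("nvidia-smi", "--query-gpu=memory.used",
                                "--format=csv,noheader,nounits"),
    poll_seconds: float = 0.5,
) -> Tuple[int, str]:
    """Record a DCGM job around `workload`, polling GPU memory as a cross-check.

    Returns (peak MiB seen by `query_cmd`, text of the dcgmi job report).
    The workload's exit code is not checked here.
    """
    grp = str(group)
    subprocess.run([dcgmi, "stats", "--jstart", job, "-g", grp], check=True)
    peak = 0
    try:
        proc = subprocess.Popen(list(workload))
        # Sample memory usage until the workload process exits.
        while proc.poll() is None:
            peak = max(peak, peak_memory_mib(query_cmd))
            time.sleep(poll_seconds)
    finally:
        # Always close the job window, even if the workload failed to start.
        subprocess.run([dcgmi, "stats", "--jstop", job, "-g", grp], check=True)
    report = subprocess.run([dcgmi, "stats", "--job", job, "-g", grp],
                            capture_output=True, text=True, check=True).stdout
    return peak, report
```

Something like `peak, report = run_dcgm_job(["./alloc_test"])`, where `./alloc_test` is a hypothetical program, would return the peak MiB seen by nvidia-smi alongside the dcgmi report, so the two can be compared. A workload that exits before the first poll leaves the peak at 0, so it should hold its allocation for at least a few poll intervals.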