Hi, I have a simple question about util.memory (reported by nvmlDeviceGetUtilizationRates) and nvidia-smi. NVML reports 40%, whereas nvidia-smi shows 6715MiB / 7982MiB. Why are these two numbers different? Thanks.
Is anyone able to answer this?
UTILIZATION is not ALLOCATION
Try “nvidia-smi dmon” or “nvidia-smi -q”.
nvmlDeviceGetUtilizationRates() → nvmlUtilization_t →
memory: Percent of time over the past sample period during which global (device) memory was being read or written.
(https://docs.nvidia.com/deploy/nvml-api/structnvmlUtilization__t.html#structnvmlUtilization__t)
nvmlDeviceGetMemoryInfo() → nvmlMemory_t →
total: Total installed FB memory (in bytes).
used: Allocated FB memory (in bytes). Note that the driver/GPU always sets aside a small amount of memory for bookkeeping.
(https://docs.nvidia.com/deploy/nvml-api/structnvmlMemory__t.html#structnvmlMemory__t)
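To see both numbers side by side, here is a minimal sketch in Python. It assumes the pynvml bindings (the nvidia-ml-py package) are installed and that GPU index 0 is the device of interest. The helper names allocation_percent and read_gpu_memory_stats are made up for this example and are not part of NVML.

```python
def allocation_percent(used_bytes, total_bytes):
    """Share of framebuffer memory that is allocated, as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def read_gpu_memory_stats(index=0):
    """Return (busy_percent, allocated_percent) for one GPU via NVML."""
    import pynvml  # assumption: the nvidia-ml-py package is installed

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        # util.memory: percent of time device memory was being read or written
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        # mem.used / mem.total: allocated and installed FB memory, in bytes
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return util.memory, allocation_percent(mem.used, mem.total)
    finally:
        pynvml.nvmlShutdown()
```

With the figures from the question, allocation_percent(6715, 7982) comes out at roughly 84%, which is a different quantity from the 40% utilization figure. On a machine with an NVIDIA driver, read_gpu_memory_stats(0) should return the two values together.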
Thanks for the clarification. So utilization essentially refers to the percentage of time during which memory was being read or written (how busy the memory is), while usage refers to how much memory has been allocated/reserved.
That makes sense for the situation where OOM happens even when utilization is low (40%) but reservation/allocation is high.
Is there any simple way to trade off the two (e.g. utilization vs. allocation)?