How is GPU utilisation calculated (as returned by nvmlDeviceGetUtilizationRates(...))?

I would like to calculate GPU utilisation per process rather than per device, so I need to know how the GPU utilisation field is calculated. The number I mean is the one marked below:

#include <nvml.h>
Data Fields
• unsigned int gpu    <--- THIS FIELD
Percent of time over the past second during which one or more kernels was executing on the GPU.
• unsigned int memory
Percent of time over the past second during which global (device) memory was being read or written.
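
To make the definition of the gpu field concrete, here is a minimal sketch in C of one literal reading of it: the share of a sampling window during which at least one kernel was executing, so overlapping kernels are counted only once. This is an assumption about what the wording means, not NVIDIA's actual implementation (which is what this question asks about). The kernel_interval type and the busy_percent function are hypothetical names invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one kernel execution, timestamps in nanoseconds. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_interval;

/* qsort comparator: order intervals by ascending start time. */
static int cmp_start(const void *a, const void *b)
{
    const kernel_interval *x = (const kernel_interval *)a;
    const kernel_interval *y = (const kernel_interval *)b;
    return (x->start_ns > y->start_ns) - (x->start_ns < y->start_ns);
}

/* Percent of the window [win_start, win_end) during which one or more
   kernels was running. Overlapping kernels are merged, so the result
   never exceeds 100. Sorts the array in place. */
unsigned int busy_percent(kernel_interval *k, size_t n,
                          uint64_t win_start, uint64_t win_end)
{
    if (n == 0 || win_end <= win_start)
        return 0;

    qsort(k, n, sizeof *k, cmp_start);

    uint64_t busy = 0;
    uint64_t covered_to = win_start; /* time before this is already counted */

    for (size_t i = 0; i < n; i++) {
        /* Clip the interval to the window and to the part not yet counted. */
        uint64_t s = k[i].start_ns > covered_to ? k[i].start_ns : covered_to;
        uint64_t e = k[i].end_ns < win_end ? k[i].end_ns : win_end;
        if (e > s) {
            busy += e - s;
            covered_to = e;
        }
    }
    return (unsigned int)(busy * 100 / (win_end - win_start));
}
```

Under this reading, two kernels that overlap in time contribute only the length of their union, which is why the value is a percentage of wall-clock time rather than a sum of kernel durations.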

I have tried implementing the above per process using the CUPTI API: for each process, I add up the nanoseconds that each of its tasks takes within a window of one second (as specified above). However, the result is always far lower than the value NVML reports.
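
The per-process summation described above could look roughly like the following C sketch. It is illustrative only and is not the exact code in question. The task_record layout and the summed_percent function are hypothetical, and it assumes each profiled task has already been reduced to a process ID plus start and end timestamps in nanoseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened record for one GPU task of one process. */
typedef struct {
    uint32_t pid;
    uint64_t start_ns;
    uint64_t end_ns;
} task_record;

/* Sum the durations of all tasks belonging to `pid`, clipped to the
   window [win_start, win_end), and express the sum as a percentage of
   the window length. Overlapping tasks are each counted in full, so
   the result can exceed 100 when tasks run concurrently. */
unsigned int summed_percent(const task_record *t, size_t n, uint32_t pid,
                            uint64_t win_start, uint64_t win_end)
{
    if (win_end <= win_start)
        return 0;

    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].pid != pid)
            continue;
        /* Clip the task to the sampling window. */
        uint64_t s = t[i].start_ns > win_start ? t[i].start_ns : win_start;
        uint64_t e = t[i].end_ns < win_end ? t[i].end_ns : win_end;
        if (e > s)
            total += e - s;
    }
    return (unsigned int)(total * 100 / (win_end - win_start));
}
```

For example, with a window of 0 to 1000 ns and two tasks for the same process running over 100 to 300 ns and 500 to 600 ns, this returns 30.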

So my question is: how exactly does the NVIDIA SDK/driver calculate this metric, and is there a better approach than my current one for doing this? Thanks!