NVML always reports PState 0, incorrect process utilization, and incorrect max clocks

The NVML API on both Linux and Windows always reports PState 0 for GPU 0 and PState 8 for secondary GPUs.

The API values shown in my application are the same as those returned by nvidia-smi.
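For anyone who wants to reproduce the comparison, here is a minimal Python sketch that reads the same fields through nvidia-smi. It assumes nvidia-smi is on the PATH and that the query fields are spelled as `nvidia-smi --help-query-gpu` lists them. The helper names `parse_gpu_rows` and `query_gpus` are my own and purely illustrative, not part of NVML or nvidia-smi.

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["index", "pstate", "utilization.gpu", "clocks.max.memory"]


def parse_gpu_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip lines that do not match the requested fields
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and return the parsed rows."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_rows(out)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("nvidia-smi unavailable:", exc)
```

Running this while the GPUs are idle and again under load should show whether the reported PState ever changes.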

Unless an application keeps an active load on the GPU, the utilization reported for a GPU-accelerated application goes stale and no longer matches what the GPU utilization struct returns.
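This is not a fix, but one way to at least detect the stale values is to compare each sample's timestamp against the current time. The sketch below assumes the per-process samples carry a CPU timestamp in microseconds, as the `timeStamp` field of NVML's `nvmlProcessUtilizationSample_t` does. The `ProcessSample` tuple, the `split_fresh_and_stale` helper, and the 2-second threshold are hypothetical names and values I picked for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the fields of interest in NVML's
# nvmlProcessUtilizationSample_t (pid, timeStamp, smUtil).
ProcessSample = namedtuple("ProcessSample", ["pid", "time_stamp_us", "sm_util"])


def split_fresh_and_stale(samples, now_us, max_age_us=2_000_000):
    """Separate samples newer than max_age_us from older (stale) ones.

    now_us and time_stamp_us are both microseconds on the same clock.
    The 2-second default is an arbitrary assumption, not an NVML value.
    """
    fresh, stale = [], []
    for sample in samples:
        age_us = now_us - sample.time_stamp_us
        if age_us <= max_age_us:
            fresh.append(sample)
        else:
            stale.append(sample)
    return fresh, stale


if __name__ == "__main__":
    now = 10_000_000
    samples = [
        ProcessSample(pid=1234, time_stamp_us=9_500_000, sm_util=40),
        ProcessSample(pid=5678, time_stamp_us=1_000_000, sm_util=75),
    ]
    fresh, stale = split_fresh_and_stale(samples, now)
    print("fresh:", [s.pid for s in fresh])
    print("stale:", [s.pid for s in stale])
```

An application could then show stale entries as idle instead of repeating the last value it saw.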

NVML’s function for getting the max GPU/memory clock also does not factor in GPU overclocks under Linux. In other words, the max memory clock at P0 remains at 5005 MHz even though a 500 MHz overclock was applied via nvctrl.

Any solutions to any of these problems?