How can I find out how many threads are allocated on a device at a given moment or over a period?

Hi, I want to use NVML to check the utilization of my GPUs, and I found that “nvmlDeviceGetUtilizationRates” can do this. However, it only reports the percentage of time over the past sample period during which one or more kernels were executing on the GPU. That tells me how often the GPU is busy, but not how much of its computing resources (e.g. how many threads are allocated on the GPU) are in use at a given moment or over a period.
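For reference, here is a minimal sketch of the polling approach described above. It assumes the `pynvml` module from NVIDIA's nvidia-ml-py package is installed, and the helper names `sample_utilization` and `poll_utilization` are hypothetical. The `gpu` field it reads is exactly the time-based percentage mentioned above, so a kernel that keeps even a single thread running for the whole sample period would be expected to read close to 100%. Nothing in the result is a thread count.

```python
"""Sketch: poll GPU utilization via NVML's Python bindings (assumes nvidia-ml-py / pynvml)."""
import time
from typing import List, Optional, Tuple

try:
    import pynvml  # Python wrapper around the NVML C library (assumed installed)
except ImportError:  # bindings not installed: every query below returns None
    pynvml = None


def sample_utilization(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return (gpu_percent, memory_percent) for one device, or None if NVML is unavailable.

    gpu_percent: share of the last sample period during which at least one kernel ran.
    memory_percent: share of that period during which device memory was read or written.
    Neither value says how many threads were resident on the GPU.
    """
    if pynvml is None:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no NVIDIA driver or NVML library on this machine
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        return int(util.gpu), int(util.memory)
    except pynvml.NVMLError:
        return None  # e.g. invalid index or query not supported
    finally:
        pynvml.nvmlShutdown()


def poll_utilization(samples: int, interval_s: float, device_index: int = 0) -> List[Tuple[int, int]]:
    """Collect up to `samples` readings, `interval_s` seconds apart; unavailable readings are skipped.

    For simplicity this re-initializes NVML for every sample.
    """
    readings: List[Tuple[int, int]] = []
    for i in range(samples):
        reading = sample_utilization(device_index)
        if reading is not None:
            readings.append(reading)
        if i + 1 < samples:
            time.sleep(interval_s)
    return readings
```

Calling `poll_utilization(10, 1.0)` would give roughly ten seconds of samples, but each one is still only a busy-time percentage, which is the limitation described above.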
How can I find this out?
Thanks!