How can I know how many threads were allocated on a certain GPU device during a period?

Hi, I want to use NVML to check the utilization rates of my GPUs, and I found that the function “nvmlDeviceGetUtilizationRates” could do this. But it only reports the percentage of time over the past sample period during which one or more kernels were executing on the GPU. That means I can only know how often the GPU is used, but not how much of the computing resource is used at a certain moment (e.g. how many threads are allocated on the GPU).
How could I know about this?
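For reference, here is a minimal sketch of how nvmlDeviceGetUtilizationRates is typically called (this is my own illustration, not code from the forum; it assumes device index 0 and requires linking against libnvidia-ml with a working driver, so it will only run on a machine with an NVIDIA GPU):

```c
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    /* NVML must be initialized before any query. */
    nvmlReturn_t result = nvmlInit();
    if (result != NVML_SUCCESS) {
        fprintf(stderr, "Failed to initialize NVML: %s\n", nvmlErrorString(result));
        return 1;
    }

    nvmlDevice_t device;
    result = nvmlDeviceGetHandleByIndex(0, &device);  /* device 0; adjust as needed */
    if (result == NVML_SUCCESS) {
        nvmlUtilization_t util;
        result = nvmlDeviceGetUtilizationRates(device, &util);
        if (result == NVML_SUCCESS) {
            /* util.gpu: percent of time a kernel was executing over the sample period.
               util.memory: percent of time device memory was being read or written.
               Neither field reports how many threads are resident on the GPU. */
            printf("GPU: %u%%, memory: %u%%\n", util.gpu, util.memory);
        }
    }

    nvmlShutdown();
    return 0;
}
```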

Hello makailove123,

We have a function called “nvmlDeviceGetComputeMode” that should do what you need. You can also check the device enums regarding the device compute mode. Below is some information about it. I have also added a link to our Developer Zone and attached a current version of the NVML API Reference Guide (dated March 2014).

Nvidia Developer Zone:

nvmlReturn_t nvmlDeviceGetComputeMode

(nvmlDevice_t device, nvmlComputeMode_t *mode)


device: The identifier of the target device
mode: Reference in which to return the current compute mode

Return values:

  • NVML_SUCCESS if mode has been set
  • NVML_ERROR_UNINITIALIZED if the library has not been successfully initialized
  • NVML_ERROR_INVALID_ARGUMENT if device is invalid or mode is NULL
  • NVML_ERROR_NOT_SUPPORTED if the device does not support this feature
  • NVML_ERROR_GPU_IS_LOST if the target GPU has fallen off the bus or is otherwise inaccessible
  • NVML_ERROR_UNKNOWN on any unexpected error

Description: Retrieves the current compute mode for the device. For all CUDA-capable products.

enum nvmlComputeMode_t

Compute mode.
NVML_COMPUTEMODE_EXCLUSIVE_PROCESS was added in CUDA 4.0. Earlier CUDA versions supported a single exclusive mode, which is equivalent to NVML_COMPUTEMODE_EXCLUSIVE_THREAD in CUDA 4.0 and beyond.


  • NVML_COMPUTEMODE_DEFAULT -- default compute mode: multiple contexts per device.
  • NVML_COMPUTEMODE_EXCLUSIVE_THREAD -- compute-exclusive-thread mode: only one context per device, usable from one thread at a time.
  • NVML_COMPUTEMODE_PROHIBITED -- compute-prohibited mode: no contexts per device.
  • NVML_COMPUTEMODE_EXCLUSIVE_PROCESS -- compute-exclusive-process mode: only one context per device, usable from multiple threads at a time.
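To make the usage concrete, here is a hedged sketch (my own illustration, not from the reference guide) of querying the compute mode; it assumes device index 0 and needs a driver install and linking against libnvidia-ml:

```c
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS)
        return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS) {
        nvmlComputeMode_t mode;
        /* Returns which compute mode the device is in -- not a thread count. */
        if (nvmlDeviceGetComputeMode(device, &mode) == NVML_SUCCESS) {
            switch (mode) {
            case NVML_COMPUTEMODE_DEFAULT:
                puts("Default: multiple contexts per device");
                break;
            case NVML_COMPUTEMODE_EXCLUSIVE_THREAD:
                puts("Exclusive thread: one context, one thread at a time");
                break;
            case NVML_COMPUTEMODE_PROHIBITED:
                puts("Prohibited: no contexts allowed");
                break;
            case NVML_COMPUTEMODE_EXCLUSIVE_PROCESS:
                puts("Exclusive process: one context, multiple threads");
                break;
            default:
                puts("Unrecognized compute mode");
            }
        }
    }

    nvmlShutdown();
    return 0;
}
```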

Thanks for your reply!
But I don't clearly understand how I could use nvmlDeviceGetComputeMode to get the count of threads allocated on a device at a certain moment or during a period.
It seems that this function returns a value of an enum type, so it may only tell me which mode the device is working in, not the count of threads on it.
How could I use this function correctly?
Additionally, what’s the meaning of the member of NVML_COMPUTEMODE_COUNT?


The member NVML_COMPUTEMODE_COUNT is the last entry in the enumeration nvmlComputeMode_t; it indicates how many modes are listed. As for the count of threads allocated on a device at a particular point in time, I am not finding a way to do this with nvidia-smi. I will put in a feature request if one does not yet exist. My notes were primarily to provide a way to see whether you are using all of your devices. If it is of use, CUDA-Z can provide feedback on the “Threads per Block, Threads per Multiproc, and Thread Dimensions.”