How to check volatile GPU utilization with CUDA-C code?

I am running my program on a public server with Ubuntu, and I'd like to automatically check the volatile GPU utilization so I can choose among multiple devices. I know I can do this with nvidia-smi, but how can I do it from CUDA-C code? And how can I automatically select the device with the lowest volatile GPU utilization? Thank you!

nvidia-smi is simply a user interface built on top of the NVIDIA Management Library (NVML), which provides the actual core functionality.

You can use this library from your own applications.