I am wondering if there is a way to loop over all available GPU cards on a multi-GPU server and find the first device that is not being used (or not running a particular application). I think this is possible by running nvidia-smi on the command line and parsing its output, but I am wondering if I can do this using the CUDA APIs.
You can get this information from the same source nvidia-smi gets it: the NVIDIA Management Library (NVML): https://developer.nvidia.com/nvidia-management-library-nvml
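For example, here is a rough, untested sketch of how that might look with the NVML C API: it walks the devices and reports the first one with no compute processes currently running on it. Error handling is abbreviated, and you would link against -lnvidia-ml.

```c
// Sketch only: find the first GPU with no compute processes, using NVML.
#include <nvml.h>
#include <stdio.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS) continue;

        // With *infoCount == 0 this call returns NVML_SUCCESS if no compute
        // processes are running on the device, or NVML_ERROR_INSUFFICIENT_SIZE
        // (plus the required count) if there are some.
        unsigned int infoCount = 0;
        nvmlReturn_t r = nvmlDeviceGetComputeRunningProcesses(dev, &infoCount, NULL);
        if (r == NVML_SUCCESS && infoCount == 0) {
            printf("GPU %u appears to be idle (no compute processes)\n", i);
            break;
        }
    }

    nvmlShutdown();
    return 0;
}
```

Keep in mind that NVML enumerates devices in PCI bus order, which may not match the CUDA runtime's default enumeration order; setting CUDA_DEVICE_ORDER=PCI_BUS_ID in the environment makes the two agree.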
There are no parts of the CUDA driver API or CUDA runtime API that provide this information (whether another process is currently “using” a given GPU) directly. You could probably come up with some kind of inferential scheme, e.g. based on cudaMemGetInfo and some a-priori knowledge about what that API returns in your specific case for the loaded and unloaded situations.
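If you did want to try that inferential route, a minimal sketch might look like the following. The 95% free-memory threshold is purely an assumption you would have to tune for your own workloads, and note that cudaMemGetInfo itself triggers context creation, which consumes some device memory.

```c
// Sketch only: treat a device as "unused" if nearly all of its memory is free.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const double IDLE_FRACTION = 0.95;  // assumed threshold, not a CUDA constant

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;

    for (int i = 0; i < count; ++i) {
        size_t freeBytes = 0, totalBytes = 0;
        cudaSetDevice(i);
        // Creates a context on device i as a side effect.
        if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) continue;

        if ((double)freeBytes / (double)totalBytes > IDLE_FRACTION) {
            printf("GPU %d looks unused (%zu of %zu bytes free)\n",
                   i, freeBytes, totalBytes);
            break;
        }
    }
    return 0;
}
```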
This type of question comes up from time to time. Do as you wish, of course, but this is what job schedulers (e.g. Slurm) are designed to help with. You could also set exclusive process compute mode on your GPUs if they support that setting, and then attempts to use an in-use GPU would result in API failures. Still not ideal, but no race conditions and the inferential part doesn’t require a-priori knowledge.
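As a rough sketch of that last approach, assuming an administrator has already put the GPUs in exclusive process compute mode (e.g. with nvidia-smi -c EXCLUSIVE_PROCESS), a process could simply try to establish a context on each device in turn and keep the first one that succeeds:

```c
// Sketch only: claim the first free GPU, relying on exclusive process mode.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;

    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        // cudaFree(0) is a common idiom to force context creation without
        // allocating anything. On a busy exclusive-process GPU it fails
        // (typically with cudaErrorDevicesUnavailable).
        cudaError_t err = cudaFree(0);
        if (err == cudaSuccess) {
            printf("Got GPU %d; it is now reserved for this process\n", i);
            // ... do the real work here, keeping the context alive ...
            return 0;
        }
        cudaGetLastError();  // clear the error before trying the next device
    }

    printf("No free GPU found\n");
    return 1;
}
```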
Using a polling-based approach (as opposed to a reservation system) built on an API such as NVML has obvious race-condition possibilities.