Dynamically find the next available GPU at run time?

I am wondering if there is a way to loop over all available GPUs on a multi-GPU server and find the first device that is not in use (or not running a particular application). I think this is possible by running nvidia-smi on the command line and parsing its output, but I would like to know whether I can do this using CUDA APIs.

You can get this information from the same source nvidia-smi gets it from, the NVIDIA Management Library (NVML): https://developer.nvidia.com/nvidia-management-library-nvml
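As a minimal sketch (error handling abbreviated), you could ask NVML how many compute processes currently have a context on each device and pick the first device that reports none. Compile against nvml.h and link with -lnvidia-ml:

```cpp
#include <cstdio>
#include <nvml.h>   // link with -lnvidia-ml

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int deviceCount = 0;
    nvmlDeviceGetCount(&deviceCount);

    int freeIdx = -1;
    for (unsigned int i = 0; i < deviceCount; ++i) {
        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS) continue;

        // Query the number of compute processes with a context on this device.
        // With infoCount = 0 and a null list, NVML returns NVML_SUCCESS when the
        // list is empty and NVML_ERROR_INSUFFICIENT_SIZE (plus the count) otherwise.
        unsigned int procCount = 0;
        nvmlReturn_t r = nvmlDeviceGetComputeRunningProcesses(dev, &procCount, nullptr);
        if (r == NVML_SUCCESS && procCount == 0) {
            freeIdx = static_cast<int>(i);
            break;
        }
    }

    nvmlShutdown();

    if (freeIdx >= 0)
        printf("first idle GPU (NVML index): %d\n", freeIdx);
    else
        printf("no idle GPU found\n");
    return 0;
}
```

Note that NVML's enumeration order is not guaranteed to match the CUDA runtime's device ordering; if you need to map between the two, matching by PCI bus ID (nvmlDeviceGetPciInfo on the NVML side, cudaDeviceGetPCIBusId on the CUDA side) is the robust way to do it. There is also nvmlDeviceGetGraphicsRunningProcesses if you care about graphics contexts as well.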


There are no parts of the CUDA driver API or CUDA runtime API that directly provide this information (whether another process is currently “using” a given GPU). You could probably come up with some kind of inferential scheme, e.g. based on cudaMemGetInfo and some a priori knowledge of what this API returns in your specific case for the loaded and unloaded situations.
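For illustration only, a rough sketch of that inferential idea, assuming you have calibrated what “unloaded” looks like on your machines; the 90% free-memory threshold here is an arbitrary placeholder, not a general rule:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Heuristic only: treat a device as "probably unused" if most of its memory is free.
// The threshold is an assumption that must be calibrated for your own setup.
int firstLikelyIdleDevice(double freeFraction = 0.90) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int i = 0; i < count; ++i) {
        if (cudaSetDevice(i) != cudaSuccess) continue;
        size_t freeBytes = 0, totalBytes = 0;
        if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) continue;
        if (static_cast<double>(freeBytes) >= freeFraction * totalBytes)
            return i;   // looks unloaded, by this heuristic
    }
    return -1;          // nothing matched the heuristic
}

int main() {
    printf("candidate device: %d\n", firstLikelyIdleDevice());
    return 0;
}
```

Be aware that cudaMemGetInfo itself establishes a context on each device it probes, which consumes a small amount of memory there, so this probing is not entirely passive.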

This type of question comes up from time to time. Do as you wish, of course, but this is what job schedulers (e.g. Slurm) are designed to help with. You could also set exclusive-process compute mode on your GPUs, if they support that setting, and then attempts to use an in-use GPU would result in API failures. Still not ideal, but there are no race conditions and the inferential part doesn't require a priori knowledge.
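With exclusive-process mode enabled (e.g. via nvidia-smi -c EXCLUSIVE_PROCESS), a sketch of that “try devices until one works” pattern could look like the following; cudaFree(0) is used only to force context creation, and the failure you see on a busy device is typically cudaErrorDevicesUnavailable, though the exact code may vary:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Attempt to claim a device under EXCLUSIVE_PROCESS compute mode.
// Context creation fails on a device that another process already owns.
int claimFirstFreeDevice() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int i = 0; i < count; ++i) {
        cudaError_t err = cudaSetDevice(i);
        if (err == cudaSuccess)
            err = cudaFree(0);   // force context creation on device i
        if (err == cudaSuccess)
            return i;            // this process now owns device i
        cudaGetLastError();      // clear the error state before trying the next device
    }
    return -1;
}

int main() {
    int dev = claimFirstFreeDevice();
    if (dev >= 0) printf("claimed device %d\n", dev);
    else          printf("no free device\n");
    return 0;
}
```

The nice property here is that the claim and the check are the same operation, so there is no window between “I saw it was free” and “I started using it”.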

Using a polling-based approach (as opposed to a reservation system) built on an API such as NVML has obvious race-condition possibilities.
