Hi, I’m trying to use Data Center GPU Manager (DCGM) to gather information about the processes running on each GPU. My code is fairly straightforward:
import dcgm_fields
from DcgmReader import DcgmReader

def get_gpu_info():
    # Fields to sample on every GPU
    myFieldIds = [
        dcgm_fields.DCGM_FI_DEV_NAME,
        dcgm_fields.DCGM_FI_DEV_UUID,
        dcgm_fields.DCGM_FI_DEV_FB_TOTAL,
        dcgm_fields.DCGM_FI_DEV_COMPUTE_PIDS,
    ]
    dr = DcgmReader(fieldIds=myFieldIds)
    # Maps each GPU to a {field id: latest value} dict
    dr_gpu_data = dr.GetLatestGpuValuesAsFieldIdDict()
    gpu_data = {}
    for gpu, gpu_info in dr_gpu_data.items():
        print(gpu_info)
        # Look fields up by their named constants instead of raw IDs (50, 54, 221)
        gpu_data[gpu] = {
            'model': gpu_info[dcgm_fields.DCGM_FI_DEV_NAME],
            'gpu_id': gpu_info[dcgm_fields.DCGM_FI_DEV_UUID],
            'gpu_compute_pids': gpu_info[dcgm_fields.DCGM_FI_DEV_COMPUTE_PIDS],
        }
    return gpu_data
From dcgm_fields.py:
DCGM_FI_DEV_COMPUTE_PIDS = 221 #Compute processes running on the GPU.
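When I run nvidia-smi, I can see the TensorFlow processes bound to each of my GPUs, all listed with type “C” (compute process). However, when I run the function above, the value for DCGM_FI_DEV_COMPUTE_PIDS (field 221) comes back as None, even though the other fields are populated. The print() inside the loop shows: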
>>> get_gpu_info()
{50: 'Tesla V100-SXM2-32GB', 250: 32480, 54: 'GPU-c97bfcc0-f899-101c-ef1d-xxxxxxxx', 221: None}
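To confirm which of the requested fields DCGM leaves empty on each GPU, a small check over the returned dictionary can help. This is a minimal sketch that assumes GetLatestGpuValuesAsFieldIdDict() returns {gpu_id: {field_id: value}}, which is how the loop in get_gpu_info() treats it. The blank_fields helper and the FIELD_NAMES table are hypothetical, and the numeric IDs are simply the ones visible in the output above; real code should use the dcgm_fields constants.

```python
from typing import Any, Dict, List

# Hypothetical lookup table: numeric DCGM field IDs as seen in the output above.
# In real code, prefer the dcgm_fields constants over hard-coded numbers.
FIELD_NAMES = {
    50: "DCGM_FI_DEV_NAME",
    54: "DCGM_FI_DEV_UUID",
    221: "DCGM_FI_DEV_COMPUTE_PIDS",
    250: "DCGM_FI_DEV_FB_TOTAL",
}


def blank_fields(gpu_values: Dict[int, Dict[int, Any]]) -> Dict[int, List[str]]:
    """Return, for each GPU, the names of the fields whose value is None.

    Assumes gpu_values has the shape {gpu_id: {field_id: value}}.
    Unknown field IDs are reported by their number as a string.
    """
    report: Dict[int, List[str]] = {}
    for gpu_id, fields in gpu_values.items():
        report[gpu_id] = [
            FIELD_NAMES.get(field_id, str(field_id))
            for field_id, value in sorted(fields.items())
            if value is None
        ]
    return report


if __name__ == "__main__":
    # GPU key 0 is hypothetical; the post does not show which key the GPU had.
    sample = {
        0: {
            50: "Tesla V100-SXM2-32GB",
            250: 32480,
            54: "GPU-c97bfcc0-f899-101c-ef1d-xxxxxxxx",
            221: None,
        }
    }
    print(blank_fields(sample))  # {0: ['DCGM_FI_DEV_COMPUTE_PIDS']}
```

Run against the output above, it reports only DCGM_FI_DEV_COMPUTE_PIDS, which shows that the reader itself is working and the problem is specific to that one field.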