DCGM-exporter in GKE does not give pod, namespace, or container names in MIG metrics

iamprofessor98 · August 15, 2022, 9:35am

We have deployed dcgm-exporter to collect GPU metrics from a GKE cluster, and we are able to fetch metrics from normal (non-MIG) GPUs. But with multi-instance GPUs (MIG, enabled on the A100 GPU type), the pod, namespace, and container labels come back empty. Does anyone know why? Is this a limitation in GKE?

# TYPE DCGM_FI_DEV_FB_USED gauge

DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-06e64551-f4d7-c43c-7d76-dbe3cefa1a23",device="nvidia0",modelName="A100-SXM4-40GB",GPU_I_PROFILE="1g.5gb",GPU_I_ID="7",Hostname="dcgm-exporter-qzq85",container="",namespace="",pod=""} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-06e64551-f4d7-c43c-7d76-dbe3cefa1a23",device="nvidia0",modelName="A100-SXM4-40GB",GPU_I_PROFILE="1g.5gb",GPU_I_ID="8",Hostname="dcgm-exporter-qzq85",container="",namespace="",pod=""} 3022
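For anyone trying to reproduce this, here is a minimal Python sketch that scans the exporter's text output for MIG samples with no pod attribution. The function names are hypothetical, not part of dcgm-exporter. It assumes the scrape output is available as plain text with straight ASCII quotes, and it only handles the simple label syntax in the sample above (no escaped quotes inside label values).

```python
import re

# Matches one label pair such as pod="" or GPU_I_ID="7".
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, value)."""
    name, rest = line.split("{", 1)
    label_text, value = rest.rsplit("}", 1)
    return name, dict(LABEL_RE.findall(label_text)), float(value)


def unattributed_mig_samples(text):
    """Return MIG samples whose pod, namespace, and container labels are all empty."""
    missing = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines (# TYPE, # HELP), and unlabeled samples.
        if not line or line.startswith("#") or "{" not in line:
            continue
        name, labels, value = parse_sample(line)
        is_mig = bool(labels.get("GPU_I_ID"))  # only MIG samples carry an instance ID
        if is_mig and not any(labels.get(k) for k in ("pod", "namespace", "container")):
            missing.append((name, labels["GPU_I_ID"], value))
    return missing


# The two samples from this post, used as test input.
SAMPLE = '''\
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-06e64551-f4d7-c43c-7d76-dbe3cefa1a23",device="nvidia0",modelName="A100-SXM4-40GB",GPU_I_PROFILE="1g.5gb",GPU_I_ID="7",Hostname="dcgm-exporter-qzq85",container="",namespace="",pod=""} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-06e64551-f4d7-c43c-7d76-dbe3cefa1a23",device="nvidia0",modelName="A100-SXM4-40GB",GPU_I_PROFILE="1g.5gb",GPU_I_ID="8",Hostname="dcgm-exporter-qzq85",container="",namespace="",pod=""} 3022
'''

if __name__ == "__main__":
    for name, instance_id, value in unattributed_mig_samples(SAMPLE):
        print(f"{name} GPU_I_ID={instance_id} value={value} has no pod/namespace/container")
```

Running it against the two samples reports both MIG instances (GPU_I_ID 7 and 8) as unattributed, which matches what we see on the cluster.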
