We have deployed DCGM-Exporter to collect GPU metrics from a GKE cluster, and we are able to fetch metrics from regular GPUs. With multi-instance GPUs (MIG, enabled on the A100 GPU type), however, the pod and namespace names are not listed: the labels come back empty, as in the output below. Does anyone know why this happens? Is it a limitation in GKE?
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-06e64551-f4d7-c43c-7d76-dbe3cefa1a23",device="nvidia0",modelName="A100-SXM4-40GB",GPU_I_PROFILE="1g.5gb",GPU_I_ID="7",Hostname="dcgm-exporter-qzq85",container="",namespace="",pod=""} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-06e64551-f4d7-c43c-7d76-dbe3cefa1a23",device="nvidia0",modelName="A100-SXM4-40GB",GPU_I_PROFILE="1g.5gb",GPU_I_ID="8",Hostname="dcgm-exporter-qzq85",container="",namespace="",pod=""} 3022
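In case it helps anyone reproduce this, here is a rough sketch of how the affected MIG instances can be picked out of the exporter output. It is only an illustration: the function name `find_unattributed` is made up for this post, the sample text is a shortened copy of the output above, and the regex parsing is a simplification that assumes label values contain no `}` or escaped quotes. It also normalizes curly quotes, since forum pastes tend to mangle them.

```python
import re

# One sample line: metric name, {label block}, numeric value.
SAMPLE_RE = re.compile(r'(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)')
# One label pair inside the block: key="value".
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def find_unattributed(text):
    """Return (GPU_I_ID, GPU_I_PROFILE, value) for samples whose pod label is empty.

    Hypothetical helper for this post, not part of DCGM-Exporter.
    """
    # Undo curly quotes that forum software may have substituted.
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    missing = []
    for _name, labels, value in SAMPLE_RE.findall(text):
        kv = dict(LABEL_RE.findall(labels))
        if kv.get("pod", "") == "":
            missing.append(
                (kv.get("GPU_I_ID"), kv.get("GPU_I_PROFILE"), float(value))
            )
    return missing


# Shortened copy of the output above (some labels dropped for brevity).
sample = '''# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",GPU_I_PROFILE="1g.5gb",GPU_I_ID="7",container="",namespace="",pod=""} 3
DCGM_FI_DEV_FB_USED{gpu="0",GPU_I_PROFILE="1g.5gb",GPU_I_ID="8",container="",namespace="",pod=""} 3022
'''

if __name__ == "__main__":
    for gi_id, profile, used in find_unattributed(sample):
        print(f"MIG instance {gi_id} ({profile}): pod label empty, value = {used}")
```

Running the same function on the full text scraped from the exporter's metrics endpoint should list every GPU instance that is reporting usage without a pod, namespace, or container attached.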