I am running an on-premises Kubernetes cluster with multiple types of GPUs. Running gpu-feature-discovery inside its pod produces the following output:
[root@gpu-feature-discovery-sjzg4 /]# gpu-feature-discovery mixed
I1022 16:36:51.576911 137 main.go:122] Starting OS watcher.
I1022 16:36:51.577238 137 main.go:127] Loading configuration.
I1022 16:36:51.577541 137 main.go:139]
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": true,
    "gdsEnabled": null,
    "mofedEnabled": null,
    "gfd": {
      "oneshot": false,
      "noTimestamp": false,
      "sleepInterval": "1m0s",
      "outputFile": "/etc/kubernetes/node-feature-discovery/features.d/gfd",
      "machineTypeFile": "/sys/class/dmi/id/product_name"
    }
  },
  "resources": {
    "gpus": null
  },
  "sharing": {
    "timeSlicing": {}
  }
}
I1022 16:36:51.577912 137 factory.go:48] Detected NVML platform: found NVML library
I1022 16:36:51.577942 137 factory.go:48] Detected non-Tegra platform: /sys/devices/soc0/family file not found
I1022 16:36:51.577952 137 factory.go:64] Using NVML manager
I1022 16:36:51.577959 137 main.go:144] Start running
W1022 16:36:51.602083 137 mig-strategy.go:151] Multiple device types detected: [NVIDIA GeForce RTX 3080 NVIDIA GeForce RTX 3090 NVIDIA GeForce RTX 4090]
I1022 16:36:51.606246 137 main.go:187] Creating Labels
2023/10/22 16:36:51 Writing labels to output file /etc/kubernetes/node-feature-discovery/features.d/gfd
I1022 16:36:51.606418 137 main.go:197] Sleeping for 60000000000
However, only the last GPU, the RTX 4090, shows up in the generated label file:
root@kubernetes0: more /etc/kubernetes/node-feature-discovery/features.d/gfd
nvidia.com/gpu.compute.major=8
nvidia.com/gpu.count=1
nvidia.com/gpu.family=ampere
nvidia.com/gpu.machine=Standard-PC-(i440FX-+-PIIX,-1996)
nvidia.com/cuda.driver.minor=113
nvidia.com/gfd.timestamp=1697987627
nvidia.com/gpu.replicas=1
nvidia.com/gpu.memory=24564
nvidia.com/cuda.runtime.minor=2
nvidia.com/cuda.driver.rev=01
nvidia.com/mig.capable=false
nvidia.com/gpu.product=NVIDIA-GeForce-RTX-4090
nvidia.com/gpu.compute.minor=9
nvidia.com/cuda.driver.major=535
nvidia.com/cuda.runtime.major=12
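To make the mismatch easier to check, here is a minimal sketch that parses the `key=value` lines of the label file into a dictionary. The helper name `parse_gfd_labels` is my own (it is not part of gpu-feature-discovery), and the sample text is just a few lines copied from the file above; on the node you would read the real file instead.

```python
def parse_gfd_labels(text):
    """Parse `key=value` label lines into a dict (hypothetical helper)."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a key=value pair.
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, value = line.split("=", 1)
        labels[key] = value
    return labels


# A few lines copied from the label file shown above.
SAMPLE = """\
nvidia.com/gpu.count=1
nvidia.com/gpu.memory=24564
nvidia.com/gpu.product=NVIDIA-GeForce-RTX-4090
"""

labels = parse_gfd_labels(SAMPLE)
print(labels["nvidia.com/gpu.product"], labels["nvidia.com/gpu.count"])
```

Each key appears once with a single value, so the file carries one product name and one count rather than one entry per GPU model.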
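The log warns that multiple device types were detected (RTX 3080, RTX 3090, and RTX 4090), yet nvidia.com/gpu.count is 1 and only the 4090 is labeled. How do I get all three GPUs to be discovered?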