I’m running the NVIDIA GPU Operator on OpenShift 4.14, and I have a GPU node with two A100 GPUs. My MIG strategy is set to “single”, and the node is currently labeled with:
nvidia.com/mig.config: all-1g.5gb
My goal is to split one GPU into seven 1g.5gb partitions and to configure the other GPU as a single 7g.40gb partition.
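For reference, here is a sketch of how I understand that target layout would be written in the mig-parted configuration format the MIG manager reads. I haven't confirmed that this works. The config name `custom-mixed` is my own placeholder, and I'm assuming GPU index 0 should get the seven 1g.5gb partitions and index 1 the single 7g.40gb partition.

```yaml
version: v1
mig-configs:
  # Hypothetical config name; per-GPU entries instead of "devices: all"
  custom-mixed:
    # GPU 0: seven small 1g.5gb instances
    - devices: [0]
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7
    # GPU 1: one full-size 7g.40gb instance
    - devices: [1]
      mig-enabled: true
      mig-devices:
        "7g.40gb": 1
```

As far as I understand, once a config like this is available to the MIG manager, it would be selected by setting `nvidia.com/mig.config: custom-mixed` on the node.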
I tried setting the following labels:
nvidia.com/mig.config: all-enabled
nvidia.com/mig-7g.40gb.count: '1'
nvidia.com/mig-1g.5gb.count: '7'
Even though the MIG manager accepts this config and reports it as "successful", it doesn't partition the GPUs correctly.
The labels appear on the node, but its status is:
status:
  capacity:
    cpu: '256'
    ephemeral-storage: 468250412Ki
    hugepages-1Gi: '0'
    hugepages-2Mi: '0'
    memory: 527902464Ki
    nvidia.com/gpu: '14'
    nvidia.com/mig-1g.5gb: '0'
    pods: '250'
  allocatable:
    cpu: 255500m
    ephemeral-storage: '430465837161'
    hugepages-1Gi: '0'
    hugepages-2Mi: '0'
    memory: 526751488Ki
    nvidia.com/gpu: '14'
    nvidia.com/mig-1g.5gb: '0'
    pods: '250'
I’d appreciate any guidance I could get with this issue.
Thanks!