Docker unable to start when using MIG GPU devices

Hello, I am trying to start a container using a MIG device on the DGX H100 following these instructions.

I have created the MIG instances and get the following output from nvidia-smi outside of the container

I then run:

sudo docker run --runtime=nvidia --gpus '"device=0:0"' -it --rm nvcr.io/hpc/gromacs:2023.2

which results in the following error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=0:0: unknown.

This has completely stumped me. I have also tried passing the MIG-<UUID> as specified in the instructions above, but I get the same error. If I pass whole GPUs, e.g. --gpus=0, the container works as normal.

Thanks in advance for any help.

Silly question, but there are two architectures you can download there: arm64 and amd64. Did you get the amd64 one?

Yes, it is amd64. I double-checked with docker inspect.

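It looks like you have CDI mode enabled for your toolkit. In that mode the requested device (nvidia.com/gpu=0:0 in your error) is looked up in the CDI spec on disk, so a spec generated before the MIG devices existed will not contain them. You will need to regenerate your CDI spec after creating the MIG devices: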

nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
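
To confirm the regenerated spec actually contains the MIG device before retrying the container, something like the following may help. It is only a sketch: has_cdi_device is a hypothetical helper (not part of nvidia-ctk), and it assumes that nvidia-ctk cdi list prints one device name per line on stdout and that the MIG device is named nvidia.com/gpu=0:0, as in the error message above. The use of sudo is also an assumption, since /etc/cdi is usually root-owned.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the CDI device name in $1 appears
# as a whole line on stdin (fixed-string, exact-line match).
has_cdi_device() {
    grep -Fxq -- "$1"
}

# Only touch the system when nvidia-ctk is actually installed.
if command -v nvidia-ctk >/dev/null 2>&1; then
    # Regenerate the spec so it includes MIG devices created since the last run.
    sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

    # Assumption: the device list is written to stdout, one name per line.
    if nvidia-ctk cdi list | has_cdi_device "nvidia.com/gpu=0:0"; then
        echo "nvidia.com/gpu=0:0 is listed in the CDI spec"
    else
        echo "nvidia.com/gpu=0:0 is still missing from the CDI spec" >&2
    fi
fi
```

If the device is listed, the original docker run command with --gpus '"device=0:0"' should be able to resolve it. The spec needs regenerating again whenever the MIG layout changes.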