Hello, I am trying to start a container using a MIG device on the DGX H100 following these instructions.
I have created the MIG instances and get the following output from nvidia-smi outside of the container:
I then run:
sudo docker run --runtime=nvidia --gpus '"device=0:0"' -it --rm nvcr.io/hpc/gromacs:2023.2
which results in the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=0:0: unknown.
This has completely stumped me. I have also tried passing the MIG-<UUID> as specified in the above instructions, but I get the same error. If I pass a whole GPU, e.g. --gpus=0, then the container works as normal.
Thanks in advance for any help.