I am setting up Isaac Sim on an EC2 server with 4 A10G Tensor Core GPUs, following these instructions, and I'm getting the following error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
I've attached the output of nvidia-smi.
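One quick host-side check is whether the dynamic loader can open libnvidia-ml.so.1 at all, since that is the library nvidia-container-cli reports it cannot load in the error above. The sketch below is a diagnostic of my own, not part of the Isaac Sim instructions. It uses only Python's standard ctypes module, and the helper name nvml_loadable is hypothetical. nvidia-smi itself depends on this library, so if nvidia-smi runs on the host, this should normally print True. In that case the problem is more likely in how the container runtime finds the library than in the driver install itself.

```python
import ctypes
import ctypes.util


def nvml_loadable(soname: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)  # dlopen() under the hood; raises OSError if it fails
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # find_library consults the system's library search (e.g. the ldconfig cache);
    # None means no match was found there
    print("lookup for nvidia-ml:", ctypes.util.find_library("nvidia-ml"))
    print("libnvidia-ml.so.1 loadable:", nvml_loadable())
```

Running it once as your normal user and once with sudo shows whether the result depends on privileges.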
Hi @asriaws,
I'm currently facing the same error, and it seems that I need to run my container as superuser so it can access the GPUs.
I'm still trying to find a solution so that I don't have to create such a privileged (and possibly insecure) environment. I'll post my findings here as an answer if they turn out to be useful.
In the meantime: you said you followed NVIDIA Omniverse's instructions, so at step 2 of Container Setup (Install Docker) you should have executed the Post-install steps for Docker. According to Docker's documentation, these basically create a docker group (membership in it grants root-level privileges), add your user to that group, and activate the change.
So your issue might come from elsewhere, but that would be a good starting point to look at!
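You can check the docker-group part of the post-install steps from the host without starting any container. Below is a minimal Python sketch. It is my own helper, not part of the Omniverse or Docker instructions, and the name docker_group_status is hypothetical. It separates two cases. The first is whether your user is listed in the docker group, which is what "sudo usermod -aG docker $USER" changes. The second is whether your current session actually carries that group, which typically only happens after logging out and back in or running "newgrp docker".

```python
import grp
import os
import pwd


def docker_group_status(group_name: str = "docker") -> dict:
    """Report whether group_name exists, whether the current user is listed
    in it, and whether the current session already carries it."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group was never created (e.g. the post-install step was skipped)
        return {"group_exists": False, "configured": False, "active": False}
    entry = pwd.getpwuid(os.getuid())
    # Membership as recorded in the group database (what usermod -aG changes)
    configured = gid in os.getgrouplist(entry.pw_name, entry.pw_gid)
    # Groups attached to this running process; usually False until re-login or newgrp
    active = gid == os.getegid() or gid in os.getgroups()
    return {"group_exists": True, "configured": configured, "active": active}


if __name__ == "__main__":
    print(docker_group_status())
```

If "configured" is True but "active" is False, the group change simply hasn't been picked up by your current shell yet.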