GPU Driver Issue using AWS

Hi, I followed all the instructions for the AWS setup of the NVIDIA Isaac Sim requirements [Native Workstation Deployment — Omniverse Robotics documentation].
We used the AMI IsaacSim-Ubuntu-18.04-GPU-2021-05-25 on the EC2 instance.
After setting up the instance and connecting to it successfully over SSH, I followed these instructions for running a headless container in the cloud
(Native Workstation Deployment — Omniverse Robotics documentation).
When I ran the ‘docker run’ command in Step 3, however, I repeatedly got this error: “docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.”
Many people online report the same or a similar error, and some suggested changing the Docker version. I tried that, but the error persisted.
The nvidia-smi command doesn’t work either. The error I get is: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.” I installed and updated the nvidia-container-toolkit and nvidia-docker packages, but the error persists, which suggests the problem lies with the GPU driver. Shouldn’t the AMI we use already have the GPU drivers installed?
All specifications match the Isaac Sim AWS requirements tutorial exactly. What does this error mean, and what can I do about it?
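To repeat the nvidia-smi check from a script (for example, after each change to the instance), here is a minimal Python sketch. It assumes only the standard library and Python 3.6 or newer, so it should run on the default python3 of an Ubuntu 18.04 image. The helper names `driver_unreachable` and `check_nvidia_smi` are hypothetical, and matching on the message text is an assumption based on the error quoted in this thread, not a documented interface of nvidia-smi.

```python
import shutil
import subprocess

# Substring of the message nvidia-smi printed in this thread when it could not
# reach the driver (assumption: the wording stays the same across versions).
DRIVER_ERROR = "couldn't communicate with the NVIDIA driver"


def driver_unreachable(output: str) -> bool:
    """Return True if nvidia-smi output contains the 'cannot reach driver' message."""
    # Normalize a typographic apostrophe (U+2019) to a plain one before matching.
    return DRIVER_ERROR in output.replace("\u2019", "'")


def check_nvidia_smi() -> str:
    """Run nvidia-smi once and return a short human-readable summary."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found on PATH"
    # stdout/stderr PIPE + universal_newlines keeps this compatible with Python 3.6.
    result = subprocess.run(
        ["nvidia-smi"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    combined = result.stdout + result.stderr
    if result.returncode == 0:
        return "driver OK"
    if driver_unreachable(combined):
        return "driver not loaded or not reachable"
    return "nvidia-smi failed with exit code {}".format(result.returncode)


if __name__ == "__main__":
    print(check_nvidia_smi())
```

A result of "driver not loaded or not reachable" corresponds to the situation described above, where Docker's NVIDIA hook also fails with "nvml error: driver not loaded".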

Hi, are you using the AMI named “IsaacSim-Ubuntu-18.04-GPU-2021-05-25”? It should be running Docker version 20.10.6.
For Isaac Sim to work, nvidia-smi must run without failing.

Hi, I was getting the same error with Docker version 20.10.6, so I downgraded to version 19, which still satisfies the requirements. And yes, I’m using the AMI IsaacSim-Ubuntu-18.04-GPU-2021-05-25 on the EC2 instance.

Can you try creating a new instance and checking whether nvidia-smi works there? I suspect the kernel on the current instance may have been changed, which can cause driver issues and make nvidia-smi fail. The current instance may need a driver reinstall.
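To check the kernel-change theory on an existing instance, one option is to see whether the core `nvidia` kernel module is loaded and note which kernel is running. The sketch below is a minimal example, assuming a Linux host that exposes /proc/modules; `loaded_modules` and `nvidia_module_loaded` are hypothetical helper names. A missing module is consistent with the “driver not loaded” error in this thread and with a driver built for a different kernel, but it does not by itself prove the cause.

```python
import os
import platform

PROC_MODULES = "/proc/modules"


def loaded_modules(proc_modules_text: str) -> set:
    """Return the set of module names (first field of each line) in /proc/modules text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def nvidia_module_loaded(proc_modules_text: str) -> bool:
    """True if the core 'nvidia' module (not just e.g. nvidia_uvm) is listed."""
    return "nvidia" in loaded_modules(proc_modules_text)


def main() -> None:
    # platform.release() reports the running kernel release, like `uname -r`.
    print("running kernel:", platform.release())
    if not os.path.exists(PROC_MODULES):
        print(PROC_MODULES, "not available (not a Linux host?)")
        return
    with open(PROC_MODULES) as f:
        text = f.read()
    if nvidia_module_loaded(text):
        print("nvidia module is loaded")
    else:
        print("nvidia module is NOT loaded; the driver may need to be "
              "rebuilt or reinstalled for this kernel")


if __name__ == "__main__":
    main()
```

Running this on both the old and the new instance and comparing the reported kernel releases is one way to see whether the kernel actually changed.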

Okay, I’ll try creating a new instance and keep you updated! Thanks!

Hi, I launched a new instance, installed Nucleus, and added the sample assets. Using Kit Remote, I’m able to open the UI too. However, I still get errors in my console such as ‘failed to startup carb.windowing-glfw.plugin’ and ‘GLFW initialisation failed’. Is this still a graphics-related issue? What should I do? Another issue is that I get several errors when I launch the Kit Remote client. I’ve attached two screenshots: one of the Isaac Sim Kit Remote console showing the GLFW errors, and one of my terminal when I launch Kit Remote.



Thanks for your help!

Hi, the GLFW errors are a known issue. It should be fine as long as everything works. Do you have any issues with the other samples?

The errors in the Kit Remote log can be ignored too. The NatHolePunch error could be caused by your internet connection or a firewall. It seems the connection still works despite that error.

Hi,
Thanks for the reply. I don’t have issues with any particular sample. I just wanted to sort these errors out because I’m about to move on to a much more complex use of Isaac Sim, and I didn’t want any of them to interfere later on.