Clara AGX Dev Kit won't boot after nvgpuswitch.py install dGPU failure

Hello,

I performed a fresh install/flash on my Clara AGX using SDK Manager 1.7.1.8928 with the new Clara Holoscan SDK. I ran the “nvgpuswitch.py install dGPU” command, but it failed with a dependency error related to nvidia-container-runtime. I then tried to run apt update and apt upgrade, but the internet connection on the AGX failed partway through (my other computers were unaffected). I rebooted the AGX, and it will no longer boot.

I have tried to hold both reset and recovery buttons simultaneously for 20 seconds, but it does not reset the AGX. I have attempted to boot to recovery mode with the recovery button and the power button simultaneously but this does not work.

Do you have any suggestions for how I might be able to reset / re-flash the AGX?

Also, is dGPU mode no longer compatible with the Clara Holoscan SDK image?

Thank you for your input.

After leaving the machine off and unplugged for ~10 minutes, I was able to boot to recovery mode by holding the recovery button while pressing the front power button. This allowed me to flash the system again in Manual Mode from SDK Manager, so that problem is solved.

Great to hear that booting to recovery mode helped with reflashing.

For future reference, if booting to recovery mode does not let the host system detect the Clara AGX, try reset mode (press the reset and recovery buttons together) so that the host can detect the device for reflashing.

Thank you. I will remember that in the future.

Unfortunately, there is still a dependency error when converting from iGPU mode to dGPU mode with the nvgpuswitch.py script. This is with JetPack 4.5.1. The error involves nvidia-container-runtime and nvidia-container-toolkit. I have been trying different versions of these two packages, without luck so far.

Any recommendations would be appreciated. Thank you.

Could you let us know which error occurs and which versions of nvidia-container-runtime and nvidia-container-toolkit you have installed? (The nvgpuswitch.py script should take care of those versions for us.)

After a new flash, I run:

sudo /opt/nvidia/clara-holoscan-sdk/clara-holoscan-tools/bin/nvgpuswitch.py install dGPU

which gives the following error:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.4.2) but 1.0.1-1 is to be installed
E: Unable to correct problems, you have held broken packages.
ERROR: Install dGPU drivers failed!

Currently, I have:
nvidia-container-runtime=3.1.0-1
nvidia-container-toolkit=1.0.1-1

The nvgpuswitch.py script tries to install nvidia-container-runtime=3.4.2-1. I tried changing the script to use the most recent version, 3.7.0-1, but this seems to cause downstream problems when reinstalling the Holoscan SDK.

Thank you for your help.

Thanks for the detailed message. Could you try to install nvidia-container-toolkit version 1.5.1-1 (sudo apt-get install nvidia-container-toolkit=1.5.1-1)?
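
For anyone hitting the same error, the apt message quoted above already names the mismatch: nvidia-container-runtime needs nvidia-container-toolkit >= 1.4.2, while 1.0.1-1 is the version selected. Below is a minimal Python sketch that pulls those fields out of the error line and checks whether a given toolkit version would meet the minimum. The helper names (`parse_unmet_dependency`, `upstream_tuple`, `satisfies_minimum`) are hypothetical, and the comparison is a simplification that only looks at the numeric upstream part; it is not a full Debian version comparison (on the device, `dpkg --compare-versions` does that properly).

```python
import re

# Matches one apt "unmet dependencies" line in the format shown in the error above.
DEP_LINE = re.compile(
    r"^\s*(?P<package>\S+)\s*:\s*Depends:\s*(?P<dependency>\S+)\s*"
    r"\((?P<op>[<>=]+)\s*(?P<required>[^)\s]+)\)\s*"
    r"but\s+(?P<candidate>\S+)\s+is to be installed"
)


def parse_unmet_dependency(line):
    """Return the fields of an unmet-dependency line as a dict, or None if no match."""
    match = DEP_LINE.match(line)
    return match.groupdict() if match else None


def upstream_tuple(version):
    """Simplified assumption: drop any epoch and Debian revision, keep numeric dotted parts."""
    upstream = version.split(":", 1)[-1].split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def satisfies_minimum(candidate, required):
    """True if the candidate's upstream version is at least the required one."""
    return upstream_tuple(candidate) >= upstream_tuple(required)


error_line = (
    "nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.4.2) "
    "but 1.0.1-1 is to be installed"
)
info = parse_unmet_dependency(error_line)
print(info["dependency"], "needs", info["op"], info["required"],
      "but candidate is", info["candidate"])
print("1.0.1-1 ok?", satisfies_minimum("1.0.1-1", info["required"]))
print("1.5.1-1 ok?", satisfies_minimum("1.5.1-1", info["required"]))
```

Under this simplified comparison, 1.0.1-1 falls short of 1.4.2 and 1.5.1-1 meets it, which is consistent with the suggestion to install 1.5.1-1 before running the script.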

If I update nvidia-container-toolkit to version 1.5.1-1, then nvgpuswitch.py does run without halting errors. I can then reboot, and the video output comes from the discrete GPU as expected. nvidia-smi shows the RTX 6000.

However, after doing this, DeepStream is uninstalled: there is no deepstream-app present, and the clara-holoscan-deepstream-sample folder has been removed. I tried to reinstall the SDK and DeepStream via SDK Manager without re-flashing, but this also yields errors. I also tried copying the DeepStream deb package over to the Clara AGX and installing it manually, but I again ran into package dependency errors that I can’t seem to resolve.

To reinstall packages once in dGPU mode, please see section 7.1, Reinstalling Clara Holoscan SDK Packages, in the documentation at https://developer.nvidia.com/Clara-Holoscan-SDK-Documentation. For a complete checklist for setting up the developer kit, please see the User Guide at https://developer.nvidia.com/clara-agx-development-kit-user-guide. You can find more documentation for the Holoscan SDK on the NVIDIA Clara Holoscan SDK developer page.

Thank you. That did the trick. Ultimately, the sequence that worked was: 1) flashing, 2) updating nvidia-container-toolkit to version 1.5.1-1, 3) running nvgpuswitch.py, then 4) manually reinstalling libvisionworks, libvisionworks-dev, DeepStream, and Clara Holoscan. The Endoscopy Example now runs with the discrete GPU.
Thank you for your help!
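
For later readers, steps 2 and 3 of the sequence above can be sketched as a small dry-run helper in Python. The two command lines are the ones quoted earlier in this thread; the function names (`switch_commands`, `format_command`, `run`) are hypothetical. Step 1 (flashing) happens from the host via SDK Manager, and step 4 is left out because the exact package file names are not given in this thread; section 7.1 of the documentation covers it.

```python
import shlex
import subprocess

# Script path and toolkit version as quoted earlier in this thread.
NVGPUSWITCH = "/opt/nvidia/clara-holoscan-sdk/clara-holoscan-tools/bin/nvgpuswitch.py"
TOOLKIT_VERSION = "1.5.1-1"


def switch_commands(toolkit_version=TOOLKIT_VERSION):
    """Commands for steps 2 and 3: pin the toolkit, then switch to dGPU."""
    return [
        ["sudo", "apt-get", "install",
         "nvidia-container-toolkit={}".format(toolkit_version)],
        ["sudo", NVGPUSWITCH, "install", "dGPU"],
    ]


def format_command(cmd):
    """Render an argument list as a shell-quoted string for display."""
    return " ".join(shlex.quote(part) for part in cmd)


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(format_command(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
```

Calling `run(switch_commands())` only prints the two commands. Passing `dry_run=False` would execute them on the device, so that is best done on a freshly flashed system, followed by a reboot and the manual reinstall from step 4.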