I performed a fresh install/flash on my Clara AGX using SDK Manager 1.7.1.8928 with the new Clara Holoscan SDK. I ran the “nvgpuswitch.py install dGPU” command, but it failed to install and gave me a dependency error related to the nvidia-container-runtime. I attempted to run an apt update and apt upgrade, but during the process the internet connection failed on the AGX (not any of my other computers). I rebooted the AGX, but it will no longer boot.
I have tried to hold both reset and recovery buttons simultaneously for 20 seconds, but it does not reset the AGX. I have attempted to boot to recovery mode with the recovery button and the power button simultaneously but this does not work.
Do you have any suggestions for how I might be able to reset / re-flash the AGX?
Is dGPU mode no longer compatible with the Clara Holoscan SDK image?
After leaving the machine off and unplugged for ~10 minutes, I was able to boot to recovery mode by holding the recovery button while pressing the front power button. This then allowed me to flash the system again in Manual Mode from SDK Manager. So that problem is solved.
Great to hear that booting to recovery mode helped with reflashing.
For future reference, if booting to recovery mode doesn't let the host system detect the Clara AGX for reflashing, try reset mode instead (press the reset button and the recovery button together) so that the host can detect it.
Unfortunately, there is still a dependency error when converting from iGPU mode to dGPU mode with the nvgpuswitch.py script. This is with JetPack 4.5.1. The dependency error is related to nvidia-container-runtime and nvidia-container-toolkit. I have been trying different versions of these two packages, without luck so far.
Any recommendations would be appreciated. Thank you.
Could you let us know which error occurs and which versions of nvidia-container-runtime and nvidia-container-toolkit you have installed? (The nvgpuswitch.py script should be taking care of that for us.)
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.4.2) but 1.0.1-1 is to be installed
E: Unable to correct problems, you have held broken packages.
ERROR: Install dGPU drivers failed!
Currently, I have:
nvidia-container-runtime=3.1.0-1
nvidia-container-toolkit=1.0.1-1
The nvgpuswitch.py script tries to install nvidia-container-runtime=3.4.2-1. I tried changing the script to use the most recent version, 3.7.0-1, but this seems to cause downstream problems when reinstalling the Holoscan SDK.
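To make the error above easier to read, here is a minimal Python sketch that parses the "unmet dependencies" line and checks whether a given nvidia-container-toolkit version would satisfy it. The helper names (`version_key`, `check_dependency`) are made up for illustration, only the `>=` relation is handled, and the version comparison is a simplification that looks at numeric fields only; on a real system `dpkg --compare-versions` is the authoritative check.

```python
import re

# Matches one apt "unmet dependencies" line such as the one in the log above.
# Assumption: only the ">=" relation is handled in this sketch.
DEP_LINE = re.compile(
    r"^\s*(?P<pkg>\S+)\s*:\s*Depends:\s*(?P<dep>\S+)\s*"
    r"\(>=\s*(?P<want>[^)\s]+)\)\s*"
    r"but\s+(?P<got>\S+)\s+is to be installed"
)


def version_key(version):
    """Simplified sort key built from the numeric fields of a version string.

    Assumption: versions look like '1.4.2' or '1.0.1-1'. This is not full
    Debian version ordering (epochs, '~', and letters are not handled).
    """
    return tuple(int(f) for f in re.split(r"[.-]", version) if f.isdigit())


def check_dependency(line, candidate=None):
    """Parse an unmet-dependency line and test a version against it.

    If `candidate` is given, it is tested instead of the version apt
    reported it was going to install. Returns None for non-matching lines.
    """
    match = DEP_LINE.match(line)
    if match is None:
        return None
    version = candidate or match.group("got")
    return {
        "package": match.group("pkg"),
        "needs": match.group("dep"),
        "minimum": match.group("want"),
        "version": version,
        "satisfied": version_key(version) >= version_key(match.group("want")),
    }


if __name__ == "__main__":
    line = ("nvidia-container-runtime : Depends: nvidia-container-toolkit "
            "(>= 1.4.2) but 1.0.1-1 is to be installed")
    print(check_dependency(line))                       # version apt selected
    print(check_dependency(line, candidate="1.5.1-1"))  # a newer candidate
```

Read this way, the log says that the nvidia-container-runtime version being installed needs nvidia-container-toolkit 1.4.2 or newer, while apt was only going to install 1.0.1-1.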
Thanks for the detailed message. Could you try to install nvidia-container-toolkit version 1.5.1-1 (sudo apt-get install nvidia-container-toolkit=1.5.1-1)?
If I update nvidia-container-toolkit to version 1.5.1-1, then nvgpuswitch.py does run without halting errors. I can then reboot, and the video output comes from the discrete GPU as expected. nvidia-smi shows the RTX 6000.
However, after doing this, DeepStream is uninstalled. There is no deepstream-app present, and the clara-holoscan-deepstream-sample folder has been removed. I tried to reinstall the SDK and DeepStream via SDK Manager without re-flashing, but this also yields errors. I also tried copying the DeepStream deb package file over to the Clara AGX and installing it manually, but again encountered package dependency errors that I can't seem to resolve.
Thank you. That did the trick. Ultimately, the sequence that worked was: 1) flashing, 2) updating nvidia-container-toolkit to version 1.5.1-1, 3) running nvgpuswitch.py, and 4) manually reinstalling libvisionworks, libvisionworks-dev, DeepStream, and Clara Holoscan. The Endoscopy Example now runs on the discrete GPU.
Thank you for your help!
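For anyone landing here later, the working sequence from this thread can be written down as a small dry-run checklist. This is only a sketch: `RECOVERY_PLAN` and `render_plan` are hypothetical names, the two command lines are the ones quoted in this thread (how nvgpuswitch.py is located and whether it needs sudo on your system is an assumption left open), and the flashing and reinstall steps are marked manual because no exact commands were given for them. Nothing is executed; the plan is only printed.

```python
import shlex

# Each step is (description, command). A command of None marks a manual step
# for which this thread gives no exact command line.
RECOVERY_PLAN = [
    ("Flash the Clara AGX from SDK Manager", None),
    ("Pin nvidia-container-toolkit to 1.5.1-1",
     ["sudo", "apt-get", "install", "nvidia-container-toolkit=1.5.1-1"]),
    # Assumption: invoked as quoted in the first post; adjust path/privileges.
    ("Switch to the discrete GPU",
     ["nvgpuswitch.py", "install", "dGPU"]),
    ("Reinstall libvisionworks, libvisionworks-dev, DeepStream, "
     "and Clara Holoscan", None),
]


def render_plan(plan):
    """Return one numbered, human-readable line per step (dry run only)."""
    lines = []
    for number, (description, command) in enumerate(plan, start=1):
        action = shlex.join(command) if command else "(manual step)"
        lines.append(f"{number}. {description}: {action}")
    return lines


if __name__ == "__main__":
    for line in render_plan(RECOVERY_PLAN):
        print(line)
```

Keeping the order explicit matters here: in this thread, nvgpuswitch.py only ran cleanly once the toolkit had been updated, and the reinstall step was needed because DeepStream and the sample folder were gone after the switch.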