Unable to access containers after upgrading to Jetpack 5.1.2 on Orin AGX

Hi @dusty_nv, @DaneLLL, and anyone else,

I finally got around to upgrading to JetPack 5.1.2 (which we talked about some time ago in "Nvidia-util-515 and nvidia-utils-525 error").

I followed the instructions in section 1.3.2 of
How to Install JetPack :: NVIDIA JetPack Documentation exactly, with no errors afterward. I rebooted and everything looked normal. But when I tried to start my working containers, I got:

docker start 06ecd0ca2743 (or the same error when using the container's name)

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
Error: failed to start containers: 06ecd0ca2743

How can I solve this?

Some output that might be relevant:

nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for aarch64 35.2.1 Release Build (buildbrain@mobile-u64-6348-d7000) Tue Jan 24 15:26:51 PST 2023

cat /sys/module/nvidia/version
35.2.1

sudo dmesg | grep NVRM
[ 15.219716] NVRM: loading NVIDIA UNIX Open Kernel Module for aarch64 35.2.1 Release Build (buildbrain@mobile-u64-6348-d7000) Tue Jan 24 15:26:51 PST 2023
[ 23.274017] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 23.275197] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 23.279964] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 23.360821] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff:
[ 23.361384] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[ 25.747613] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff:
[ 26.047019] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff:
[ 782.127642] NVRM: failed to register with the ACPI subsystem!
[ 782.127690] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 782.127705] NVRM: failed to unregister from the ACPI subsystem!
[ 857.083045] NVRM: failed to register with the ACPI subsystem!
[ 857.083090] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 857.083105] NVRM: failed to unregister from the ACPI subsystem!
[ 884.532646] NVRM: failed to register with the ACPI subsystem!
[ 884.532694] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 884.532712] NVRM: failed to unregister from the ACPI subsystem!
[ 942.045591] NVRM: failed to register with the ACPI subsystem!
[ 942.045636] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 942.045651] NVRM: failed to unregister from the ACPI subsystem!
[ 1006.224437] NVRM: failed to register with the ACPI subsystem!
[ 1006.224488] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 1006.224502] NVRM: failed to unregister from the ACPI subsystem!

But what do I do with this output?

Hi,

nvidia-smi is not supported on Jetson devices.
How did you get the output of nvidia-smi?

Is your container based on r35.4.1?
If not, you will need to use it as a base image.

You can also try to restart the docker server to see if it helps.

$ sudo systemctl restart docker.service
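The dmesg output above shows a desktop user-space driver (525.125.06) conflicting with the Jetson kernel module (35.2.1), so it is also worth checking whether a desktop driver package was accidentally installed. A sketch of that check (the package-name pattern is an assumption; adjust it as needed):

```shell
# On Jetson the GPU driver ships with L4T, so desktop driver packages such as
# nvidia-driver-525 or nvidia-utils-525 should not normally be installed.
# List any that are present:
dpkg -l 2>/dev/null | grep -E 'nvidia-(driver|utils)-[0-9]+' \
  || echo "no desktop nvidia-driver packages found"
```

If any show up, removing them (and reinstalling the JetPack components) may resolve the version mismatch.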

Thanks.

Hi @AastaLLL thanks for your response!

I didn't use it on Jetson before. I just followed someone's recommendation online, so obviously we can ignore that…

The containers are based on r35.2.1.
How can I use r35.4.1 as a base image? Wouldn't that mean creating a new image but losing all the work (a lot!) in the existing containers?

After rebooting, as well as restarting the docker service, the current error is:

docker start 06ecd0ca2743
Error response from daemon: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/06ecd0ca27437378a2b8aaa5888097672abe6986c29442971c8d2a24a8945f38/log.json: no such file or directory): nvidia-container-runtime did not terminate successfully: exit status 1: time="2023-10-01T21:22:27-07:00" level=error msg="error loading config: failed to read config values: (15, 12): no value can start with @"
: unknown
Error: failed to start containers: 06ecd0ca2743

If I edit the config file /etc/nvidia-container-runtime/config.toml and remove the @ from the ldconfig line, changing
ldconfig = @/sbin/ldconfig.real"
to
ldconfig = "/sbin/ldconfig.real"
the error changes back to the original:
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
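For reference, the edit is equivalent to this sed one-liner, demonstrated here on a scratch copy (the real file is /etc/nvidia-container-runtime/config.toml — back it up before touching it):

```shell
# Reproduce the broken line on a scratch copy, then apply the same fix:
tmp=$(mktemp)
printf 'ldconfig = @/sbin/ldconfig.real"\n' > "$tmp"           # broken line as found
sed -i 's|@/sbin/ldconfig.real"|"/sbin/ldconfig.real"|' "$tmp" # drop the stray @, restore the opening quote
cat "$tmp"   # prints: ldconfig = "/sbin/ldconfig.real"
rm -f "$tmp"
```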

Sadly, I mistakenly ran docker system prune -a, which was "supposed to" only do the following:

WARNING! This will remove:

  • all stopped containers
  • all networks not used by at least one container
  • all images without at least one container associated to them
  • all build cache

All my images and containers were deleted, even though I had "double-checked" beforehand and thought I had no "stopped containers" and that each image was associated with at least one container. (Note to myself, or to anyone who runs into this: "stopped" here means "Exited" in the STATUS column of docker ps -a. It doesn't mean "suspended" as in a process. I guess whoever wrote the prune command didn't think consistent wording was necessary… :) I'm also wondering: does docker system prune even skip an attached running container, to justify the name "prune"? :)

Yikes, all the work is gone. Now I have to start from scratch. I hope I at least learned something :)
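One lesson worth writing down: container filesystems can be snapshotted as images before running anything destructive. A hypothetical helper (the container ID and tag names below are only examples):

```shell
# Snapshot a container's filesystem as an image and export it to a tar file
# before running anything destructive like `docker system prune -a`:
backup_container() {
  cid="$1"
  docker commit "$cid" "backup-$cid:pre-prune" \
    && docker save "backup-$cid:pre-prune" -o "backup-$cid.tar"
}
# usage: backup_container 06ecd0ca2743
```

The saved tar can later be restored with docker load, even on a reflashed board.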

Still, starting now from a fresh image (pulled after the upgrade to 5.1.2), I can't create a new container; I get:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.

@AastaLLL is there an r35.4.1 container? Where can I find it?


Should I sudo apt-get remove --purge nvidia-* and then try the upgrade again as before:

sudo apt update
sudo apt dist-upgrade
sudo apt install --fix-broken -o Dpkg::Options::="--force-overwrite"
reboot

OR

sudo apt-get install nvidia-driver-525 (which I think is the latest?)

Anything else I should try?

Hi,

Sorry to hear that.
Do you still have the Dockerfile so you can rebuild the image?

On Jetson, the GPU driver is included in the OS, so you don't need to install it through nvidia-driver-* packages. Please run sudo apt install nvidia-jetpack first to ensure everything is properly installed.
Then please try launching our r35.4.1 base image to make sure Docker works correctly.
Please let us know if the container doesn’t work.

$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:35.4.1
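To double-check that the host BSP actually matches the r35.4.1 tag, the L4T release file can be inspected (standard path on L4T; the fallback echo is only there for non-Jetson machines):

```shell
# Sanity check: on JetPack 5.1.2 this file should report R35, REVISION: 4.1,
# matching the r35.4.1 container tag above.
cat /etc/nv_tegra_release 2>/dev/null \
  || echo "/etc/nv_tegra_release not found (not an L4T host?)"
```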

Thanks.

Hi @AastaLLL, thanks for following up. I haven't used a Dockerfile… I used docker run with a long string of parameters, which I usually save in a progress-notes file. Unfortunately the notes are in the Orin's home directory… hopefully I can recover them.

Hi,

Good luck!
If you encounter an error with the above command, please also let us know.

Thanks.


Thanks! I was able to flash the Orin successfully, and happened to find a copy of the notes on the external disk. So far so good!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.