Unable to access containers after upgrading to Jetpack 5.1.2 on Orin AGX

Hi @dusty_nv, @DaneLLL, and anyone else,

I finally got around to upgrading to JetPack 5.1.2 (which we talked about some time ago in "Nvidia-util-515 and nvidia-utils-525 error").

I followed the instructions in section 1.3.2 of
How to Install JetPack :: NVIDIA JetPack Documentation exactly, with no errors afterward. I rebooted and everything looked normal. But when I tried to start my working containers, I got:

docker start 06ecd0ca2743 (or the same error when using the container's name)

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
Error: failed to start containers: 06ecd0ca2743

How can I solve this?

Some output that might be relevant:

nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for aarch64 35.2.1 Release Build (buildbrain@mobile-u64-6348-d7000) Tue Jan 24 15:26:51 PST 2023

cat /sys/module/nvidia/version
35.2.1

sudo dmesg | grep NVRM
[ 15.219716] NVRM: loading NVIDIA UNIX Open Kernel Module for aarch64 35.2.1 Release Build (buildbrain@mobile-u64-6348-d7000) Tue Jan 24 15:26:51 PST 2023
[ 23.274017] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080013f result 0x56:
[ 23.275197] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080017e result 0x56:
[ 23.279964] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x2080014a result 0x56:
[ 23.360821] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff:
[ 23.361384] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x730190 result 0x56:
[ 25.747613] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff:
[ 26.047019] NVRM rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff:
[ 782.127642] NVRM: failed to register with the ACPI subsystem!
[ 782.127690] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 782.127705] NVRM: failed to unregister from the ACPI subsystem!
[ 857.083045] NVRM: failed to register with the ACPI subsystem!
[ 857.083090] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 857.083105] NVRM: failed to unregister from the ACPI subsystem!
[ 884.532646] NVRM: failed to register with the ACPI subsystem!
[ 884.532694] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 884.532712] NVRM: failed to unregister from the ACPI subsystem!
[ 942.045591] NVRM: failed to register with the ACPI subsystem!
[ 942.045636] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 942.045651] NVRM: failed to unregister from the ACPI subsystem!
[ 1006.224437] NVRM: failed to register with the ACPI subsystem!
[ 1006.224488] NVRM: API mismatch: the client has the version 525.125.06, but
NVRM: this kernel module has the version 35.2.1. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 1006.224502] NVRM: failed to unregister from the ACPI subsystem!

But what do I do with this output?

Hi,

nvidia-smi is not supported on Jetson devices.
How did you get the output of nvidia-smi?

Is your container based on r35.4.1?
If not, you will need to use it as a base image.

You can also try to restart the docker server to see if it helps.

$ sudo systemctl restart docker.service
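The dmesg output above shows a desktop user-space driver (525.125.06) conflicting with the Jetson kernel module (35.2.1), so it is also worth checking whether a desktop driver package was accidentally installed. A sketch of that check (the package-name pattern is an assumption; adjust it as needed):

```shell
# On Jetson the GPU driver ships with L4T, so desktop driver packages such as
# nvidia-driver-525 or nvidia-utils-525 should not normally be installed.
# List any that are present:
dpkg -l 2>/dev/null | grep -E 'nvidia-(driver|utils)-[0-9]+' \
  || echo "no desktop nvidia-driver packages found"
```

If any show up, removing them (and reinstalling the JetPack components) may resolve the version mismatch.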

Thanks.

Hi @AastaLLL thanks for your response!

I didn't use it on Jetson before. I just followed someone's recommendation online, so obviously we can ignore that…

The containers are based on r35.2.1.
How can I use r35.4.1 as a base image? Wouldn't that mean creating a new image but losing all the work (a lot!) in the existing containers?

After rebooting, as well as restarting the docker service, the current error is:

docker start 06ecd0ca2743
Error response from daemon: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/06ecd0ca27437378a2b8aaa5888097672abe6986c29442971c8d2a24a8945f38/log.json: no such file or directory): nvidia-container-runtime did not terminate successfully: exit status 1: time="2023-10-01T21:22:27-07:00" level=error msg="error loading config: failed to read config values: (15, 12): no value can start with @"
: unknown
Error: failed to start containers: 06ecd0ca2743

If I edit the config file /etc/nvidia-container-runtime/config.toml and remove the @ from the ldconfig line, changing
ldconfig = @/sbin/ldconfig.real"
to
ldconfig = "/sbin/ldconfig.real"
the error changes back to the original:
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
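For reference, the edit is equivalent to this sed one-liner, demonstrated here on a scratch copy (the real file is /etc/nvidia-container-runtime/config.toml — back it up before touching it):

```shell
# Reproduce the broken line on a scratch copy, then apply the same fix:
tmp=$(mktemp)
printf 'ldconfig = @/sbin/ldconfig.real"\n' > "$tmp"           # broken line as found
sed -i 's|@/sbin/ldconfig.real"|"/sbin/ldconfig.real"|' "$tmp" # drop the stray @, restore the opening quote
cat "$tmp"   # prints: ldconfig = "/sbin/ldconfig.real"
rm -f "$tmp"
```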

Sadly, I mistakenly ran docker system prune -a, which was "supposed to" only do the following:

WARNING! This will remove:

  • all stopped containers
  • all networks not used by at least one container
  • all images without at least one container associated to them
  • all build cache

All my images and containers were deleted, even though I had "double-checked" beforehand and thought I had no "stopped containers" and that each image was associated with at least one container. (Note to myself, or to anyone who runs into this: "stopped" here means "Exited" in the STATUS column of docker ps -a. It doesn't mean "suspended" as in a process. I guess whoever wrote the prune command didn't think consistent wording was necessary… :) I'm also wondering: does docker system prune even skip an attached running container, to justify the name "prune"? :)

Yikes, all the work is gone. Now I have to start from scratch. I hope I at least learned something :)
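One lesson worth writing down: container filesystems can be snapshotted as images before running anything destructive. A hypothetical helper (the container ID and tag names below are only examples):

```shell
# Snapshot a container's filesystem as an image and export it to a tar file
# before running anything destructive like `docker system prune -a`:
backup_container() {
  cid="$1"
  docker commit "$cid" "backup-$cid:pre-prune" \
    && docker save "backup-$cid:pre-prune" -o "backup-$cid.tar"
}
# usage: backup_container 06ecd0ca2743
```

The saved tar can later be restored with docker load, even on a reflashed board.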

Still, starting now from a fresh image (pulled after the upgrade to 5.1.2), I can't create a new container; I get:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.

@AastaLLL is there an r35.4.1 container? Where can I find it?


Should I sudo apt-get remove --purge nvidia-* and then try the upgrade again as before:

sudo apt update
sudo apt dist-upgrade
sudo apt install --fix-broken -o Dpkg::Options::="--force-overwrite"
reboot

OR

sudo apt-get install nvidia-driver-525 (which I think is the latest?)

Anything else I should try?

Hi,

Sorry to hear that.
Do you still have the Dockerfile so you can rebuild the image?

On Jetson, the GPU driver is included in the OS, so you don't need to install it through nvidia-driver-* packages. Please run sudo apt install nvidia-jetpack first to ensure everything is properly installed.
Then please try launching our r35.4.1 base image to make sure Docker works correctly.
Please let us know if the container doesn’t work.

$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:35.4.1
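To double-check that the host BSP actually matches the r35.4.1 tag, the L4T release file can be inspected (standard path on L4T; the fallback echo is only there for non-Jetson machines):

```shell
# Sanity check: on JetPack 5.1.2 this file should report R35, REVISION: 4.1,
# matching the r35.4.1 container tag above.
cat /etc/nv_tegra_release 2>/dev/null \
  || echo "/etc/nv_tegra_release not found (not an L4T host?)"
```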

Thanks.

Hi @AastaLLL, thanks for following up. I haven't used a Dockerfile… I used docker run with a long string of parameters, which I usually save in a progress-notes file. Unfortunately the notes are in the Orin's home directory… hopefully I can recover them.

Hi,

Good luck!
If you encounter an error with the above command, please also let us know.

Thanks.


Thanks! I was able to flash the Orin successfully, and happened to find a copy of the notes on the external disk. So far so good!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.