Jetson AGX Xavier rebooting (Docker is used)

Dear NVIDIA Team,

Some of our modules (not custom) are rebooting and the only errors we can see are:

Apr 13 11:27:46 ubuntu kernel: [   77.302989] INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 13 11:27:46 ubuntu kernel: [   77.303167] 	0-...: (5 GPs behind) idle=871/140000000000001/0 softirq=1490/1490 fqs=2063 
Apr 13 11:27:46 ubuntu kernel: [   77.303313] 	(detected by 1, t=5253 jiffies, g=193, c=192, q=26)

And this one:

Apr 13 11:27:46 ubuntu kernel: [   77.303438] Task dump for CPU 0:
Apr 13 11:27:46 ubuntu  kernel: [   77.303446] ksoftirqd/0     R  running task        0     3      2 0x00000002
Apr 13 11:27:46 ubuntu  kernel: [   77.303461] Call trace:
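To judge how often the stalls occur, the kernel log can be searched for the message shown above. The following is only a minimal sketch: the helper name `count_rcu_stalls` is hypothetical, and the log locations in the comments are assumptions about a default Ubuntu-based L4T install.

```shell
#!/bin/sh
# Count log lines that report an RCU stall.
# Reads the files given as arguments, or standard input when none are given.
count_rcu_stalls() {
    grep -c 'rcu_sched detected stalls' "$@"
}

# Possible uses (assumed locations, adjust to the system at hand):
#   count_rcu_stalls /var/log/kern.log
#   journalctl -k -b -1 | count_rcu_stalls   # previous boot, needs a persistent journal
```

A count that grows shortly before each reboot would support the idea that the stalls and the reboots are related, but it does not identify the cause.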

These reboots can happen once in a while, or several times within a short period. In the latter case we suspect that they corrupt the file system or Docker, and we then need to reflash the module. Here is the Docker error we get:

Apr 13 16:16:15 ubuntu dockerd[7018]: time="2022-04-13T16:16:15.535322080Z" level=error msg="failed to mount overlay: no such device" storage-driver=overlay2
Apr 13 16:16:15 ubuntu dockerd[7018]: time="2022-04-13T16:16:15.535429728Z" level=error msg="[graphdriver] prior storage driver overlay2 failed: driver not supported"
Apr 13 16:16:15 ubuntu dockerd[7018]: failed to start daemon: error initializing graphdriver: driver not supported
Apr 13 16:16:15 ubuntu systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Apr 13 16:16:15 ubuntu systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 13 16:16:15 ubuntu systemd[1]: Failed to start Docker Application Container Engine.
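The "failed to mount overlay: no such device" message is what a mount attempt typically reports when the running kernel does not know the requested filesystem type. One possible cause, and only an assumption for this case, is that the overlay module could not be loaded, for example because the modules under /lib/modules no longer match the running kernel. A minimal sketch of a check (the function name `overlay_supported` is hypothetical):

```shell
#!/bin/sh
# Succeed if the given filesystem list mentions overlayfs.
# The optional argument (default /proc/filesystems) exists only so the
# check can be tried against a sample file.
overlay_supported() {
    grep -qw overlay "${1:-/proc/filesystems}" 2>/dev/null
}

if overlay_supported; then
    echo "overlay: known to the running kernel"
else
    echo "overlay: not listed; compare 'uname -r' with 'ls /lib/modules', then try 'sudo modprobe overlay'"
fi
```

If overlay is not listed and `modprobe overlay` fails, the Docker error is more likely a kernel/module mismatch than corruption inside /var/lib/docker.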

We already tried purging Docker and the /var/lib/docker folder, rebooting the module, and upgrading the Linux packages, but we still cannot start Docker.

So we have two problems. The first is that the machine reboots.
The second is that we can no longer start Docker (maybe as a result of the reboots).

Does anyone know the causes of these problems, or is there a way to fix them?

Thank you for your contribution; any help would benefit us and other users.

Hi,

We want to reproduce this issue and get more information about the error.
Could you share which JetPack you use and the steps to reproduce with us?

Thanks.

Hi,

Thank you for your quick reply. The problem is difficult to reproduce, as we do not know when or why it occurs. Last year we did not see this type of problem. It first appeared on a module even when no “extra” software was running. Now it appears on a new module. In “normal” operation our modules run docker-compose.

Here is the module configuration:

Module: Jetson AGX Xavier (32 GB) P2888-0004

Package: nvidia-jetpack

sudo apt show nvidia-jetpack -a

Version: 4.4.1-b50
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.4.1-b50), nvidia-opencv (= 4.4.1-b50), nvidia-cudnn8 (= 4.4.1-b50), nvidia-tensorrt (= 4.4.1-b50), nvidia-visionworks (= 4.4.1-b50), nvidia-container (= 4.4.1-b50), nvidia-vpi (= 4.4.1-b50), nvidia-l4t-jetson-multimedia-api (>> 32.4-0), nvidia-l4t-jetson-multimedia-api (<< 32.5-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29.4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.4/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Package: nvidia-jetpack
Version: 4.4-b186
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.4-b186), nvidia-opencv (= 4.4-b186), nvidia-cudnn8 (= 4.4-b186), nvidia-tensorrt (= 4.4-b186), nvidia-visionworks (= 4.4-b186), nvidia-container (= 4.4-b186), nvidia-vpi (= 4.4-b186), nvidia-l4t-jetson-multimedia-api (>> 32.4-0), nvidia-l4t-jetson-multimedia-api (<< 32.5-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29.4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.4/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Package: nvidia-jetpack
Version: 4.4-b144
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 200 kB
Depends: nvidia-container-csv-cuda (= 10.2.89-1), libopencv-python (= 4.1.1-2-gd5a58aa75), libvisionworks-sfm-dev (= 0.90.4.501), libvisionworks-dev (= 1.6.0.501), libnvparsers7 (= 7.1.0-1+cuda10.2), libnvinfer-plugin-dev (= 7.1.0-1+cuda10.2), libnvonnxparsers7 (= 7.1.0-1+cuda10.2), libnvinfer-samples (= 7.1.0-1+cuda10.2), libnvinfer-bin (= 7.1.0-1+cuda10.2), libvisionworks-samples (= 1.6.0.501), libvisionworks-tracking-dev (= 0.88.2.501), vpi-samples (= 0.2.0), tensorrt (= 7.1.0.16-1+cuda10.2), libopencv (= 4.1.1-2-gd5a58aa75), libnvinfer-doc (= 7.1.0-1+cuda10.2), libnvparsers-dev (= 7.1.0-1+cuda10.2), libnvidia-container0 (= 0.9.0~beta.1), nvidia-container-csv-visionworks (= 1.6.0.501), cuda-toolkit-10-2 (= 10.2.89-1), graphsurgeon-tf (= 7.1.0-1+cuda10.2), libcudnn8 (= 8.0.0.145-1+cuda10.2), libopencv-samples (= 4.1.1-2-gd5a58aa75), nvidia-container-csv-cudnn (= 8.0.0.145-1+cuda10.2), python-libnvinfer-dev (= 7.1.0-1+cuda10.2), libnvinfer-plugin7 (= 7.1.0-1+cuda10.2), libvisionworks (= 1.6.0.501), libcudnn8-doc (= 8.0.0.145-1+cuda10.2), nvidia-container-toolkit (= 1.0.1-1), libnvinfer-dev (= 7.1.0-1+cuda10.2), nvidia-l4t-jetson-multimedia-api (>> 32.4-0), nvidia-l4t-jetson-multimedia-api (<< 32.5-0), libopencv-dev (= 4.1.1-2-gd5a58aa75), vpi-dev (= 0.2.0), vpi (= 0.2.0), libcudnn8-dev (= 8.0.0.145-1+cuda10.2), python3-libnvinfer (= 7.1.0-1+cuda10.2), python3-libnvinfer-dev (= 7.1.0-1+cuda10.2), opencv-licenses (= 4.1.1-2-gd5a58aa75), nvidia-container-csv-tensorrt (= 7.1.0.16-1+cuda10.2), libnvinfer7 (= 7.1.0-1+cuda10.2), libnvonnxparsers-dev (= 7.1.0-1+cuda10.2), uff-converter-tf (= 7.1.0-1+cuda10.2), nvidia-docker2 (= 2.2.0-1), libvisionworks-sfm (= 0.90.4.501), libnvidia-container-tools (= 0.9.0~beta.1), nvidia-container-runtime (= 3.1.0-1), python-libnvinfer (= 7.1.0-1+cuda10.2), libvisionworks-tracking (= 0.88.2.501)
Conflicts: cuda-command-line-tools-10-0, cuda-compiler-10-0, cuda-cublas-10-0, cuda-cublas-dev-10-0, cuda-cudart-10-0, cuda-cudart-dev-10-0, cuda-cufft-10-0, cuda-cufft-dev-10-0, cuda-cuobjdump-10-0, cuda-cupti-10-0, cuda-curand-10-0, cuda-curand-dev-10-0, cuda-cusolver-10-0, cuda-cusolver-dev-10-0, cuda-cusparse-10-0, cuda-cusparse-dev-10-0, cuda-documentation-10-0, cuda-driver-dev-10-0, cuda-gdb-10-0, cuda-gpu-library-advisor-10-0, cuda-libraries-10-0, cuda-libraries-dev-10-0, cuda-license-10-0, cuda-memcheck-10-0, cuda-misc-headers-10-0, cuda-npp-10-0, cuda-npp-dev-10-0, cuda-nsight-compute-addon-l4t-10-0, cuda-nvcc-10-0, cuda-nvdisasm-10-0, cuda-nvgraph-10-0, cuda-nvgraph-dev-10-0, cuda-nvml-dev-10-0, cuda-nvprof-10-0, cuda-nvprune-10-0, cuda-nvrtc-10-0, cuda-nvrtc-dev-10-0, cuda-nvtx-10-0, cuda-samples-10-0, cuda-toolkit-10-0, cuda-tools-10-0, libcudnn7, libcudnn7-dev, libcudnn7-doc, libnvinfer-plugin6, libnvinfer6, libnvonnxparsers6, libnvparsers6
Homepage: http://developer.nvidia.com/jetson
Download-Size: 30.4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.4/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Hi,

Since we already have some newer releases, is it an option to upgrade the software to JetPack 4.6.1?
Thanks.

Hi,

To upgrade, I followed instructions from here

sudo vi /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
deb https://repo.download.nvidia.com/jetson/common r32.6.1 main
deb https://repo.download.nvidia.com/jetson/t194 r32.6.1 main

Then

sudo apt update

But I got some errors, even with the following command:

sudo apt update --allow-unauthenticated --allow-insecure-repositories
Err:8 https://repo.download.nvidia.com/jetson/common r32.6.1 Release
  404  Not Found [IP: 2.21.172.106 443]
Err:9 https://repo.download.nvidia.com/jetson/t194 r32.6.1 Release
  404  Not Found [IP: 2.21.172.106 443]
Reading package lists... Done
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://ports.ubuntu.com/ubuntu-ports bionic InRelease: Splitting up /var/lib/apt/lists/ports.ubuntu.com_ubuntu-ports_dists_bionic_InRelease into data and signature failed
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://download.docker.com/linux/ubuntu bionic InRelease: Splitting up /var/lib/apt/lists/download.docker.com_linux_ubuntu_dists_bionic_InRelease into data and signature failed
E: The repository 'https://repo.download.nvidia.com/jetson/common r32.6.1 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'https://repo.download.nvidia.com/jetson/t194 r32.6.1 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

But it worked when I specified JetPack 4.6 (r32.6) without the minor version:

deb https://repo.download.nvidia.com/jetson/common r32.6 main
deb https://repo.download.nvidia.com/jetson/t194 r32.6 main
sudo apt update
sudo apt upgrade
sudo reboot

After that the Docker service started successfully, but I will not know whether the reboot problems are gone until all the software has run for a while.

I encountered some problems during the upgrade:

  • 3 packages cannot be installed:
Do you want to continue? [Y/n] Y
Setting up vpi1-demos (1.1.15) ...
mkdir: cannot create directory ‘/home/user-storage//Desktop’: Permission denied
dpkg: error processing package vpi1-demos (--configure):
 installed vpi1-demos package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of nvidia-vpi:
 nvidia-vpi depends on vpi1-demos (= 1.1.15); however:
  Package vpi1-demos is not configured yet.

dpkg: error processing package nvidia-vpi (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of nvidia-jetpack:
 nvidia-jetpack depends on nvidia-vpi (= 4.6-b199); however:
  Package nvidia-vpi is not configured yet.

dpkg: error processing package nvidia-jetpack (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
No apport report written because the error message indicates its a followup error from a previous failure.
Errors were encountered while processing:
 vpi1-demos
 nvidia-vpi
 nvidia-jetpack
E: Sub-process /usr/bin/dpkg returned an error code (1)
  • Internal Disk became full:
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1   30G   29G     0 100% /
none             17G     0   17G   0% /dev
tmpfs            17G   58k   17G   1% /dev/shm
  • I cannot upgrade to JetPack 4.6.1; here is the current JetPack:
Package: nvidia-jetpack
Version: 4.6-b199
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
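Since the root partition filled up during the upgrade, it may help to check the free space before running `apt upgrade`. The following is a rough sketch; the function names and the 4 GiB threshold are assumptions, not values from NVIDIA.

```shell
#!/bin/sh
# Print the free space, in KiB, of the filesystem holding the given path
# (default /), taken from POSIX-format df output.
avail_kib() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $1 KiB are free on the filesystem holding $2.
has_free_space() {
    [ "$(avail_kib "${2:-/}")" -ge "$1" ]
}

# Hypothetical threshold of 4 GiB; adjust to the size of the upgrade.
if has_free_space $((4 * 1024 * 1024)) /; then
    echo "enough free space to try the upgrade"
else
    echo "low disk space: consider 'sudo apt clean' or 'docker system prune' first"
fi
```

`sudo apt clean` only removes cached package files, while `docker system prune` deletes stopped containers and unused images, so the latter should be reviewed before use.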

Hi,

Sorry for the confusion between the OS version and the JetPack version.
For JetPack 4.6.1, the OS version is r32.7.1.

So please use the sources below instead:
/etc/apt/sources.list.d/nvidia-l4t-apt-source.list

deb https://repo.download.nvidia.com/jetson/common r32.7 main
deb https://repo.download.nvidia.com/jetson/t194 r32.7 main
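Instead of editing the file by hand, the release field can be rewritten with sed. This is a sketch under the assumption that the file contains `deb https://repo.download.nvidia.com/jetson/<name> <release> main` lines as shown above; the function name `set_l4t_release` is hypothetical, and it prints the result rather than changing the file.

```shell
#!/bin/sh
# Print an L4T apt source list with every release field replaced.
# $1: path to the source list, $2: target release such as r32.7
# Nothing is modified in place; review the output before writing it back.
set_l4t_release() {
    sed -E "s#^(deb https://repo\.download\.nvidia\.com/jetson/[^ ]+) r[0-9.]+ main#\1 $2 main#" "$1"
}

# Possible use:
#   set_l4t_release /etc/apt/sources.list.d/nvidia-l4t-apt-source.list r32.7
```

Once the output looks right, it can be written back with sudo, followed by `sudo apt update`.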

Thanks.


Hi, thank you for your answer. With these Debian repositories I could upgrade to 4.6.1.

Package: nvidia-jetpack
Version: 4.6.1-b110
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.6.1-b110), nvidia-opencv (= 4.6.1-b110), nvidia-cudnn8 (= 4.6.1-b110), nvidia-tensorrt (= 4.6.1-b110), nvidia-visionworks (= 4.6.1-b110), nvidia-container (= 4.6.1-b110), nvidia-vpi (= 4.6.1-b110), nvidia-l4t-jetson-multimedia-api (>> 32.7-0), nvidia-l4t-jetson-multimedia-api (<< 32.8-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29.4 kB
APT-Manual-Installed: yes
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.7/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

At the end of the upgrade, and also after a reboot, these packages still fail to install. Is that normal?

Processing triggers for bamfdaemon (0.5.3+18.04.20180207.2-0ubuntu1) ...
Rebuilding /usr/share/applications/bamf-2.index...
Processing triggers for nvidia-l4t-kernel (4.9.253-tegra-32.7.1-20220219090344) ...
Errors were encountered while processing:
 vpi1-demos
 nvidia-vpi
 nvidia-jetpack
E: Sub-process /usr/bin/dpkg returned an error code (1)
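To see which packages dpkg still considers unfinished, its status database can be queried. The following is a minimal sketch: the filter name `unconfigured_packages` is hypothetical, and it relies on the `${db:Status-Abbrev}` field of `dpkg-query`, where `ii` means selected for installation and fully installed.

```shell
#!/bin/sh
# Read "<status-abbrev> <package>" lines on standard input and print the
# packages that are selected for installation but not fully installed.
unconfigured_packages() {
    awk '$1 ~ /^i/ && $1 != "ii" { print $2 }'
}

# Possible use:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | unconfigured_packages
```

On the module described here, the list would be expected to include vpi1-demos, nvidia-vpi and nvidia-jetpack; anything beyond those would deserve a closer look.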

Hi,

Do you need the VPI library?
If not, you can just ignore the error.

Thanks.

Hi,

So the problem (reboots and the Docker daemon failing to start) appeared on another module with JetPack 4.4, so we upgraded it using the r32.7.1 OS version you gave us.

There were still some packages (CUDA) that could not be installed, and after the reboot the machine was unreachable: we lost the connection and saw that the eth1 network interface was not available.

In fact, the upgrade stopped because of those packages and did not go on to configure other, more important packages (network-related, I suppose).

To make the eth1 network interface visible and available again, we had to resume the upgrade (with a force option) using this command; after that, eth1 appeared:

sudo apt --fix-broken install -o Dpkg::Options::="--force-overwrite"
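The `--force-overwrite` option lets dpkg replace a file that is already owned by another package, so it is worth confirming the result afterwards. A small sketch of a check that the interfaces exist before rebooting a remote module (the function name `iface_exists` is hypothetical; /sys/class/net is where Linux lists network interfaces):

```shell
#!/bin/sh
# Succeed if the named network interface is listed by the kernel.
# $1: interface name, $2: optional directory to look in (default /sys/class/net)
iface_exists() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

for ifc in eth0 eth1; do
    if iface_exists "$ifc"; then
        echo "$ifc: present"
    else
        echo "$ifc: missing"
    fi
done
```

Running this from a local console before the reboot avoids losing the only remote connection to the module.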
