Docker: failed to mount overlay: no such device storage-driver=overlay2

Scriobhneoir · August 21, 2023, 10:55am

I have already reviewed the similar issue:

It did not help.

Setup Details:
Hardware: Jetson Xavier NX
OS: Linux nvidia-desktop 4.9.253-tegra #1 SMP PREEMPT Wed Apr 13 13:41:15 CST 2022 aarch64 aarch64 aarch64 GNU/Linux
Docker version 20.10.12, build 20.10.12-0ubuntu2~18.04.1

R32 (release), REVISION: 7.3, GCID: 31982016, BOARD: t186ref, EABI: aarch64, DATE: Tue Nov 22 17:32:54 UTC 2022

When I try to run the NVIDIA Docker containers, I get the following error message:
```
sudo dockerd
INFO[2023-08-18T21:47:26.253603724+08:00] Starting up
INFO[2023-08-18T21:47:26.265183922+08:00] detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf
INFO[2023-08-18T21:47:26.267152351+08:00] parsed scheme: "unix" module=grpc
INFO[2023-08-18T21:47:26.267221599+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2023-08-18T21:47:26.267304485+08:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 }] } module=grpc
INFO[2023-08-18T21:47:26.267362400+08:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2023-08-18T21:47:26.270441646+08:00] parsed scheme: "unix" module=grpc
INFO[2023-08-18T21:47:26.270516369+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2023-08-18T21:47:26.270582576+08:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 }] } module=grpc
INFO[2023-08-18T21:47:26.270621538+08:00] ClientConn switching balancer to "pick_first" module=grpc
ERRO[2023-08-18T21:47:26.278662980+08:00] failed to mount overlay: no such device storage-driver=overlay2
ERRO[2023-08-18T21:47:26.279116566+08:00] [graphdriver] prior storage driver overlay2 failed: driver not supported
failed to start daemon: error initializing graphdriver: driver not supported
```

How do I resolve this error so that I can run Docker on this machine?
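
The `no such device` part of the error is what `mount` reports when the requested filesystem type is not registered with the running kernel. A minimal, read-only sketch of that check follows. The helper name `overlay_registered` is hypothetical, and the `modprobe` hint assumes the overlay module exists for the running kernel.

```shell
#!/bin/sh
# overlay_registered [FILE]
# Succeeds if "overlay" is listed as a filesystem type in FILE
# (default: /proc/filesystems). The helper name is hypothetical.
overlay_registered() {
    # Lines look like "nodev<TAB>overlay"; the type is the last field.
    awk '$NF == "overlay" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"
}

if overlay_registered; then
    echo "overlay is registered with the running kernel"
else
    echo "overlay is NOT registered; try: sudo modprobe overlay"
fi
```

If the second branch is taken and `modprobe overlay` also fails, the module is probably missing for the running kernel rather than merely unloaded.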

DaneLLL · August 22, 2023, 5:19am

Hi,
Please refer to
NVIDIA L4T Base | NVIDIA NGC

Then run the Docker container for r32.7.3. It should work if you follow the steps.

Scriobhneoir · August 22, 2023, 6:46am

Hi DaneLLL,

To clarify, this error occurs when I run any docker command, not just the NVIDIA Docker containers. For example, if I run 'docker container ls' or 'docker images', I get the error message 'Cannot connect to the Docker daemon', and systemctl shows the error above.

This is more fundamental than the NVIDIA-specific Docker containers: it applies to all docker commands.

DaneLLL · August 22, 2023, 7:10am

Hi,
We don't see this message when running Docker on the Jetson platform. Please share your docker command for reference.

Scriobhneoir · August 22, 2023, 7:54am

When I try to issue a docker command:

sudo docker container ls
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Then I check the status:

systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2023-08-21 19:25:55 CST; 20h ago
     Docs: https://docs.docker.com
  Process: 32594 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
 Main PID: 32594 (code=exited, status=1/FAILURE)

Aug 21 19:25:53 nvidia-desktop systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 19:25:55 nvidia-desktop systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Aug 21 19:25:55 nvidia-desktop systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 19:25:55 nvidia-desktop systemd[1]: Stopped Docker Application Container Engine.
Aug 21 19:25:55 nvidia-desktop systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 19:25:55 nvidia-desktop systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 19:25:55 nvidia-desktop systemd[1]: Failed to start Docker Application Container Engine.

Check with journalctl:

journalctl -eu docker
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.584716135+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.584792073+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.584899947+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil>
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.584965868+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.596805345+08:00" level=error msg="failed to mount overlay: no such device" storage-driver=overlay2
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.597043878+08:00" level=error msg="exec: \"fuse-overlayfs\": executable file not found in $PATH" storage-driver=fuse-overlayfs
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.601805128+08:00" level=error msg="AUFS was not found in /proc/filesystems" storage-driver=aufs
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.608008232+08:00" level=error msg="failed to mount overlay: no such device" storage-driver=overlay
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: failed to start daemon: error initializing graphdriver: devicemapper: Error running deviceCreate (CreatePool) dm_task_run failed
Aug 21 19:25:48 nvidia-desktop systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 19:25:48 nvidia-desktop systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 19:25:48 nvidia-desktop systemd[1]: Failed to start Docker Application Container Engine.

I have also tried the steps outlined here:

They made no difference.
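
The journal shows dockerd trying several storage drivers in turn and logging one error per driver. A small sketch for condensing such output into one line per driver is given below. The function name `summarize_driver_errors` is hypothetical, and the pattern assumes the `level=error msg="..." storage-driver=NAME` layout seen in the lines above; the final devicemapper line has a different layout and is not matched.

```shell
#!/bin/sh
# summarize_driver_errors: read dockerd journal lines on stdin and print
# one "driver: message" line per storage-driver error.
summarize_driver_errors() {
    sed -n 's/.*level=error msg="\(.*\)" storage-driver=\([^ ]*\).*/\2: \1/p'
}

# Example with two lines taken from the journal above.
summarize_driver_errors <<'EOF'
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.596805345+08:00" level=error msg="failed to mount overlay: no such device" storage-driver=overlay2
Aug 21 19:25:48 nvidia-desktop dockerd[32446]: time="2023-08-21T19:25:48.601805128+08:00" level=error msg="AUFS was not found in /proc/filesystems" storage-driver=aufs
EOF
```

On the device itself, the same function could be fed with `journalctl -u docker --no-pager | summarize_driver_errors`.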

DaneLLL · August 23, 2023, 4:45am

Hi,
Please try the commands:

$ xhost +
$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.7.1

Scriobhneoir · August 23, 2023, 7:37am

sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.7.1
[sudo] password for nvidia: 
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.

The issue is not with the NVIDIA Docker containers or the NVIDIA Docker runtime; it is with the OS and Docker itself.

DaneLLL · August 24, 2023, 8:45am

Hi,
We can run the command successfully:

nvidia@nvidia-desktop:~$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.7.1
[sudo] password for nvidia:
Unable to find image 'nvcr.io/nvidia/l4t-base:r32.7.1' locally
r32.7.1: Pulling from nvidia/l4t-base
f46992f278c2: Pull complete
d0ec296fcb76: Pull complete
9e18ddc8ca7a: Pull complete
457ba495c8e5: Pull complete
71bca45e35bd: Pull complete
644761cdc735: Pull complete
11628dbc31eb: Pull complete
d364c3700c33: Pull complete
01869d070b2e: Pull complete
cc3009375042: Pull complete
3e182d6364dc: Pull complete
f72feb4812f9: Pull complete
151eb940bbec: Pull complete
0e9dda2495b9: Pull complete
0e78bdc2f297: Pull complete
8dc68d594a4e: Pull complete
Digest: sha256:a374d81695f172fcda9da8db23f60d8bc35948762f71f3ee69564b4f6be5ef1c
Status: Downloaded newer image for nvcr.io/nvidia/l4t-base:r32.7.1
root@nvidia-desktop:/#

Please check whether you can re-flash the system image and try again.

Scriobhneoir · August 24, 2023, 8:53am

Reflashing the system image is not an option for us, as the device is not physically accessible.

Are there any other steps we can try?

Scriobhneoir · August 28, 2023, 7:11pm

Hi @DaneLLL, are there any other suggested steps?

The device is remote: it can be accessed via SSH, but it cannot be physically accessed for reflashing.

DaneLLL · August 29, 2023, 3:30am

Hi,
We would suggest physically accessing the board and doing a clean re-flash. This would be more efficient than remote trial and error.

Scriobhneoir · August 29, 2023, 7:47am

Thanks for the suggestion; however, that option does not exist in this case, as the device is not physically accessible.

While I appreciate that remote trial and error is difficult, could you suggest any steps or approaches to attempt?

Scriobhneoir · August 31, 2023, 7:47am

When I run modprobe to check for the overlay driver, I get:

modprobe: FATAL: Module overlay not found in directory /lib/modules/4.9.253-tegra

This module is present on other, supposedly identically flashed boards.

Update - adding some extra information:
On the device, there seem to be two kernels installed:

ls /lib/modules
4.9.253-tegra  4.9.299-tegra

overlayfs is in the 4.9.299-tegra kernel:

ls 4.9.299-tegra/kernel/fs
binfmt_misc.ko  btrfs  cifs  fuse  nfs_common  nfsd  overlayfs

But it is missing from the other (running) kernel. Note that there is no 'fs' directory here:

ls 4.9.253-tegra/kernel/
drivers

but uname -a returns:

Linux nvidia-desktop 4.9.253-tegra

Is it possible to use the new kernel? Is that a risky operation over a remote connection?
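
The mismatch described above (overlay module files under one kernel release, a different release actually running) can be confirmed with a read-only sketch like the one below. The function name `trees_with_overlay` is hypothetical, and the assumption that the module file is named `overlay.ko` (possibly with a compression suffix) is not confirmed by the listings above, which only show the `overlayfs` directory. The script reports only and changes nothing about which kernel boots.

```shell
#!/bin/sh
# trees_with_overlay [MODULES_ROOT]
# Print each kernel release under MODULES_ROOT (default: /lib/modules)
# whose tree contains an overlay module file.
trees_with_overlay() {
    root="${1:-/lib/modules}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # Assumption: the module file is overlay.ko, optionally compressed.
        if find "$dir" -name 'overlay.ko*' -type f 2>/dev/null | grep -q .; then
            basename "$dir"
        fi
    done
}

running="$(uname -r)"
echo "running kernel: $running"
for rel in $(trees_with_overlay); do
    if [ "$rel" = "$running" ]; then
        echo "$rel: overlay module present (matches running kernel)"
    else
        echo "$rel: overlay module present (NOT the running kernel)"
    fi
done
```

On the board described here, this would be expected to list only 4.9.299-tegra as having the module while reporting 4.9.253-tegra as the running kernel.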

system · October 9, 2023, 7:11am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.