Can't start NVIDIA Docker container after moving Docker to SSD

I followed the web instructions "Software Setup - Jetson Xavier - RACECAR/X", which explain how to move the Docker folder to an SSD. Everything went smoothly, and I tested the setup with Docker's hello-world image, which worked as well. However, when I downloaded the NVIDIA TensorFlow container and ran it, it gave me the error below. I would appreciate it if someone could help. Thank you.

dragan@dragan-desktop:~$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf2.3-py3

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused “process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=23175 /xavier_ssd/home/dragan/docker/overlay2/966a2684ed283da28a5ec67bc7648c7a590533efabef3537cf194ebd1af6a5b8/merged]\\nnvidia-container-cli: mount error: file creation failed: /xavier_ssd/home/dragan/docker/overlay2/966a2684ed283da28a5ec67bc7648c7a590533efabef3537cf194ebd1af6a5b8/merged/usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18: file exists\\n\""”: unknown.


dragan@dragan-desktop:~$ sudo service docker start
dragan@dragan-desktop:~$ sudo docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:

  1. The Docker client contacted the Docker daemon.
  2. The Docker daemon pulled the “hello-world” image from the Docker Hub.
    (arm64v8)
  3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
  4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

Kind regards,
Dragan


Hi

Would you mind first creating a symlink, as described in the link below?

Suppose your new folder is /XavierSSD500/var/lib/docker.
Please try this:

$ sudo ln -s /XavierSSD500/var/lib/docker /var/lib/docker
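A minimal sketch of the symlink approach, assuming the Docker data has already been copied to the SSD and the daemon is stopped (for example with sudo systemctl stop docker) before the link is created. The function name link_docker_root is hypothetical. It guards against a common ln -s pitfall: if /var/lib/docker still exists as a real directory, ln -s creates the link inside that directory instead of replacing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make LINK_PATH (normally /var/lib/docker) a symlink to
# NEW_ROOT (e.g. /XavierSSD500/var/lib/docker). Intended to be run as root with
# the Docker daemon stopped, after the old contents were copied to NEW_ROOT.
link_docker_root() {
  local new_root="$1"
  local link_path="$2"

  if [ ! -d "$new_root" ]; then
    echo "missing target directory: $new_root" >&2
    return 1
  fi
  if [ -L "$link_path" ]; then
    echo "already linked: $link_path -> $(readlink "$link_path")"
    return 0
  fi
  if [ -e "$link_path" ]; then
    # With an existing directory, ln -s would put the link inside it instead.
    echo "refusing: $link_path exists; move it aside first (e.g. mv $link_path $link_path.old)" >&2
    return 1
  fi
  ln -s "$new_root" "$link_path"
  echo "linked: $link_path -> $new_root"
}
```

In a root shell, a typical sequence would be: stop Docker, move /var/lib/docker aside, call link_docker_root /XavierSSD500/var/lib/docker /var/lib/docker, then start Docker again.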

Thanks.

Hello,
I have no issue switching the default folder where Docker saves its images. The problem is starting the NVIDIA container after that; Docker runs hello-world just fine.

This is my daemon.json file:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },

    "data-root": "/mnt/xavier_ssd/Docker"
}
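dockerd refuses to start if /etc/docker/daemon.json is not valid JSON (straight double quotes, a value such as [] after "runtimeArgs":, commas between members), and it can also help to confirm that the data-root directory actually exists, i.e. that the SSD is mounted. A small sketch of such a check, assuming python3 is installed (only its standard json module is used); the function name check_daemon_json is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a Docker daemon.json before restarting dockerd.
# Assumes python3 is available; only the standard json module is used.
check_daemon_json() {
  local file="${1:-/etc/docker/daemon.json}"

  # json.tool exits non-zero on curly quotes, a dangling "runtimeArgs":, etc.
  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "invalid JSON: $file"
    return 1
  fi

  # Read the optional "data-root" key; empty means Docker's default location.
  local root
  root=$(python3 -c 'import json, sys; print(json.load(open(sys.argv[1])).get("data-root", ""))' "$file")
  if [ -n "$root" ] && [ ! -d "$root" ]; then
    echo "data-root does not exist: $root"
    return 1
  fi

  echo "ok: data-root=${root:-/var/lib/docker}"
}
```

After restarting the daemon, docker info --format '{{.DockerRootDir}}' prints the root directory Docker is actually using.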

I reflashed my Xavier, and the same thing happens after redoing the steps:

sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf2.3-py3

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused “process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=14138 /mnt/xavier_ssd/Docker/overlay2/8a0ea8137638b55a2fda06b3c15ec3a6a7e38f175b6ca350c40f31a1ae4e64a4/merged]\\nnvidia-container-cli: mount error: file creation failed: /mnt/xavier_ssd/Docker/overlay2/8a0ea8137638b55a2fda06b3c15ec3a6a7e38f175b6ca350c40f31a1ae4e64a4/merged/usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18: file exists\\n\""”: unknown.

Any other suggestions on how I can make this work? Thank you.

Dragan

Hi,

The error occurs when the NVIDIA runtime tries to mount a host library into /usr/lib/aarch64-linux-gnu/ inside the container (here libnvidia-fatbinaryloader.so.440.18) but finds that the file already exists.
A similar issue is discussed in the link below:

To solve this, could you check whether any other command also mounts that folder and causes the conflict?
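One way to follow up on this is to look for files that the image already contains and that the runtime also tries to mount from the host. A sketch, assuming the JetPack 4.x layout in which nvidia-container-runtime takes its mount list from CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/, with lines of the form "lib, /usr/lib/aarch64-linux-gnu/..."; both the directory and the line format are assumptions to verify on your system, and the function name find_mount_conflicts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list entries from the runtime's CSV mount lists that already
# exist inside an unpacked container root filesystem (ROOTFS). Each such entry is a
# candidate for "mount error: file creation failed: ...: file exists".
# Assumed CSV line format: "<kind>, <absolute path>", e.g. "lib, /usr/lib/...".
find_mount_conflicts() {
  local rootfs="$1"; shift
  local kind path
  # "|| [ -n "$kind" ]" also processes a final line that lacks a trailing newline.
  cat "$@" | while IFS=', ' read -r kind path || [ -n "$kind" ]; do
    # Skip blank lines and comments.
    case "$kind" in ''|'#'*) continue ;; esac
    # -L also catches dangling symlinks, which -e alone would miss.
    if [ -e "$rootfs$path" ] || [ -L "$rootfs$path" ]; then
      echo "$kind $path"
    fi
  done
}

# Example (Jetson host, as root; unpacks the image without starting it, so no hook runs):
#   cid=$(docker create nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf2.3-py3)
#   mkdir -p /tmp/tf-rootfs && docker export "$cid" | tar -x -C /tmp/tf-rootfs
#   docker rm "$cid"
#   find_mount_conflicts /tmp/tf-rootfs /etc/nvidia-container-runtime/host-files-for-container.d/*.csv
```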

Thanks.

Hi,
I don't think I can figure this one out on my own, so I decided to abandon this approach and install the Python packages directly instead of using containers.

The reason I wanted to use containers was that I was getting an error message at the MinMaxScaler cell when running my LSTM model. However, you provided a solution in another thread (https://forums.developer.nvidia.com/t/error-importerror-usr-lib-aarch64-linux-gnu-libgomp-so-1-cannot-allocate-memory-in-static-tls-block-i-looked-through-available-threads-already/166494), and it now seems to work if I run export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1 in the terminal before starting Jupyter. I appreciate your help; I think I will be OK now.
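The LD_PRELOAD workaround can also be scoped to a single command instead of being exported for the whole shell session, so that only Jupyter and the kernels it starts (which inherit its environment) see it. A minimal sketch; the function name with_gomp_preload is hypothetical, and any existing LD_PRELOAD entries are kept after the libgomp entry (glibc accepts a colon-separated list).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with libgomp preloaded, as in the
# "cannot allocate memory in static TLS block" workaround, without exporting
# LD_PRELOAD for the rest of the shell session.
with_gomp_preload() {
  local gomp=/usr/lib/aarch64-linux-gnu/libgomp.so.1
  if [ ! -e "$gomp" ]; then
    echo "warning: $gomp not found; the preload will be ignored" >&2
  fi
  # Put libgomp first and keep any existing LD_PRELOAD entries after it.
  LD_PRELOAD="$gomp${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}

# Example: with_gomp_preload jupyter notebook
```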

Regards,
Dragan