Mkdir: cannot create directory ‘/opt/ngccli’: File exists on docker restart

I cannot reuse a container created from nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2. On docker restart I get:
mkdir: cannot create directory ‘/opt/ngccli’: File exists
and the container stops.

Could you please elaborate on the steps?

It's just docker run as documented for the tlt docker, followed by docker restart.
The docker entrypoint.sh does a mkdir that fails after docker restart.
It needs to be changed so that it does not raise an error when the directory already exists.

What is “docker entrypoint.sh”?

Inside the tlt docker there is a bash script named “entrypoint.sh” that runs first thing on every docker start.
It runs a mkdir that fails on the second docker start because the directory already exists.
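
To illustrate the failure outside the container, here is a minimal sketch that simulates two "starts" against a throwaway directory. The temporary path stands in for /opt/ngccli and the variable names are made up for this example. It also shows why the container stops rather than just printing a warning: the script runs under set -e, so the failed mkdir aborts everything after it.

```shell
#!/usr/bin/env bash
# Stand-in for /opt/ngccli (assumption: a temp path behaves the same way).
demo_dir="$(mktemp -d)/ngccli"

# First "start": the directory does not exist yet, so mkdir succeeds.
mkdir "$demo_dir"

# Second "start": plain mkdir fails because the directory already exists.
if mkdir "$demo_dir" 2>/dev/null; then
    plain_mkdir="ok"
else
    plain_mkdir="failed"
fi

# Under `set -e` the failed mkdir ends the script, so later commands never run.
if bash -c 'set -e; mkdir "$1"; echo "rest of entrypoint runs"' _ "$demo_dir" >/dev/null 2>&1; then
    script_status="continued"
else
    script_status="aborted"
fi

# mkdir -p succeeds whether or not the directory exists.
if mkdir -p "$demo_dir"; then
    with_p="ok"
else
    with_p="failed"
fi

echo "plain mkdir on second start: $plain_mkdir"
echo "script under set -e:         $script_status"
echo "mkdir -p on second start:    $with_p"
```

The only difference between the failing and the working case is the -p flag, which tells mkdir to treat an existing directory as success.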

Hi eenav,
The tlt user guide does not mention “docker restart”.

I tried logging in to and exiting the docker many times as below, and it works.

$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2

--2020-05-11 09:41:57--  https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.53, 13.225.99.8, 13.225.99.28, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.53|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’

ngccli_reg_linux.zip 100%[============================================================================================================>] 18.97M 5.45MB/s in 3.5s

2020-05-11 09:42:01 (5.39 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]

Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5

root@0ef10fde6a74:/workspace# cat /usr/local/bin/entrypoint.sh
#!/usr/bin/env bash
set -e

## Run startup command

mkdir /opt/ngccli
wget https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip -P /opt/ngccli
unzip /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
rm /opt/ngccli/*.zip
chmod u+x /opt/ngccli/ngc

## Running passed command
if [[ "$1" ]]; then
    eval "$@"
fi

root@0ef10fde6a74:/workspace# exit
exit

$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2
--2020-05-11 09:44:36--  https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.28, 13.225.99.53, 13.225.99.60, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’

ngccli_reg_linux.zip 100%[============================================================================================================>] 18.97M 8.75MB/s in 2.2s

2020-05-11 09:44:38 (8.75 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]

Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@1d22e6f1d5ba:/workspace#

The point is that reusing a tlt container is the common case, rather than running from the image every time.
Currently, reusing the same tlt container is impossible due to this mkdir error.

To reproduce, run with --name tlt2.
Then exit and try to get back into tlt2 via docker start tlt2.

Hi eenav,
Could you please paste your full log here? Thanks a lot.

Anyway, changing “mkdir /opt/ngccli” to “mkdir -p /opt/ngccli” will solve this error.

But please paste your full log so that others can understand the issue better.

ubuntu@ip-172-31-43-202:~$ docker run --gpus all -it --name tlt2 -v /home/ubuntu/tlt_workspace:/home/mounted_workspace -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash
--2020-05-11 11:56:09--  https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 99.86.57.34, 99.86.57.96, 99.86.57.46, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|99.86.57.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’

ngccli_reg_linux.zip 100%[==========================================================>] 18.97M 20.0MB/s in 0.9s

2020-05-11 11:56:11 (20.0 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]

Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@fbadda496f64:/workspace# exit
exit
ubuntu@ip-172-31-43-202:~$ docker start tlt2
tlt2

ubuntu@ip-172-31-43-202:~$ docker logs tlt2
--2020-05-11 11:56:09--  https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 99.86.57.34, 99.86.57.96, 99.86.57.46, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|99.86.57.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’

ngccli_reg_linux.zip 100%[==========================================================>] 18.97M 20.0MB/s in 0.9s

2020-05-11 11:56:11 (20.0 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]

Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@fbadda496f64:/workspace# exit
exit
mkdir: cannot create directory ‘/opt/ngccli’: File exists

I know how to solve it but this issue could waste some time for customers, thanks :)

Thanks for the details. I will sync with the internal team to modify entrypoint.sh.

The issue has not yet been solved. I am facing the same issue as well. Requesting an urgent look into the matter, since setting up the container repeatedly is a major investment in terms of time.

Hi AnimikhAich,
Could you please paste your test steps here?

This is the docker container, which is in a stopped state on my local machine:

$ docker ps -a
CONTAINER ID        IMAGE                                            COMMAND                  CREATED             STATUS                      PORTS               NAMES
90d52087cbfc        nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2   "entrypoint.sh /bin/…"   24 hours ago        Exited (1) 7 minutes ago                        beautiful_shockley

I tried starting it with the following command. It starts and then stops automatically.

$ docker start beautiful_shockley     
beautiful_shockley

If I run docker ps to list the running containers, I get the following output, which means there are no running containers at the moment.

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Since the previous step did not work, I tried to run it in attached mode, to “Attach STDOUT/STDERR and forward signals”, but that leads to the following error:

$ docker start -a beautiful_shockley     
mkdir: cannot create directory ‘/opt/ngccli’: File exists

Hi AnimikhAich,
Thanks for the details. I will push the internal team to fix this.
To unblock your case, please run
$ docker rm -fv container-id
and run the container again. Note that this removes the container, so anything stored only inside it is lost.

For example,

morganh@test:~$ docker ps -a |grep tlt2
e38626b8b41a nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 "entrypoint.sh /bin/…" About a minute ago Exited (0) 30 seconds ago tlt2

morganh@test:~$ docker rm -fv e38626b8b41a
e38626b8b41a

morganh@test:~$ docker run -it --name tlt2 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash
--2020-05-16 08:03:57--  https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.53, 13.225.99.28, 13.225.99.8, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.53|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’

ngccli_reg_linux.zip 100%[==============================================================================================>] 18.97M 8.42MB/s in 2.3s

2020-05-16 08:03:59 (8.42 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]

Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@8449ddb4236a:/workspace#

I’ve faced the same issue and haven’t found a quick fix here, so I came up with this one:
use the parameter -v "/path/to/tmp/directory/on/host":"/opt" to point to an empty directory. Then the entrypoint.sh file does not throw any errors and you can use the container.

The entrypoint.sh file in use is supposedly located at /usr/local/bin/entrypoint.sh, and with two modifications it should work:
mkdir -p /opt/ngccli/
unzip -o /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
So the entire file should look something like:

#!/usr/bin/env bash
set -e

## Run startup command 
mkdir -p /opt/ngccli
wget https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip -P /opt/ngccli
unzip -o /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
rm /opt/ngccli/*.zip
chmod u+x /opt/ngccli/ngc

## Running passed command
if [[ "$1" ]]; then
        eval "$@"
fi

If you commit these changes to your docker image (docker commit) you can restart the new version without the additional mount afterwards.
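
If you would rather keep an existing stopped container than build a new image, the same two edits could in principle be applied in place. The following is only a sketch under some assumptions: the container is named tlt2, the script lives at /usr/local/bin/entrypoint.sh as shown above, and GNU sed is available on the host (BSD sed handles -i differently). The function names patch_entrypoint and fix_container are made up for this example.

```shell
#!/usr/bin/env bash
# Make the two startup commands idempotent in a local copy of entrypoint.sh.
patch_entrypoint() {
    local file="$1"
    sed -i \
        -e 's|^mkdir /opt/ngccli|mkdir -p /opt/ngccli|' \
        -e 's|^unzip /opt/ngccli|unzip -o /opt/ngccli|' \
        "$file"
}

# Copy the script out of the stopped container, patch it, copy it back,
# then start the container attached and interactive.
# Assumption: the script path matches the one shown in this thread.
fix_container() {
    local name="$1"
    docker cp "$name":/usr/local/bin/entrypoint.sh ./entrypoint.sh
    patch_entrypoint ./entrypoint.sh
    docker cp ./entrypoint.sh "$name":/usr/local/bin/entrypoint.sh
    docker start -ai "$name"
}

# Usage (not run automatically):
#   fix_container tlt2
```

docker cp works on stopped containers, which is what makes this possible without the extra mount. I have not verified this against every version of the image, so check the patched entrypoint.sh before copying it back.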

@b2prix21
The next release will include the fix.
