Cannot reuse a container created from nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2. On docker restart I get
mkdir: cannot create directory ‘/opt/ngccli’: File exists
and the container stops.
Could you please elaborate on the steps?
It's just docker run as documented for the tlt docker, followed by docker restart.
The docker entrypoint.sh does a mkdir that fails after docker restart.
It needs to be muted so that it does not raise an error.
What is “docker entrypoint.sh”?
Inside the tlt docker there is a bash script named “entrypoint.sh” that runs first on every docker start.
It runs a mkdir that fails on the second docker start because the directory already exists.
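For anyone who wants to see the failure without the tlt image, here is a minimal sketch that reproduces it in a scratch directory. `demo_dir` is a hypothetical stand-in for /opt/ngccli, and the repeated mkdir calls stand in for the first and second container start; nothing here touches Docker.

```shell
#!/usr/bin/env bash
# Scratch location standing in for /opt/ngccli (hypothetical path).
demo_dir="$(mktemp -d)/ngccli"

# First "start": the directory does not exist yet, so plain mkdir succeeds.
if mkdir "$demo_dir"; then first_status=0; else first_status=$?; fi

# Second "start": the directory is still there, so plain mkdir reports
# "File exists" and returns non-zero. With `set -e` in effect, as in
# entrypoint.sh, that non-zero status ends the script and the container exits.
if mkdir "$demo_dir" 2>/dev/null; then second_status=0; else second_status=$?; fi

# mkdir -p treats an already existing directory as success.
if mkdir -p "$demo_dir"; then fixed_status=0; else fixed_status=$?; fi

echo "first=$first_status second=$second_status fixed=$fixed_status"
```

The second status is the one that matters: it is non-zero, while the `-p` variant returns zero on the same directory.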
Hi eenav,
The tlt user guide does not mention “docker restart”.
I tried logging in to the docker and exiting it many times, as shown below, and it works.
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2
--2020-05-11 09:41:57-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.53, 13.225.99.8, 13.225.99.28, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.53|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’
ngccli_reg_linux.zip 100%[============================================================================================================>] 18.97M 5.45MB/s in 3.5s
2020-05-11 09:42:01 (5.39 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]
Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@0ef10fde6a74:/workspace# cat /usr/local/bin/entrypoint.sh
#!/usr/bin/env bash
set -e
## Run startup command
mkdir /opt/ngccli
wget https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip -P /opt/ngccli
unzip /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
rm /opt/ngccli/*.zip
chmod u+x /opt/ngccli/ngc
## Running passed command
if [[ "$1" ]]; then
eval "$@"
fi
root@0ef10fde6a74:/workspace# exit
exit
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2
--2020-05-11 09:44:36-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.28, 13.225.99.53, 13.225.99.60, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’
ngccli_reg_linux.zip 100%[============================================================================================================>] 18.97M 8.75MB/s in 2.2s
2020-05-11 09:44:38 (8.75 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]
Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@1d22e6f1d5ba:/workspace#
The point is that reusing a tlt container is the common case, rather than running from the image every time.
Currently reusing the same tlt container is impossible due to this mkdir error.
Run with --name tlt2
Then exit and try to get back to tlt2 via docker start tlt2
Hi eenav,
Could you please paste your full log here? Thanks a lot.
Anyway, changing “mkdir /opt/ngccli” to “mkdir -p /opt/ngccli” will solve this error.
But please paste your full log so that others can understand the issue better.
ubuntu@ip-172-31-43-202:~$ docker run --gpus all -it --name tlt2 -v /home/ubuntu/tlt_workspace:/home/mounted_workspace -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash
--2020-05-11 11:56:09-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 99.86.57.34, 99.86.57.96, 99.86.57.46, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|99.86.57.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’
ngccli_reg_linux.zip 100%[==========================================================>] 18.97M 20.0MB/s in 0.9s
2020-05-11 11:56:11 (20.0 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]
Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@fbadda496f64:/workspace# exit
exit
ubuntu@ip-172-31-43-202:~$ docker start tlt2
tlt2
ubuntu@ip-172-31-43-202:~$ docker logs tlt2
--2020-05-11 11:56:09-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 99.86.57.34, 99.86.57.96, 99.86.57.46, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|99.86.57.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’
ngccli_reg_linux.zip 100%[==========================================================>] 18.97M 20.0MB/s in 0.9s
2020-05-11 11:56:11 (20.0 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]
Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@fbadda496f64:/workspace# exit
exit
mkdir: cannot create directory ‘/opt/ngccli’: File exists
I know how to solve it but this issue could waste some time for customers, thanks :)
Thanks for the details. I will sync with the internal team to modify entrypoint.sh.
The issue has not yet been solved. I am facing the same issue as well. Requesting an urgent look into the matter, since setting up the container repeatedly is a major investment in terms of time.
Hi AnimikhAich,
Could you please paste your test steps here?
This is the docker container, which is in a stopped state on my local machine:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
90d52087cbfc nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 "entrypoint.sh /bin/…" 24 hours ago Exited (1) 7 minutes ago beautiful_shockley
I tried starting it with the following command. It starts and then stops automatically.
$ docker start beautiful_shockley
beautiful_shockley
If I run docker ps to list the running containers, I get the following output, which means there are no running containers at the moment.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Since the previous step did not work, I tried to run it in attached mode, to “Attach STDOUT/STDERR and forward signals”, but that leads to the following error:
$ docker start -a beautiful_shockley
mkdir: cannot create directory ‘/opt/ngccli’: File exists
Hi AnimikhAich,
Thanks for the details. I will push the internal team to fix this.
To unblock your case, please run
$ docker rm -fv container-id
and run the container again.
For example,
morganh@test:~$ docker ps -a |grep tlt2
e38626b8b41a nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 "entrypoint.sh /bin/…" About a minute ago Exited (0) 30 seconds ago tlt2
morganh@test:~$ docker rm -fv e38626b8b41a
e38626b8b41a
morganh@test:~$ docker run -it --name tlt2 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash
--2020-05-16 08:03:57-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.53, 13.225.99.28, 13.225.99.8, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.53|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19890399 (19M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’
ngccli_reg_linux.zip 100%[==============================================================================================>] 18.97M 8.42MB/s in 2.3s
2020-05-16 08:03:59 (8.42 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [19890399/19890399]
Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5
root@8449ddb4236a:/workspace#
I’ve faced the same issue and haven’t found a quick-fix here. Came up with this one:
use the parameter -v "/path/to/tmp/directory/on/host":"/opt"
to point to an empty directory. Then the entrypoint.sh file does not throw any errors and you can use the container.
The file entrypoint.sh that’s being used is supposedly located under /usr/local/bin/entrypoint.sh
and with two modifications it should work:
mkdir -p /opt/ngccli/
unzip -o /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
So the entire file should look something like:
#!/usr/bin/env bash
set -e
## Run startup command
mkdir -p /opt/ngccli
wget https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip -P /opt/ngccli
unzip -o /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
rm /opt/ngccli/*.zip
chmod u+x /opt/ngccli/ngc
## Running passed command
if [[ "$1" ]]; then
eval "$@"
fi
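The two edits can also be applied mechanically instead of by hand. A rough sketch, assuming the script in the image matches the listing printed earlier in this thread; `patch_entrypoint` and `demo_file` are made-up names for illustration, and the demo works on a scratch file rather than the real /usr/local/bin/entrypoint.sh:

```shell
#!/usr/bin/env bash
# patch_entrypoint FILE: make the mkdir and unzip lines safe to re-run.
# Hypothetical helper, not part of the tlt image.
patch_entrypoint() {
  local file="$1" tmp
  tmp="$(mktemp)"
  # Write the edited text to a temp file, then copy the content back
  # so the original file keeps its permissions (e.g. the executable bit).
  sed -e 's|^mkdir /opt/ngccli$|mkdir -p /opt/ngccli|' \
      -e 's|^unzip /opt/ngccli/|unzip -o /opt/ngccli/|' \
      "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a scratch file holding only the two affected lines.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
mkdir /opt/ngccli
unzip /opt/ngccli/ngccli_reg_linux.zip -d /opt/ngccli/
EOF

patch_entrypoint "$demo_file"
patch_entrypoint "$demo_file"   # a second pass finds nothing left to change
cat "$demo_file"
```

The patterns only match the unpatched lines, so running the helper more than once leaves the file as it was after the first pass.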
If you commit these changes to your docker image (docker commit), you can restart the new version without the additional mount afterwards.
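A variation that goes one step beyond the fix discussed in this thread: skip the download entirely when the CLI is already unpacked, so a restarted container does not need to fetch the zip from ngc.nvidia.com again. This is only a sketch. `install_once` and `fake_install` are hypothetical names, and `fake_install` stands in for the wget/unzip/rm/chmod steps so that the example runs without network access.

```shell
#!/usr/bin/env bash
# install_once DIR CMD [ARGS...]
# Create DIR if needed, then run CMD only when DIR/ngc is not yet
# present and executable. Hypothetical helper for illustration.
install_once() {
  local dir="$1"
  shift
  mkdir -p "$dir"
  if [ -x "$dir/ngc" ]; then
    echo "ngc already present in $dir, skipping download"
    return 0
  fi
  "$@"
}

# Stand-in for the real download and unpack steps (assumption: the real
# steps leave an executable file named ngc in the target directory).
fake_install() {
  printf '#!/bin/sh\n' > "$1/ngc"
  chmod u+x "$1/ngc"
  echo "installed"
}

demo_dir="$(mktemp -d)/ngccli"
first_run="$(install_once "$demo_dir" fake_install "$demo_dir")"
second_run="$(install_once "$demo_dir" fake_install "$demo_dir")"
echo "$first_run"
echo "$second_run"
```

In a real entrypoint.sh the command passed to `install_once` would be a function wrapping the existing wget, unzip, rm and chmod lines. The trade-off is that an already installed CLI is never refreshed on restart.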