Tlt.components.docker_handler.docker_handler: Stopping container

super-homura · June 15, 2022, 8:00am

ngccli does not work.
How can I work around this error?

This is happening in all of the EC2 containers.

Converting Tfrecords for kitti trainval dataset
2022-06-15 07:52:49,833 [INFO] root: Registry: ['nvcr.io']
2022-06-15 07:52:49,943 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3
2022-06-15 07:52:49,954 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-06-15 07:52:51,939 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

(launcher) ubuntu@ip-172-31-0-229:~$ docker images
REPOSITORY                          TAG                     IMAGE ID       CREATED       SIZE
nvcr.io/nvidia/tao/tao-toolkit-tf   v3.22.05-tf1.15.4-py3   ca92a571a959   3 weeks ago   16.1GB
↓
docker run -it ca92a571a959 bash
↓
  inflating: /opt/ngccli/ngc-cli/frozenlist/_frozenlist.cpython-39-x86_64-linux-gnu.so
 extracting: /opt/ngccli/ngc-cli.md5
chmod: cannot access '/opt/ngccli/ngc': No such file or directory
(launcher) ubuntu@ip-172-31-0-229:~$ ll /opt/ngccli/ngc
-rwxr-xr-x 1 ubuntu ubuntu 7247104 Jun 13 20:21 /opt/ngccli/ngc*

Morganh · June 15, 2022, 8:26am

Refer to the 2nd workaround in Chmod: cannot access '/opt/ngccli/ngc': No such file or directory - #2 by Morganh

super-homura · June 15, 2022, 8:31am

Sorry, this error occurs:

(launcher) ubuntu@ip-172-31-0-229:/opt$ docker images
REPOSITORY                          TAG                     IMAGE ID       CREATED       SIZE
nvcr.io/nvidia/tao/tao-toolkit-tf   v3.22.05-tf1.15.4-py3   ca92a571a959   3 weeks ago   16.1GB
(launcher) ubuntu@ip-172-31-0-229:/opt$ docker run --runtime=nvidia -it --rm --entrypoint "" nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3 bin/bash
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
(launcher) ubuntu@ip-172-31-0-229:/opt$

super-homura · June 15, 2022, 8:32am

This command works:

docker run -it --rm --entrypoint "" nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3 bash

Morganh · June 15, 2022, 8:37am

super-homura:

docker run --runtime=nvidia -it --rm --entrypoint "" nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3 bin/bash
docker: Error response from daemon: Unknown runtime specified nvidia.

Please install nvidia-docker2.
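
To confirm whether Docker knows about the `nvidia` runtime before and after installing nvidia-docker2, a small check like the following can help. This is only a sketch: it assumes `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name on your Docker version, and the helper names `has_nvidia_runtime` and `docker_runtimes_json` are made up for this example.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """Return True if the JSON object of Docker runtimes has an "nvidia" key."""
    runtimes = json.loads(runtimes_json)
    return "nvidia" in runtimes


def docker_runtimes_json() -> str:
    """Ask the local Docker daemon for its registered runtimes as JSON text.

    Assumption: the daemon is reachable and `docker info` exposes `.Runtimes`.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `has_nvidia_runtime(docker_runtimes_json())` on the host should return `False` while `--runtime=nvidia` is rejected with "Unknown runtime specified nvidia".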

super-homura · June 15, 2022, 8:48am

Could you please tell me where docker_handler.py is?

(launcher) ubuntu@ip-172-31-0-229:~$ which tao
/home/ubuntu/.virtualenvs/launcher/bin/tao
(launcher) ubuntu@ip-172-31-0-229:~$ pip3 --version
pip 22.1.2 from /home/ubuntu/.virtualenvs/launcher/lib/python3.8/site-packages/pip (python 3.8)

(launcher) ubuntu@ip-172-31-0-229:~/.virtualenvs/launcher/lib/python3.8/site-packages$ ll | grep tao
drwx------   2 ubuntu ubuntu   4096 Jun  8 07:28 nvidia_tao-0.1.23.dist-info/

Modify lib/python3.6/site-packages/tao/components/docker_handler/docker_handler.py. This file should be available when you install nvidia-tao.
VALID_DOCKER_ARGS = ["user", "ports", "shm_size", "ulimits", "privileged", "network"]

to

VALID_DOCKER_ARGS = ["user", "ports", "shm_size", "ulimits", "privileged", "network", "entrypoint"]

Morganh · June 15, 2022, 8:53am

Can you find
$ ls ~/.virtualenvs/launcher/lib/python3.8/site-packages/tlt

or

$ ls ~/.virtualenvs/launcher/lib/python3.8/site-packages/tao

super-homura · June 15, 2022, 8:53am

(launcher) ubuntu@ip-172-31-0-229:~/.virtualenvs/launcher/lib/python3.8/site-packages$  ls ~/.virtualenvs/launcher/lib/python3.8/site-packages/tlt
__init__.py  __pycache__  components  config  entrypoint  license  version.py
(launcher) ubuntu@ip-172-31-0-229:~/.virtualenvs/launcher/lib/python3.8/site-packages$ ls ~/.virtualenvs/launcher/lib/python3.8/site-packages/tao
ls: cannot access '/home/ubuntu/.virtualenvs/launcher/lib/python3.8/site-packages/tao': No such file or directory
(launcher) ubuntu@ip-172-31-0-229:~/.virtualenvs/launcher/lib/python3.8/site-packages$

Morganh · June 15, 2022, 8:54am

super-homura:

(launcher) ubuntu@ip-172-31-0-229:~/.virtualenvs/launcher/lib/python3.8/site-packages$  ls ~/.virtualenvs/launcher/lib/python3.8/site-packages/tlt
__init__.py  __pycache__  components  config  entrypoint  license  version.py

That’s it. You installed nvidia-tlt instead of nvidia-tao.
So please go ahead and modify the docker_handler.py located in the tlt folder.
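
If it is unclear which copy of docker_handler.py is actually imported, Python itself can report the path. The sketch below uses the standard `importlib.util.find_spec`. The helper name `locate_module` is made up for this example, and the two dotted module names are taken from the paths mentioned in this thread.

```python
import importlib.util
from typing import Optional


def locate_module(dotted_name: str) -> Optional[str]:
    """Return the file path Python would import for dotted_name, or None."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ModuleNotFoundError:
        # A parent package (for example "tao") is not installed here.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Which of these resolves depends on the installed launcher package.
    for name in (
        "tlt.components.docker_handler.docker_handler",
        "tao.components.docker_handler.docker_handler",
    ):
        print(name, "->", locate_module(name))
```

Run it with the same Python interpreter that the `tao` command uses, since a machine can have more than one site-packages directory containing the launcher.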

super-homura · June 15, 2022, 9:10am

I fixed it, but Jupyter docker has the same error.

■~/.tao_mounts.json

{
    "DockerOptions":{
        "entrypoint": "",
        "shm_size": "16G",
    "Mounts": [
        {
            "source": "/home/ubuntu/ubuntu/tmp/tlt-experiments/data",
            "destination": "/home/ubuntu/ubuntu/tmp/tlt-experiments/data"
        },
        {
            "source": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/specs",
            "destination": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/specs"
        },
        {
            "source": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/",
            "destination": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/"
        }
    ]
}

DEFAULT_DOCKER_PATH = "unix://var/run/docker.sock"
VALID_PORT_PROTOCOLS = ["tcp", "udp", "sctp"]
VALID_DOCKER_ARGS = ["user", "ports", "shm_size", "ulimits", "privileged", "network", "entrypoint"]

super-homura · June 15, 2022, 9:11am

# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao detectnet_v2 dataset_convert \
                  -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
                  -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Converting Tfrecords for kitti trainval dataset
2022-06-15 08:59:14,460 [INFO] root: Registry: ['nvcr.io']
2022-06-15 08:59:14,557 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/entrypoint/entrypoint.py", line 113, in main
    local_instance.launch_command(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/instance_handler/local_instance.py", line 319, in launch_command
    docker_handler.run_container(command)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 284, in run_container
    mount_data, env_vars, docker_options = self._get_mount_env_data()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 92, in _get_mount_env_data
    data = self._load_mounts_file(self._docker_mount_file)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 77, in _load_mounts_file
    data = json.load(mfile)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 20 column 1 (char 641)

Morganh · June 15, 2022, 9:12am

This is a new error.
You added an extra “,” in ~/.tao_mounts.json.
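
A quick way to catch this kind of mistake before running `tao` is to parse the file with Python's `json` module, which is what the traceback shows the launcher doing. The snippet below is a sketch with a reduced, made-up config (not the full file from this thread); `check_json` is a hypothetical helper name.

```python
import json
from typing import Optional


def check_json(text: str) -> Optional[str]:
    """Return None if text is valid JSON, else a short 'line N column M' message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno} column {err.colno}: {err.msg}"
    return None


# Reduced illustration: a comma after "16G" and no closing brace for
# "DockerOptions", so "Mounts" is swallowed into it and the outer object
# is never closed.
BROKEN = """{
    "DockerOptions": {
        "shm_size": "16G",
    "Mounts": []
}
"""

# Same content with the comma removed and "DockerOptions" closed.
FIXED = """{
    "DockerOptions": {
        "shm_size": "16G"
    },
    "Mounts": []
}
"""

print("broken:", check_json(BROKEN))
print("fixed:", check_json(FIXED))
```

For the real file, pass the contents of ~/.tao_mounts.json to `check_json` instead of the sample strings.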

super-homura · June 15, 2022, 9:15am

I got an error.

(launcher) ubuntu@ip-172-31-0-229:~$ cat ~/.tao_mounts.json
{
    "DockerOptions": {
        "entrypoint": "",
        "shm_size": "16G"
    },
    "Mounts": [
        {
            "source": "/home/ubuntu/ubuntu/tmp/tlt-experiments/data",
            "destination": "/home/ubuntu/ubuntu/tmp/tlt-experiments/data"
        },
        {
            "source": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/specs",
            "destination": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/specs"
        },
        {
            "source": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/",
            "destination": "/home/ubuntu/ubuntu/tmp/tlt-experiments/detectnet_v2/"
        }
    ]
}

# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao detectnet_v2 dataset_convert \
                  -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
                  -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Converting Tfrecords for kitti trainval dataset
2022-06-15 09:14:11,986 [INFO] root: Registry: ['nvcr.io']
2022-06-15 09:14:12,103 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/entrypoint/entrypoint.py", line 113, in main
    local_instance.launch_command(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/instance_handler/local_instance.py", line 319, in launch_command
    docker_handler.run_container(command)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 289, in run_container
    self.start_container(volumes, env_variables, docker_options)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 248, in start_container
    docker_args = self.get_docker_option_args(docker_options)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 229, in get_docker_option_args
    assert key in VALID_DOCKER_ARGS, (
AssertionError: The parameter "entrypoint" mentioned in the config file isn't a valid option.
Please choose one of the following: ['user', 'ports', 'shm_size', 'ulimits', 'privileged', 'network']

Morganh · June 15, 2022, 9:18am

super-homura:

  File "/home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 229, in get_docker_option_args
    assert key in VALID_DOCKER_ARGS, (
AssertionError: The parameter "entrypoint" mentioned in the config file isn't a valid option.
Please choose one of the following: ['user', 'ports', 'shm_size', 'ulimits', 'privileged', 'network']

Please modify /home/ubuntu/.local/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py
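
The error message still lists the unpatched options, which suggests the copy under ~/.local was not the one edited earlier. The check that fails can be illustrated with the sketch below. It is a hypothetical re-creation based only on the traceback lines above, not the launcher's actual source; `check_docker_options` is a made-up name.

```python
# Option lists as they appear in this thread, before and after the edit.
UNPATCHED_ARGS = ["user", "ports", "shm_size", "ulimits", "privileged", "network"]
PATCHED_ARGS = UNPATCHED_ARGS + ["entrypoint"]


def check_docker_options(docker_options: dict, valid_args: list) -> None:
    """Raise AssertionError if any DockerOptions key is not in valid_args."""
    for key in docker_options:
        assert key in valid_args, (
            f'The parameter "{key}" mentioned in the config file '
            f"isn't a valid option.\n"
            f"Please choose one of the following: {valid_args}"
        )


# DockerOptions from the corrected ~/.tao_mounts.json in this thread.
options = {"entrypoint": "", "shm_size": "16G"}

try:
    check_docker_options(options, UNPATCHED_ARGS)
except AssertionError as err:
    print("unpatched list rejects the config:", err)

check_docker_options(options, PATCHED_ARGS)  # passes once "entrypoint" is listed
```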

super-homura · June 15, 2022, 9:32am

It works.
But I have a question:
this error occurs whenever I create a new EC2 instance.
Can NVIDIA fix this officially?

Morganh · June 15, 2022, 9:34am

What do you mean by “a new EC2 is created”?

super-homura · June 15, 2022, 9:42am

When I create an EC2 instance, I now have to install the ngc command.
Before, this was not necessary.

Previously, no modifications to .tao_mounts.json or docker_handler.py were necessary either.

Morganh · June 15, 2022, 9:44am

Can you attach the full log for better understanding?

Please start from
(launcher) ubuntu@ip-172-31-0-229:~$

yingliu · July 6, 2022, 6:42am

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

system · July 20, 2022, 6:42am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
