Running tlt: docker.errors.DockerException: Error while fetching server API version

Please provide the following information when requesting support.

• Hardware V100 on google cloud
• Network Type Detectnet_v2
• TLT Version v3.0-py3
• Training spec file

kitti_config {
  root_directory_path: "/workspace/openalpr/lpd/data"
  image_dir_name: "image"
  label_dir_name: "label"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 4
}

image_directory_path: "/workspace/openalpr/lpd/data"

• How to reproduce the issue?

     sudo apt-get install -y nvidia-docker2
     sudo pkill -SIGHUP dockerd
     sudo docker run --runtime=nvidia -it -v /home/Dowload/object_tracking/tlt-experiments:/workspace/tlt-experiments nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
     pip3 install nvidia-pyindex
     pip3 install nvidia-tlt
     docker login nvcr.io
     cd tlt-experiments/

I followed the instructions in the NVIDIA Technical Blog post "Creating a Real-Time License Plate Detection and Recognition App", then ran the command below and got this error:

     tlt detectnet_v2 dataset_convert -d /workspace/openalpr/SPECS_tfrecord.txt -o /workspace/openalpr/lpd_tfrecord/lpd

2021-06-26 16:32:12,697 [INFO] root: Registry: ['nvcr.io']
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1272, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1318, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1267, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1272, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1318, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1267, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 259, in launch_command
    docker_handler = self.handler_map[
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 114, in handler_map
    docker_mount_file=os.getenv("LAUNCHER_MOUNTS", DOCKER_MOUNT_FILE)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 47, in __init__
    self._docker_client = docker.from_env()
  File "/usr/local/lib/python3.6/dist-packages/docker/client.py", line 85, in from_env
    timeout=timeout, version=version, **kwargs_from_env(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 213, in _retrieve_server_version
    'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

I would appreciate any suggestions that might help.

Try adding "-v /var/run/docker.sock:/var/run/docker.sock" to your docker run command.
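The FileNotFoundError at the bottom of the first traceback is raised in docker/transport/unixconn.py while connecting to a Unix socket, which is consistent with the Docker socket not being visible inside the container. Below is a minimal Python sketch for checking this from the shell where tlt is run. It assumes the daemon uses the default socket path /var/run/docker.sock; the helper name docker_socket_status is made up for illustration.

```python
import os
import stat

# Default Unix socket path of the Docker daemon (an assumption; a custom
# DOCKER_HOST setting would point somewhere else).
DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_status(path=DOCKER_SOCKET):
    """Return 'ok', 'missing', or 'not-a-socket' for the given path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "ok" if stat.S_ISSOCK(mode) else "not-a-socket"


if __name__ == "__main__":
    print(DOCKER_SOCKET, "->", docker_socket_status())
```

If this reports "missing" inside the container, the socket has probably not been mounted with "-v".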

Thanks for the reply. How would I access the output directory and the trained models afterwards? In my current command that works through the tlt-experiments mount.

You can keep the existing "-v" option and add another one:
-v /home/Dowload/object_tracking/tlt-experiments:/workspace/tlt-experiments -v /var/run/docker.sock:/var/run/docker.sock

Thank you again. I ran the below command.

$ sudo docker run --runtime=nvidia -it -v /home/Dowload/object_tracking/tlt-experiments:/workspace/tlt-experiments -v /var/run/docker.sock:/var/run/docker.sock nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash

But when I run the command below, it first reports "No mount points were found in the /root/.tlt_mounts.json file." and then fails with another error: "No such file or directory: '/workspace/tlt-experiments/openalpr/SPECS_tfrecord.txt'".

$ tlt detectnet_v2 dataset_convert -d /workspace/tlt-experiments/openalpr/SPECS_tfrecord.txt -o /workspace/tlt-experiments/openalpr/lpd_tfrecord/lpd

2021-06-27 06:36:39,736 [INFO] root: Registry: ['nvcr.io']
2021-06-27 06:36:39,916 [INFO] root: No mount points were found in the /root/.tlt_mounts.json file.
2021-06-27 06:36:39,916 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 104, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 93, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 84, in main
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tlt-experiments/openalpr/SPECS_tfrecord.txt'
2021-06-27 06:36:48,839 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

However, the file is there when I check the path:

$ ls /workspace/tlt-experiments/openalpr/
SPECS_tfrecord.txt  lpd_tfrecord

Any idea?

Please refer to https://docs.nvidia.com/tlt/tlt-user-guide/text/tlt_launcher.html.

According to your log, you are running the TLT 3.0 launcher inside a TLT 2.0 docker container.
For TLT 3.0, you need to create a ~/.tlt_mounts.json file.
It maps local directories to paths inside the docker.

In the command line, the path should be the “destination” path inside the docker.
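As a rough illustration of that mapping, here is a minimal Python sketch that translates a local path into the corresponding "destination" path. This is not the launcher's own code; the helper name to_container_path and the example paths are hypothetical.

```python
import posixpath


def to_container_path(host_path, mounts):
    """Map a host path to its path inside the docker, or None if it is not mounted."""
    host_path = posixpath.normpath(host_path)
    for mount in mounts:
        source = posixpath.normpath(mount["source"])
        # Match the mount root itself or anything below it.
        if host_path == source or host_path.startswith(source + "/"):
            relative = host_path[len(source):]
            return posixpath.normpath(mount["destination"]) + relative
    return None


if __name__ == "__main__":
    # Hypothetical mount entry, in the same shape as ~/.tlt_mounts.json entries.
    mounts = [{"source": "/home/user/tlt-experiments",
               "destination": "/workspace/tlt-experiments"}]
    print(to_container_path("/home/user/tlt-experiments/specs/spec.txt", mounts))
    # prints /workspace/tlt-experiments/specs/spec.txt
```

A result of None means the path is not covered by any mount, so a command that refers to it would not find the file inside the docker.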

A simple approach is to set the source and the destination to the same path.
For example,

"source": "/home/omno/Desktop/umair/tlt-samples/classification",
"destination": "/home/omno/Desktop/umair/tlt-samples/classification"
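A small Python sketch of that same-path approach, which builds a "Mounts" section where every directory is mounted at an identical location inside the docker. The function name identity_mounts is made up for illustration, and the sketch only prints the JSON rather than writing ~/.tlt_mounts.json.

```python
import json


def identity_mounts(directories):
    """Build a mounts config in which source and destination are the same path."""
    return {
        "Mounts": [
            {"source": directory, "destination": directory}
            for directory in directories
        ]
    }


if __name__ == "__main__":
    config = identity_mounts(
        ["/home/omno/Desktop/umair/tlt-samples/classification"]
    )
    # Review the output before saving it as ~/.tlt_mounts.json.
    print(json.dumps(config, indent=4))
```

With identical paths, the same path works both on the host and in the tlt command line.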

Even after creating the ~/.tlt_mounts.json file below,

{
    "Mounts": [
        {
            "source": "/home/Dowload/object_tracking/tlt-experiments/data",
            "destination": "/workspace/tlt-experiments/data"
        },
        {
            "source": "/home/Dowload/object_tracking/tlt-experiments/results",
            "destination": "/workspace/tlt-experiments/results"
        },
        {
            "source": "/home/Dowload/object_tracking/tlt-experiments/specs",
            "destination": "/workspace/tlt-experiments/specs"
        }
    ],
    "Envs": [
        {
            "variable": "CUDA_DEVICE_ORDER",
            "value": "PCI_BUS_ID"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "user": "0:0",
        "ports": {
            "8888": 8888
        }
    }
}

I get a mount error:

tlt detectnet_v2 dataset_convert -d /workspace/tlt-experiments/openalpr/SPECS_tfrecord.txt -o /workspace/tlt-experiments/openalpr/lpd_tfrecord/lpd

2021-06-27 10:01:15,555 [INFO] root: Registry: ['nvcr.io']
Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 278, in launch_command
    docker_handler.run_container(command)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 268, in run_container
    mount_data, env_vars, docker_options = self._get_mount_env_data()
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 97, in _get_mount_env_data
    raise ValueError("Mount point source path doesn't exist. {}".format(mount['source']))
ValueError: Mount point source path doesn't exist. /home/Dowload/object_tracking/tlt-experiments/data

even though the directory exists:

 ls /home/Dowload/object_tracking/tlt-experiments/data
data_object_image_2.zip  data_object_label_2.zip  testing  training

What am I missing?
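The ValueError in the traceback is raised for a mount whose "source" path is not found from where the launcher runs. If tlt is started inside a docker container, as in the docker run command earlier in this thread, a host path such as /home/... is only visible there if it has been mounted at that same location. The minimal Python sketch below lists the sources that do not exist from the point of view of the process running it, so it should be run in the same shell where tlt fails. It assumes the launcher's check behaves like os.path.exists, which this thread does not confirm; the helper name missing_mount_sources is made up for illustration.

```python
import json
import os


def missing_mount_sources(config):
    """Return the 'source' paths of a parsed mounts config that do not exist here."""
    return [
        mount["source"]
        for mount in config.get("Mounts", [])
        if not os.path.exists(mount["source"])
    ]


if __name__ == "__main__":
    mounts_file = os.path.expanduser("~/.tlt_mounts.json")
    if os.path.exists(mounts_file):
        with open(mounts_file) as handle:
            print("missing sources:", missing_mount_sources(json.load(handle)))
    else:
        print("no mounts file at", mounts_file)
```

If a directory shows up as missing here but is listed by ls in another terminal, the two shells are probably looking at different filesystems (host versus container).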

Can you run the following command to check whether the files are available inside the docker?
$ tlt detectnet_v2 run ls /workspace/tlt-experiments/data

It seems not. Any idea?

tlt detectnet_v2 run ls /workspace/tlt-experiments/data

2021-06-27 10:40:17,031 [INFO] root: Registry: ['nvcr.io']
Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 278, in launch_command
    docker_handler.run_container(command)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 268, in run_container
    mount_data, env_vars, docker_options = self._get_mount_env_data()
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 97, in _get_mount_env_data
    raise ValueError("Mount point source path doesn't exist. {}".format(mount['source']))
ValueError: Mount point source path doesn't exist. /home/Dowload/object_tracking/tlt-experiments/data

So it seems that something is wrong with the ~/.tlt_mounts.json file or with your local directory.
Could you run the following?
$ chmod 777 -R your-local-files

It didn't help; I still get the same error. I also restarted docker afterwards, with no progress. I can't find much information on this error.

To narrow this down, can you log in to the TLT 3.0 docker interactively? To open an interactive session inside the docker containing the detectnet_v2 task, run the following command:

tlt detectnet_v2

This command opens up an interactive session inside the tlt-streamanalytics docker.

See NVIDIA TAO Documentation

Please run
$ mkdir /home/Dowload/object_tracking/tlt-experiments/data

Hi! I have a similar problem. Were you able to fix it?

@joaquin1
Please see the comments above. If there are no hints, please create a new forum topic. Thanks.

@joaquin1
Ignore my request. I saw your topic in TLT 3 error running detectnet_v2 dataset_convert.
