Error when trying to run gazenet notebook

Hi, I was trying to run gazenet.ipynb, one of the TLT CV Sample WorkFlows, inside the TLT for Video Streaming Analytics container, on the DGX A100 server.

I had a problem with step 3, "Generate tfrecords from labels in json format". Any advice on this matter?

!tlt gazenet dataset_convert -folder-suffix pipeline \
                             -norm_folder_name Norm_Data \
                             -sets p01-day03 \
                             -data_root_path $DATA_DOWNLOAD_DIR/MPIIFaceGaze/sample-dataset

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File “/usr/lib/python3.6/http/client.py”, line 1272, in request
self._send_request(method, url, body, headers, encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1318, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1267, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1038, in _send_output
self.send(msg)
File “/usr/lib/python3.6/http/client.py”, line 976, in send
self.connect()
File “/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py”, line 43, in connect
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/requests/adapters.py”, line 449, in send
timeout=timeout
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File “/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py”, line 403, in increment
raise six.reraise(type(error), error, _stacktrace)
File “/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py”, line 734, in reraise
raise value.with_traceback(tb)
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File “/usr/lib/python3.6/http/client.py”, line 1272, in request
self._send_request(method, url, body, headers, encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1318, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1267, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1038, in _send_output
self.send(msg)
File “/usr/lib/python3.6/http/client.py”, line 976, in send
self.connect()
File “/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py”, line 43, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: (‘Connection aborted.’, FileNotFoundError(2, ‘No such file or directory’))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 205, in _retrieve_server_version
return self.version(api_version=False)[“ApiVersion”]
File “/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py”, line 181, in version
return self._result(self._get(url), json=True)
File “/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py”, line 46, in inner
return f(self, *args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 228, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File “/usr/local/lib/python3.6/dist-packages/requests/sessions.py”, line 543, in get
return self.request(‘GET’, url, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/requests/sessions.py”, line 530, in request
resp = self.send(prep, **send_kwargs)
File “/usr/local/lib/python3.6/dist-packages/requests/sessions.py”, line 643, in send
r = adapter.send(request, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/requests/adapters.py”, line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (‘Connection aborted.’, FileNotFoundError(2, ‘No such file or directory’))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/bin/tlt”, line 8, in <module>
sys.exit(main())
File “/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py”, line 114, in main
args[1:]
File “/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py”, line 250, in launch_command
docker_handler = self.handler_map[
File “/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py”, line 107, in handler_map
docker_digest=map_val.docker_digest
File “/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py”, line 44, in __init__
self._docker_client = docker.from_env()
File “/usr/local/lib/python3.6/dist-packages/docker/client.py”, line 85, in from_env
timeout=timeout, version=version, **kwargs_from_env(**kwargs)
File “/usr/local/lib/python3.6/dist-packages/docker/client.py”, line 40, in __init__
self.api = APIClient(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 188, in __init__
self._version = self._retrieve_server_version()
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 213, in _retrieve_server_version
‘Error while fetching server API version: {0}’.format(e)
docker.errors.DockerException: Error while fetching server API version: (‘Connection aborted.’, FileNotFoundError(2, ‘No such file or directory’))

Which version of the requests package is installed? Please run the command below.

(venv_3.0) morganh@dl:~$ pip3 show requests

Where and how did you launch your notebook? On your host PC or inside a Docker container?

I ran the notebook inside the TLT for Video Streaming Analytics Docker container.

TLT 3.0 works differently from TLT 2.0.
In TLT 3.0, please download the Jupyter notebooks to your host, then launch the notebook directly on the host.
More details are in the "Getting Started With TLT" page of the Transfer Learning Toolkit 3.0 documentation.

Did you install tlt-launcher on the host or in the Docker container you mentioned?
In other words, did you run everything inside the container?

I think I am running TLT 2.0. This is the image:

nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3

Hmm… I just realized that the tlt-launcher is version 3. And yes, I run everything inside the container. What do you suggest if I can only work inside a container?

Could you explain a bit more about this? Why do we still need to install the tlt launcher inside the tlt container?

So, may I confirm your steps? Please correct me if anything is wrong.

  1. You start a TLT 2.0 Docker container.
  2. Then you install the tlt launcher inside this TLT 2.0 container?
  3. Then you launch the gazenet notebook in the TLT 2.0 container?

Yes, but the order is not quite right:

  1. I run the TLT 2.0 Docker container.
  2. Then I launch the gazenet notebook inside this TLT 2.0 container.
  3. Inside the notebook, there is a step to install the tlt launcher, so I run it.

Got it. So if you can only work inside a container, please go ahead.
That means you start a container (a TLT 2.0 Docker container), then set up the environment for TLT 3.0 inside it.

Yes, the tlt-launcher is needed for TLT 3.0.

As far as I know, your steps should work.

But maybe some packages were not installed. I need to check further.

Please check the environment against the "TLT Launcher" and "Requirements and Installation" pages of the Transfer Learning Toolkit 3.0 documentation.

I’m sorry, could you please try to reproduce the error? My partner followed the same steps and encountered the same error when invoking the “tlt gazenet dataset_convert” tool.

I will check.

I verified step 3. There is no issue.

!tlt gazenet dataset_convert -folder-suffix pipeline \
                              -norm_folder_name Norm_Data \
                              -sets p01-day03 \
                             -data_root_path $DATA_DOWNLOAD_DIR/MPIIFaceGaze/sample-dataset

2021-03-18 17:26:03,832 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
2021-03-18 09:26:06.329712: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/mitgazenet/dataloader/augmentation_helper.py:22: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/mitgazenet/dataloader/augmentation_helper.py:22: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/mitgazenet/dataloader/gazenet_dataloader_augmentation_V2.py:38: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/dataio/data_converter.py:159: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/dataio/data_converter.py:162: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/dataio/data_converter.py:267: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

Test [‘p01-1’]
Validation [‘p01-0’]
Train [‘p01-4’, ‘p01-3’, ‘p01-2’]
Test [‘p01-1’]
Validation [‘p01-0’]
Train [‘p01-4’, ‘p01-3’, ‘p01-2’]
2021-03-18 17:26:35,052 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The steps above were run on a host PC. I launched the notebook from the host PC.
It works fine.
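
The warning at the top of the log above asks for a "user":"UID:GID" entry in the DockerOptions portion of ~/.tlt_mounts.json. A minimal sketch of building that entry follows. The helper name `with_user_option` is hypothetical, and the "Mounts" layout with placeholder paths in the example is an assumption about the file format; only the "user" key under "DockerOptions" comes from the warning itself.

```python
import json
import os


def with_user_option(config, uid=None, gid=None):
    """Return a copy of a launcher config dict with DockerOptions.user set to 'UID:GID'.

    Hypothetical helper: it only edits a dict in memory and does not
    read or write ~/.tlt_mounts.json.
    """
    # Default to the current user, the same values "id -u" and "id -g" print.
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid

    updated = dict(config)
    options = dict(updated.get("DockerOptions", {}))  # keep any existing options
    options["user"] = "{}:{}".format(uid, gid)
    updated["DockerOptions"] = options
    return updated


if __name__ == "__main__":
    # Assumed starting config; the mount paths are placeholders.
    example = {"Mounts": [{"source": "/path/on/host", "destination": "/path/in/container"}]}
    print(json.dumps(with_user_option(example), indent=4))
```

The printed JSON can be compared with the existing ~/.tlt_mounts.json before editing the file by hand.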

Next, I followed your steps (start a TLT 2.0 Docker container and install the tlt launcher inside it). I can reproduce your error.
It is not related to gazenet. Running the command below produces the same error.
$ tlt ssd run ls

It should be a common error. There must be something missing. I think we need to install some packages.

Below is the solution for your case (running TLT 3.0 inside a TLT 2.0 Docker container):

$ docker run --runtime=nvidia -it -v /var/run/docker.sock:/var/run/docker.sock nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
root@a14bd3e33d2d:/workspace# pip3 install nvidia-pyindex
root@a14bd3e33d2d:/workspace# pip3 install nvidia-tlt
root@a14bd3e33d2d:/workspace# docker login nvcr.io
Username: $oauthtoken
Password: your-ngckey
root@a14bd3e33d2d:/workspace# tlt gazenet dataset_convert …
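
The original traceback ended with FileNotFoundError raised from sock.connect(self.unix_socket), and the docker run command above adds -v /var/run/docker.sock:/var/run/docker.sock. A minimal sketch for checking, from inside the container, whether that socket is visible before calling tlt is shown below. The function name `docker_socket_status` is hypothetical, and the path is assumed to be the default one (no DOCKER_HOST override).

```python
import os
import stat

# Assumed default socket path, matching the -v mount in the docker run command.
DEFAULT_DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_status(path=DEFAULT_DOCKER_SOCKET):
    """Return 'missing', 'not-a-socket', or 'ok' for the given path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Same condition that surfaced as FileNotFoundError in the traceback.
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    return "ok"


if __name__ == "__main__":
    print(DEFAULT_DOCKER_SOCKET, "->", docker_socket_status())
```

A result of "missing" suggests the container was started without the socket mount. A result of "ok" only shows that the socket file exists, not that the daemon accepts connections.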

Thank you for the solution! I think it should work. We now have another problem: the CUDA version on the host has not been updated yet (to 11.0).

Is it possible to run the gazenet notebook using TLT 2.0 only, that is, without installing tlt-launcher version 3.0?

I am afraid not. Gazenet is a new network introduced in TLT 3.0.

Hi, I’m not sure whether I should have opened a new post to ask this question.

How can we ensure that the CUDA update will be backward compatible with the other containers currently running on the host that use the older version of CUDA? Many thanks.