TAO Toolkit FaceNet error

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) : GeForce RTX 2080 Ti
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) : Detectnet_v2
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here) : TAO 3.0
• Training spec file(If have, please share here) : facenet specs
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I was running the Jupyter notebook (.ipynb) provided for TAO Toolkit's FaceNet. After path mapping, downloading the data, downloading the model, etc., I get the error below when I run the command that converts the dataset to TFRecords.

I am currently working inside a container built from the Docker CV container provided by NVIDIA:
TAO Toolkit for Computer Vision | NVIDIA NGC

Creating a new directory for the output tfrecords dump.

  • command -

print("Converting Tfrecords for wider train dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao detectnet_v2 dataset_convert \
    -d $SPECS_DIR/facenet_tfrecords_kitti_train.txt \
    -o $DATA_DOWNLOAD_DIR/tfrecords/training/kitti_train

  • result -

Converting Tfrecords for wider train dataset
2022-02-28 08:28:34,173 [INFO] root: Registry: [‘nvcr.io’]
Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File “/usr/lib/python3.6/http/client.py”, line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1042, in _send_output
self.send(msg)
File “/usr/lib/python3.6/http/client.py”, line 980, in send
self.connect()
File “/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py”, line 43, in connect
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/requests/adapters.py”, line 449, in send
timeout=timeout
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File “/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py”, line 403, in increment
raise six.reraise(type(error), error, _stacktrace)
File “/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py”, line 734, in reraise
raise value.with_traceback(tb)
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py”, line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File “/usr/lib/python3.6/http/client.py”, line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File “/usr/lib/python3.6/http/client.py”, line 1042, in _send_output
self.send(msg)
File “/usr/lib/python3.6/http/client.py”, line 980, in send
self.connect()
File “/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py”, line 43, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: (‘Connection aborted.’, FileNotFoundError(2, ‘No such file or directory’))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 205, in _retrieve_server_version
return self.version(api_version=False)[“ApiVersion”]
File “/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py”, line 181, in version
return self._result(self._get(url), json=True)
File “/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py”, line 46, in inner
return f(self, *args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 228, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File “/usr/local/lib/python3.6/dist-packages/requests/sessions.py”, line 543, in get
return self.request(‘GET’, url, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/requests/sessions.py”, line 530, in request
resp = self.send(prep, **send_kwargs)
File “/usr/local/lib/python3.6/dist-packages/requests/sessions.py”, line 643, in send
r = adapter.send(request, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/requests/adapters.py”, line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (‘Connection aborted.’, FileNotFoundError(2, ‘No such file or directory’))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/bin/tao”, line 8, in
sys.exit(main())
File “/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py”, line 115, in main
args[1:]
File “/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py”, line 297, in launch_command
docker_handler = self.handler_map[
File “/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py”, line 152, in handler_map
docker_mount_file=os.getenv(“LAUNCHER_MOUNTS”, DOCKER_MOUNT_FILE)
File “/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py”, line 62, in init
self._docker_client = docker.from_env()
File “/usr/local/lib/python3.6/dist-packages/docker/client.py”, line 85, in from_env
timeout=timeout, version=version, **kwargs_from_env(**kwargs)
File “/usr/local/lib/python3.6/dist-packages/docker/client.py”, line 40, in init
self.api = APIClient(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 188, in init
self._version = self._retrieve_server_version()
File “/usr/local/lib/python3.6/dist-packages/docker/api/client.py”, line 213, in _retrieve_server_version
‘Error while fetching server API version: {0}’.format(e)
docker.errors.DockerException: Error while fetching server API version: (‘Connection aborted.’, FileNotFoundError(2, ‘No such file or directory’))

If you are running tao inside a docker container, please add the mount below when you trigger that docker.

-v /var/run/docker.sock:/var/run/docker.sock
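For reference, a sketch of such a docker invocation (the image name and host paths here are placeholders, not taken from this thread — substitute the NGC image and directories you actually use):

```shell
# Sketch only: start the working container with the host's Docker socket
# mounted, so the TAO launcher inside it can talk to the host daemon.
# <your_ngc_image> and the host paths are placeholders.
docker run --gpus all -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /home/tao-experiments:/home/tao-experiments \
    <your_ngc_image> /bin/bash
```

Without the docker.sock mount, `docker.from_env()` inside the container cannot reach the daemon, which is exactly the FileNotFoundError on the socket shown above.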

Thank you!
As you suggested, I mounted the host's docker.sock when running Docker and tried again, but this time I get the error below.
The error occurs even though there are no directories or files in /workspace, and even if I manually create the spec files in /workspace, I get an error that the file cannot be found. However, reading the file path with cat succeeds.

-command-

Creating a new directory for the output tfrecords dump.

print("Converting Tfrecords for wider train dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao detectnet_v2 dataset_convert \
    -d $SPECS_DIR/facenet_tfrecords_kitti_train.txt \
    -o $DATA_DOWNLOAD_DIR/tfrecords/training/kitti_train

-error-

Converting Tfrecords for wider train dataset
2022-03-02 06:03:35,200 [INFO] root: Registry: [‘nvcr.io’]
2022-03-02 06:03:35,298 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2022-03-02 06:03:35,397 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the “/root/.tao_mounts.json” file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
File “/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py”, line 130, in
File “/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py”, line 119, in
File “/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py”, line 110, in main
FileNotFoundError: [Errno 2] No such file or directory: ‘/home/cv_samples_v1.3.0/facenet/specs/facenet_tfrecords_kitti_train.txt’
2022-03-02 06:03:42,334 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


Please check your ~/.tao_mounts.json.
Please note that the path after "tao xxx" should be a path inside the tao docker.

More info can be found in TAO Toolkit Launcher — TAO Toolkit 3.22.05 documentation

My "!cat ~/.tao_mounts.json" result is:

{
    "Mounts": [
        {
            "source": "/home/tao-experiments",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/cv_samples_v1.3.0/facenet/specs",
            "destination": "/workspace/tao-experiments/facenet/specs"
        }
    ]
}
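The mounts file above is just a prefix mapping: a host-side "source" prefix becomes the container-side "destination" prefix. A minimal sketch of that substitution, using the values from the mounts file above:

```shell
# Illustration of the launcher's path mapping: swap the host-side "source"
# prefix for the container-side "destination" prefix.
SRC="/home/cv_samples_v1.3.0/facenet/specs"
DST="/workspace/tao-experiments/facenet/specs"

HOST_PATH="$SRC/facenet_tfrecords_kitti_train.txt"
CONTAINER_PATH="$DST${HOST_PATH#"$SRC"}"
echo "$CONTAINER_PATH"
# -> /workspace/tao-experiments/facenet/specs/facenet_tfrecords_kitti_train.txt
```

So the -d argument passed after "tao detectnet_v2" must be the destination-side path; a file sitting under a host directory that is not listed in Mounts is simply not visible inside the docker.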

I don't fully understand. Can you give an example of the path after tao xxx?

Could you run the following command and share the log? The /workspace/tao-experiments/facenet/specs/facenet_tfrecords_kitti_train.txt is a path inside the docker.
tao detectnet_v2 run ls /workspace/tao-experiments/facenet/specs/facenet_tfrecords_kitti_train.txt

I solved it. I had not mounted the host's /home and /workspace directories into the container.

But another problem has come up:

[command]
!tao detectnet_v2 evaluate -e $SPECS_DIR/facenet_train_resnet18_kitti.txt \
    -m $USER_EXPERIMENT_DIR/pretrain_models/facenet_vunpruned_v2.0/model.tlt \
    -k $KEY

[error]
2022-03-02 10:03:15,798 [INFO] root: Registry: [‘nvcr.io’]
2022-03-02 10:03:15,863 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2022-03-02 10:03:15,985 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the “/root/.tao_mounts.json” file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2022-03-02 10:03:22,282 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2022-03-02 10:03:22,381 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/facenet/specs/facenet_train_resnet18_kitti.txt
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-03-02 10:03:22,384 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-03-02 10:03:22,384 [INFO] root: Loading model weights.
Invalid decryption. Unable to open file (file signature not found). The key used to load the model is incorrect.
2022-03-02 10:03:23,824 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please check the key. You can use the explicit key and retry.

I got a new key and re-registered it, but the same error appeared when I ran it. So I changed the key to nvidia_tlt and it worked. Is this because I'm running a pretrained model (unpruned_v2.0)? When training or evaluating other models, can I use the API key issued to me individually?

Yes, you can find the key in ngc link. Usually it is nvidia_tlt.
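In the notebook this comes down to what the KEY variable holds before tao evaluate runs. A minimal sketch (nvidia_tlt applies to the NGC pretrained model, per the replies above; the placeholder in the comment stands in for your own key):

```shell
# The NGC-hosted pretrained FaceNet model is loaded with the key nvidia_tlt:
export KEY=nvidia_tlt
echo "$KEY"
# For models you train and encrypt yourself, use your own key instead, e.g.:
# export KEY=<your_ngc_api_key>
```

The "-k $KEY" argument in the evaluate command above then picks this value up.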

Yes.


Thank you!!! T^T Now I'm training the model, haha.

Umm... Do I have to mount both the /home directory and the /workspace directory from the host, just like I mount docker.sock, when I create the Docker container I work in?

There are no files in /workspace... or do I have to manually put the files in /home/tao-experiments? Even if I edit the txt file in /home/tao-experiments/facenet/specs/ to change the model config, nothing changes when I run it from the Jupyter notebook. Where do I have to change the settings, and what controls /workspace?

For example, if I change the pretrained_model_file path:

model_config {
  pretrained_model_file: "/workspace/tao-experiments/facenet/pretrain_models/facenet_vunpruned_v2.0/model.tlt"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
  }
  arch: "resnet"
  load_graph: true
}

But my container's /workspace directory does not contain any files.
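One way to narrow this down is to compare the host-side and container-side views of the same spec. A sketch (note that, per the ~/.tao_mounts.json shown earlier, /workspace/tao-experiments/facenet/specs inside the docker is mapped from /home/cv_samples_v1.3.0/facenet/specs on the host, so edits made under a different host directory will not show up at that container path):

```shell
# Host side: check the spec under the directory that ~/.tao_mounts.json
# maps to /workspace/tao-experiments/facenet/specs:
grep pretrained_model_file \
    /home/cv_samples_v1.3.0/facenet/specs/facenet_train_resnet18_kitti.txt

# Container side: the same line should appear at the mapped path
# (the "tao ... run" form was shown earlier in this thread):
tao detectnet_v2 run grep pretrained_model_file \
    /workspace/tao-experiments/facenet/specs/facenet_train_resnet18_kitti.txt
```

If the two outputs differ, the file you are editing is not the file the launcher mounts.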

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

Could you try again in a terminal instead of the Jupyter notebook?

You mention that "now i'm training the model", so the training is working, right?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.