Is there something special about bpnet? A question about "tlt bpnet dataset_convert"

I have run several classification examples without errors, but when I tried the bpnet example, I got the following output:


# Generate TFRecords for training dataset
!tlt bpnet dataset_convert \
    -m 'train' \
    -o $DATA_DIR/train \
    --generate_masks \
    --dataset_spec $DATA_POSE_SPECS_DIR/coco_spec.json


Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 258, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 130, in docker_logged_in
    data = load_config_file(docker_config)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?


It seems that I don't have the file /root/.docker/config.json. Should it be inside the docker container or on the host machine? I created an empty one inside the container with the "touch" command, but then I got another error:


# Generate TFRecords for training dataset
!tlt bpnet dataset_convert \
    -m 'train' \
    -o $DATA_DIR/train \
    --generate_masks \
    --dataset_spec $DATA_POSE_SPECS_DIR/coco_spec.json


Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 258, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 130, in docker_logged_in
    data = load_config_file(docker_config)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 71, in load_config_file
    data = json.load(cfile)
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


How can I fix it?
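The JSONDecodeError above is a direct consequence of the touch workaround: touch creates an empty file, and Python's json module cannot parse an empty document. A minimal sketch that reproduces just that step; load_docker_config is a hypothetical stand-in, not the tlt launcher's actual loader:

```python
import json
import os
import tempfile


def load_docker_config(path):
    """Hypothetical stand-in for the launcher's loader: parse the file as JSON."""
    with open(path) as cfile:
        return json.load(cfile)


# Reproduce the "touch" workaround: create an empty config.json and parse it.
message = None
with tempfile.TemporaryDirectory() as tmp:
    empty_config = os.path.join(tmp, "config.json")
    open(empty_config, "w").close()  # same effect as `touch config.json`
    try:
        load_docker_config(empty_config)
    except json.JSONDecodeError as err:
        # An empty file contains no JSON value at all, hence "Expecting value".
        message = str(err)

print(message)
```

So an empty, hand-made config.json does not help: the file needs real content, normally written by docker login for the same user that runs tlt.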

See Launch tlt detectnet_v2 - #2 by Morganh
For "AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json", please consider the solution below.
You need to run the following on your host PC:
$ docker login nvcr.io
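The path in the error, /root/.docker/config.json, suggests the launcher resolves ~/.docker/config.json for whichever user runs tlt, so a login done as a different user (or inside a different container) leaves that file missing. A rough sketch of such a check, assuming the standard Docker CLI config layout in which docker login records each registry under "auths" (or under "credHelpers" when a credential helper is configured). Both function names are hypothetical, and this is not the launcher's real logic:

```python
import json
import os


def docker_config_path():
    """Docker CLI config path for the *current* user (assumed lookup).

    For root, "~" expands to /root, which matches the path in the error.
    """
    return os.path.join(os.path.expanduser("~"), ".docker", "config.json")


def has_registry_login(config, registry="nvcr.io"):
    """Approximate check that `docker login <registry>` was run for this config.

    docker login stores an entry under "auths" (possibly an empty object when
    a credential store holds the secret) or maps the registry in "credHelpers".
    """
    auths = config.get("auths", {})
    cred_helpers = config.get("credHelpers", {})
    return registry in auths or registry in cred_helpers


# Example config in the shape docker login typically writes (dummy value).
SAMPLE = json.loads('{"auths": {"nvcr.io": {"auth": "dummy"}}}')
```

For example, loading docker_config_path() with json.load and passing the result to has_registry_login should give True once docker login nvcr.io has been run by the same user that launches tlt.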

After running docker login nvcr.io on my host PC, I still get this error:


Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 258, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 130, in docker_logged_in
    data = load_config_file(docker_config)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?


After running docker login nvcr.io on my host PC, I also ran docker login nvcr.io inside the docker container, and the tlt command then failed with:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 259, in launch_command
    docker_handler = self.handler_map[
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 114, in handler_map
    docker_mount_file=os.getenv("LAUNCHER_MOUNTS", DOCKER_MOUNT_FILE)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 47, in __init__
    self._docker_client = docker.from_env()
  File "/usr/local/lib/python3.6/dist-packages/docker/client.py", line 85, in from_env
    timeout=timeout, version=version, **kwargs_from_env(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 213, in _retrieve_server_version
    'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))


What does this error mean, and how can I fix it?

How about running $ sudo docker login nvcr.io on the host PC?

If you are not able to run it with sudo, please see the NVIDIA TAO Documentation:
Once you have installed docker-ce, follow the post-installation steps to ensure that docker can be run without sudo.
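Those post-installation steps essentially put your user into the docker group. A small sketch for checking whether that has happened, using only Python's standard grp and pwd modules (Unix only; user_in_group is a hypothetical helper name):

```python
import grp
import pwd


def user_in_group(username, group_name="docker"):
    """Return True if `username` belongs to `group_name`.

    Membership counts either as a supplementary member listed in the group
    entry or via the user's primary group id.
    """
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # the group does not exist on this machine
    if username in group.gr_mem:
        return True
    try:
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False  # no such user
```

For example, user_in_group(getpass.getuser()) reports whether the current account is in the docker group. A new group membership only takes effect after logging out and back in.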

Setting up rootless docker seems too difficult. Is there an easier way to run the bpnet sample? I have been struggling with docker for a long time and was about to give up, but I still hope to use this bpnet sample. Is there any other way?

First, can you run the command below successfully with root access?
$ docker run hello-world

Also, are you launching the tlt docker from inside another docker container?
In that case, please add the following option to your docker run command:
-v /var/run/docker.sock:/var/run/docker.sock

See Tlt augment not working
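The last traceback is the Docker Python SDK (docker.from_env()) failing to open the Docker daemon's Unix socket: inside a container that socket does not exist unless it is bind-mounted, which is what the -v /var/run/docker.sock:/var/run/docker.sock option does. A sketch of that check, assuming the SDK's usual default of unix:///var/run/docker.sock when DOCKER_HOST is unset (both helper names are hypothetical):

```python
import os
import stat

# Assumed default endpoint when DOCKER_HOST is not set.
DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"


def docker_socket_path(environ=None):
    """Unix socket path the Docker client would dial, or None for non-unix hosts."""
    environ = os.environ if environ is None else environ
    host = environ.get("DOCKER_HOST", DEFAULT_DOCKER_HOST)
    if not host.startswith("unix://"):
        return None  # e.g. tcp://host:2375 is not a local socket
    return host[len("unix://"):]


def socket_available(path):
    """True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False  # missing path: the FileNotFoundError case in the traceback
```

Running socket_available(docker_socket_path()) inside the container should return True only when the host socket has actually been mounted in.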

I can run docker hello-world without sudo, and every time I use docker run I add -v /var/run/docker.sock:/var/run/docker.sock.
But the error is still there:


    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?


In fact, I found that tlt bpnet dataset_convert should be run on the host; that is the whole point of the tlt launcher, and it seems to be the right way to use tlt. But a new error occurs when I run this command:


tlt bpnet dataset_convert -m 'train' -o $DATA_DIR/train --generate_masks --dataset_spec $DATA_POSE_SPECS_DIR/coco_spec.json


It reports FileNotFoundError: [Errno 2] No such file or directory: '/coco_spec.json', even though the file is in the right location.
How can I fix it?
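One detail in that error stands out: the path is /coco_spec.json, with no directory at all. That is exactly what the shell produces from $DATA_POSE_SPECS_DIR/coco_spec.json when DATA_POSE_SPECS_DIR is not set in the current terminal (variables defined in a Jupyter notebook do not carry over to a separate shell session). This is a likely cause here, not a confirmed one. A small sketch emulating that expansion rule; expand_like_shell is a hypothetical helper:

```python
import re

# Matches ${NAME} or $NAME.
_VAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def expand_like_shell(text, env):
    """Expand $VAR and ${VAR} the way sh does for plain expansions.

    An unset variable becomes the empty string. (os.path.expandvars differs:
    it leaves unknown variables untouched.)
    """
    return _VAR.sub(lambda m: env.get(m.group(1) or m.group(2), ""), text)


# With the variable unset, the directory part simply disappears.
print(expand_like_shell("$DATA_POSE_SPECS_DIR/coco_spec.json", {}))
```

Running echo $DATA_POSE_SPECS_DIR in the same terminal before the tlt command, and exporting it there if it is empty, would rule this out.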

Please check your ~/.tlt_mounts.json.
This file maps your local directories into the docker container.
Here is a way to debug it: log in to the docker container and check where coco_spec.json is.
$ tlt bpnet run /bin/bash
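Because the tlt command itself runs inside the container, the paths passed to it (such as the --dataset_spec value) need to be paths as seen inside the container, not host paths. A sketch of how a host path maps through the "Mounts" entries, assuming plain Docker bind-mount semantics where the most specific (longest) matching source wins; this illustrates the idea and is not the launcher's own code, and to_container_path is a hypothetical name:

```python
def to_container_path(host_path, mounts):
    """Translate a host path to the path visible inside the container.

    `mounts` is a list of {"source": ..., "destination": ...} dicts, as in
    ~/.tlt_mounts.json. Returns None if no mount covers `host_path`.
    """
    best_src, best_dst = None, None
    for mount in mounts:
        src = mount["source"].rstrip("/")
        # Match whole path components only: /data/exp must not match /data/experiment.
        if host_path == src or host_path.startswith(src + "/"):
            if best_src is None or len(src) > len(best_src):
                best_src, best_dst = src, mount["destination"].rstrip("/")
    if best_src is None:
        return None
    return best_dst + host_path[len(best_src):]
```

If this returns None for the host location of a spec file, that file is simply not visible to the container, whatever path is passed on the command line.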

cat ~/.tlt_mounts.json:


{
    "Mounts": [
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/bpnet",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/specs",
            "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/specs"
        },
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config",
            "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config"
        },
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/model_pose_config",
            "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/model_pose_config"
        }
    ]
}


Logging in to the docker container and checking where coco_spec.json is:


root@12102f81bc87:/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config# ls
coco_spec.json


It looks OK, but the tlt launcher cannot find coco_spec.json.
How can I fix it?

I get the error "No such file or directory: '/coco_spec.json'":


Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/dataset_convert.py", line 119, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/dataset_convert.py", line 101, in main
FileNotFoundError: [Errno 2] No such file or directory: '/coco_spec.json'
Traceback (most recent call last):
  File "/usr/local/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2021-07-28 21:01:02,029 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


Following your advice, I ran the command below:


tlt bpnet run ls /workspace/tlt_cv_samples/bpnet/data_pose_config/coco_spec.json
2021-07-28 21:29:30,398 [INFO] root: Registry: ['nvcr.io']
2021-07-28 21:29:30,450 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
ls: cannot access '/workspace/tlt_cv_samples/bpnet/data_pose_config/coco_spec.json': No such file or directory
2021-07-28 21:29:31,198 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


The file coco_spec.json really is there.
I ran this command:


cat ~/.tlt_mounts.json
{
    "Mounts": [
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/bpnet",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/specs",
            "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/specs"
        },
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config",
            "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config"
        },
        {
            "source": "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/model_pose_config",
            "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/model_pose_config"
        }
    ]
}


What should I do?
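The WARNING in that log concerns a separate, optional setting (file ownership), not the missing file. A sketch that adds it to an existing mounts config, following the key names given in the warning ("DockerOptions", "user") and assuming DockerOptions sits at the top level next to "Mounts"; os.getuid()/os.getgid() are the Python equivalents of id -u / id -g, and with_user_option is a hypothetical name:

```python
import json
import os


def with_user_option(config, uid=None, gid=None):
    """Return a copy of a ~/.tlt_mounts.json-style dict with DockerOptions.user set.

    Defaults to the current user's UID and GID. The input dict is not modified.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    updated = dict(config)
    options = dict(updated.get("DockerOptions", {}))
    options["user"] = f"{uid}:{gid}"
    updated["DockerOptions"] = options
    return updated


# Print the resulting JSON for an example config with fixed IDs.
print(json.dumps(with_user_option({"Mounts": []}, uid=1000, gid=1000), indent=4))
```

Returning a copy rather than editing in place makes it easy to review the new JSON before overwriting ~/.tlt_mounts.json.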

Can you run the command below to check whether the file is available?
$ tlt bpnet run ls /workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config
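The path in the earlier ls, /workspace/tlt_cv_samples/..., does not start with any "destination" in the mounts file (all of them begin with /workspace/tlt-experiments), so nothing is mounted there, whereas the path suggested here is covered. A sketch that resolves a container path back to its host source, assuming Docker bind-mount semantics where the longest matching destination wins (a nested mount shadows its parent). Only two of the four mounts are reproduced, and to_host_path is a hypothetical name:

```python
def to_host_path(container_path, mounts):
    """Return the host path backing `container_path`, or None if it is not mounted."""
    best_src, best_dst = None, None
    for mount in mounts:
        dst = mount["destination"].rstrip("/")
        if container_path == dst or container_path.startswith(dst + "/"):
            if best_dst is None or len(dst) > len(best_dst):
                best_src, best_dst = mount["source"].rstrip("/"), dst
    if best_dst is None:
        return None
    return best_src + container_path[len(best_dst):]


# Two entries from the ~/.tlt_mounts.json above (common host prefix factored out).
ROOT = "/mnt/a478b327-1a7f-4a07-8d94-90d724dee801/ls/tlt-experiments"
MOUNTS = [
    {"source": ROOT + "/bpnet", "destination": "/workspace/tlt-experiments"},
    {"source": ROOT + "/tlt_cv_samples_v1.1.0/bpnet/data_pose_config",
     "destination": "/workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config"},
]

# The path from the failing ls is not covered by any mount.
print(to_host_path("/workspace/tlt_cv_samples/bpnet/data_pose_config/coco_spec.json", MOUNTS))
```

The same reasoning applies to --dataset_spec: it should point at a path under one of the destinations, for example /workspace/tlt-experiments/tlt_cv_samples_v1.1.0/bpnet/data_pose_config/coco_spec.json.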
