Error in Generating TFRecords for YOLOv4

Please provide the following information when requesting support.

• Hardware (T4)
• Network Type (Yolo_v4)
• TLT Version (Please run "tlt info --verbose" and share "docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3" here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue?

!tao yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt \
                             -o $DATA_DOWNLOAD_DIR/training/tfrecords/train

Traceback (most recent call last):
  File "/usr/local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 115, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 296, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 129, in docker_logged_in
    data = load_config_file(docker_config)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?

If you are running tao inside a docker container, please add the flag below when you trigger that docker.

-v /var/run/docker.sock:/var/run/docker.sock
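For example, the flag goes on the docker run command that starts the outer container. This is only a sketch, using the 1.15.4 image tag mentioned above; the other flags depend on your setup:

docker run --runtime=nvidia -it -v /var/run/docker.sock:/var/run/docker.sock nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3 /bin/bash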

I have pulled a TAO docker image using the command below

docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3

Then created a container named taonew using the command below

sudo docker run -it -e NVIDIA_VISIBLE_DEVICES=1 -d -p 5001:6001 --name taonew fadbda32c62f

So do I include the suggested solution with this, and if yes, how? Please help.

For YOLOv4, please use the tf1.15.5 version.

$ docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3

Then,

$ docker run --runtime=nvidia -it --rm -v /var/run/docker.sock:/var/run/docker.sock nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash

Tried doing the same. The image got pulled, but the docker run command tries to connect to ngc.nvidia.com and times out. Please check the attached screenshot.

Please check the network since it seems to be a connection issue.

Hi, can you please tell me how to insert this mount in the docker exec command: "-v /var/run/docker.sock:/var/run/docker.sock"?

As my container is already created, and when I use this mount in the docker run command it is not able to reach ngc.nvidia.com due to proxy issues, I have to execute it from within the docker container.

Please refer to below topics.
https://forums.developer.nvidia.com/search?q=%22Did%20you%20run%20docker%20login%3F%22%20%23intelligent-video-analytics%3Atao-toolkit%20order%3Alatest

Tried the suggested solution, but it still gives an error when generating TFRecords.

Traceback (most recent call last):
  File "/usr/local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 115, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 296, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 129, in docker_logged_in
    data = load_config_file(docker_config)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?
!tao yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_val.txt \

To narrow this down, can you run the commands below in a terminal instead of the notebook?
Assuming you have already logged in to a docker container A, then if you want to run the TAO docker inside container A, please run:
$ docker login nvcr.io
$ docker run --runtime=nvidia -it --rm -v /var/run/docker.sock:/var/run/docker.sock nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash

FileNotFoundError, but the file exists at the path mentioned.

Following are the details

Created a Docker container called taoyolov4 using the following command

docker run --runtime=nvidia -it -p 7001:8001 -v /var/run/docker.sock:/var/run/docker.sock nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash

I log into the container using

docker exec -it taoyolov4 bash

Following is the directory structure of my workspace

/workspace/cv_samples_v1.3.0/yolo_v4/

__init__.py
specs
tao-experiments
yolo_v4.ipynb

#Set up env variables and map drives

Setting up env variables for cleaner command line commands.

import os

print("Please replace the variable with your key.")
%env KEY=ZGNpYXZ0NHE1czFmbDBlcGR0Z2RzOHJqcWw6NGZjMjUwMDMtN2QyNC00MzYzLTlhZDctOTA1MDM3YTUwYTMy
%env USER_EXPERIMENT_DIR=/workspace/cv_samples_v1.3.0/yolo_v4/tao-experiments/yolo_v4
%env DATA_DOWNLOAD_DIR=/workspace/cv_samples_v1.3.0/yolo_v4/tao-experiments/data

# Set this path if you don't run the notebook from the samples directory.
%env NOTEBOOK_ROOT=/workspace/cv_samples_v1.3.0/yolo_v4

# Please define this local project directory that needs to be mapped to the TAO docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/yolo_v4
%env LOCAL_PROJECT_DIR=tao-experiments
os.environ["LOCAL_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")

os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "yolo_v4")

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
%env SPECS_DIR=/workspace/cv_samples_v1.3.0/yolo_v4/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR


Mapping up the local directories to the TAO docker.

import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/cv_samples_v1.3.0/yolo_v4/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}
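The stock notebook then writes this dictionary out to ~/.tao_mounts.json; without that step the launcher never sees these mappings. A minimal sketch of that cell, reusing the mounts_file and drive_map variables defined above:

# Writing the mounts file so the tao launcher can read the drive mappings.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

The launcher reads this file and checks that each "source" path exists on the host before starting the container.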


!tao yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt \
                             -o $DATA_DOWNLOAD_DIR/training/tfrecords/train


#ERROR

2022-04-26 11:41:32,999 [INFO] root: Registry: ['nvcr.io']
2022-04-26 11:41:33,232 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-04-26 11:41:33,467 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/dataset_convert.py", line 18, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 110, in main
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/cv_samples_v1.3.0/yolo_v4/specs/yolo_v4_tfrecords_kitti_train.txt'
2022-04-26 11:41:51,298 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
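For reference, the "user":"UID:GID" option mentioned in the warning above would sit alongside the "Mounts" list in the same ~/.tao_mounts.json file; a sketch only, with placeholder UID/GID values:

{
    "Mounts": [ ... ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}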

To narrow this down, you can log into the container to debug:
$ docker exec -it taoyolov4 bash

and then run
# yolo_v4 dataset_convert -d xxx -o xxx

Got the following error

root@jmngdprp016394:~# docker exec -it taoyolov4 bash
root@618c58a4fa47:/workspace# tao yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt \
>                              -o $DATA_DOWNLOAD_DIR/training/tfrecords/train
2022-04-26 16:20:24,271 [INFO] root: Registry: ['nvcr.io']
2022-04-26 16:20:24,500 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
Traceback (most recent call last):
  File "/usr/local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tlt/entrypoint/entrypoint.py", line 115, in main
    args[1:]
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/instance_handler/local_instance.py", line 319, in launch_command
    docker_handler.run_container(command)
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 284, in run_container
    mount_data, env_vars, docker_options = self._get_mount_env_data()
  File "/usr/local/lib/python3.6/dist-packages/tlt/components/docker_handler/docker_handler.py", line 112, in _get_mount_env_data
    raise ValueError("Mount point source path doesn't exist. {}".format(mount['source']))
ValueError: Mount point source path doesn't exist. /workspace/tao-experiments

Can you please explain the paths to be set? I am confused about the paths.

You have already logged in to the TAO docker container, so please do not use "tao" again.
Please try:

root@618c58a4fa47:/workspace# yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt -o $DATA_DOWNLOAD_DIR/training/tfrecords/train

Done as suggested, but still getting an error. This has something to do with the path.

root@jmngdprp016394:~# docker exec -it taoyolov4 bash
root@618c58a4fa47:/workspace# yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_val.txt \
>                              -o $DATA_DOWNLOAD_DIR/val/tfrecords/val
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/dataset_convert.py", line 18, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 110, in main
FileNotFoundError: [Errno 2] No such file or directory: '/yolo_v4_tfrecords_kitti_val.txt'

Please explain the following path variables to me; I am really confused.

Some details about my docker setup

I have downloaded the cv_samples_v1.3.0 folder inside the docker container.

Inside this cv_samples folder I have all the models with their notebooks:

action_recognition_net bpnet detectnet_v2 emotionnet fpenet heartratenet multitask_classification unet yolo_v4_tiny
augment classification dssd facenet

Including yolo_v4 (which I need for training).

Inside yolo_v4 I have

__init__.py specs tao-experiments yolo_v4.ipynb

This tao-experiments folder is created by me as

%env LOCAL_PROJECT_DIR=tao-experiments

Inside the tao-experiments folder I have created two more folders:

data and yolo_v4

The data folder has the downloaded KITTI dataset.

Now please help me in setting the path variables:

%env USER_EXPERIMENT_DIR=
%env DATA_DOWNLOAD_DIR=
%env LOCAL_PROJECT_DIR=tao-experiments

%env SPECS_DIR=

How did you trigger the above docker?

yes

I mean, which command did you run to get the above docker?