TAO TF1 DetectNet_v2 Dataset_Converter Typo

Description

  1. There is a typo in the output of the dataset_convert command for DetectNet_v2 (tfrecords_waring.json; the ‘n’ is missing) when errors occur while converting images with KITTI labels (the only format I have tested so far) to TFRecords. I executed the command in a Docker container (information below).

  2. Could you please describe the difference between the log_warning.json and the tfrecords_warning.json? I can only find the tfrecords_warning.json file in my use case.

Log / Traceback

First run:

2023-11-02 17:33:02,683 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 114: Tfrecords generation complete.
2023-11-02 17:33:02,683 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 221: Writing the log_warning.json
2023-11-02 17:33:02,683 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 224: There were errors in the labels. Details are logged at /test/tao/TestProject/5.0.0/FirstTry/dataset/tfrecords_waring.json

Second run:

2023-11-02 18:54:45,983 [TAO Toolkit] [INFO] root 2102: TFRecords generation complete.
2023-11-02 18:54:45,983 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 221: Writing the log_warning.json
2023-11-02 18:54:45,983 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 224: There were errors in the labels. Details are logged at /test/tao/TestProject/5.0.0/FirstTry/dataset/tfrecords/_waring.json

Update

I just ran another conversion with a slightly different command and got different results; I edited the logs and steps in this topic accordingly. I added the ‘-r’ parameter, which is currently undocumented (!) (DetectNet_v2 - NVIDIA Docs). In the first run, the files did not end up in the directory I expected, because I assumed the ‘-o’ argument was a directory path rather than a filename. In the second run, the log name changed to ‘_waring.json’, as you can see in the logs. From the code, the results_dir (‘-r’) is optional and is where the status.json is created, while the ‘-o’ parameter is the path including the prefix of the final TFRecord filenames.

My findings

According to the code I found in the container, my second question is obsolete: the log_warning.json is the same file as the tfrecords_warning.json.

dataset_converter_lib.py:

    def _save_log_warnings(self):
        """Store out of bound bounding boxes to a json file."""
        if self.log_warning:
            logger.info("Writing the log_warning.json")
            with open(f"{self.output_filename}_warning.json", "w") as f:
                json.dump(self.log_warning, f, indent=2)
            logger.info("There were errors in the labels. Details are logged at"
                        " %s_waring.json", self.output_filename)

I think it would be good to remove the first info log, because it is misleading (the file name it announces is wrong, which is confusing), and to correct the file name in the final message (“waring” → “warning”).
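
For illustration, here is a minimal sketch of how _save_log_warnings could look with those two changes applied (assuming self.log_warning and self.output_filename behave as in the snippet above; the rest of the class is unchanged):

    def _save_log_warnings(self):
        """Store out of bound bounding boxes to a json file."""
        if self.log_warning:
            warning_file = f"{self.output_filename}_warning.json"
            with open(warning_file, "w") as f:
                json.dump(self.log_warning, f, indent=2)
            # Single, correctly spelled message pointing at the file that was actually written.
            logger.info("There were errors in the labels. Details are logged at %s",
                        warning_file)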

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5

Steps To Reproduce

First run:

docker run -v $CONFIG_DATASET_CONVERT:$CONFIG_DATASET_CONVERT -v $LOCAL_DIR/dataset/kitti:$LOCAL_DIR/dataset/kitti -v $LOCAL_DIR/dataset/tfrecords:$LOCAL_DIR/dataset/tfrecords --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 $DOCKER_IMAGE detectnet_v2 dataset_convert -d "$CONFIG_DATASET_CONVERT" -o "$LOCAL_DIR/dataset/tfrecords"

Second run:

docker run -v $CONFIG_DATASET_CONVERT:$CONFIG_DATASET_CONVERT -v $LOCAL_DIR/dataset/kitti:$LOCAL_DIR/dataset/kitti -v $LOCAL_DIR/dataset/tfrecords:$LOCAL_DIR/dataset/tfrecords --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm $DOCKER_IMAGE detectnet_v2 dataset_convert -d "$CONFIG_DATASET_CONVERT" -o "$LOCAL_DIR/dataset/tfrecords/tfrecords" -r "$LOCAL_DIR/dataset/tfrecords/"

→ The variables are all valid

Hi @nkaaf,
We request you to raise the concern on Issues · triton-inference-server/server · GitHub.

Thank you

Hi,
We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

# Path to your ONNX model (placeholder)
filename = "your_model.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.
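
For reference, a typical trtexec invocation with verbose logging could look like the following (model and engine file names are placeholders):

trtexec --onnx=your_model.onnx --saveEngine=your_model.engine --verbose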
Thanks!