TAO TF1 DetectNet_v2 Dataset_Converter Typo

Description

  1. There is a typo in the output of the dataset_convert command for DetectNet_v2 (tfrecords_waring.json; the ‘n’ is missing) when errors occur while converting images with KITTI labels (the only format I have tested so far) to TFRecords. I executed the command in a Docker container (information below).

  2. Could you please describe the difference between the log_warning.json and the tfrecords_warning.json? I can only find the tfrecords_warning.json file in my use case.

Log / Traceback

First run:

2023-11-02 17:33:02,683 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 114: Tfrecords generation complete.
2023-11-02 17:33:02,683 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 221: Writing the log_warning.json
2023-11-02 17:33:02,683 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 224: There were errors in the labels. Details are logged at /test/tao/TestProject/5.0.0/FirstTry/dataset/tfrecords_waring.json

Second run:

2023-11-02 18:54:45,983 [TAO Toolkit] [INFO] root 2102: TFRecords generation complete.
2023-11-02 18:54:45,983 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 221: Writing the log_warning.json
2023-11-02 18:54:45,983 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 224: There were errors in the labels. Details are logged at /test/tao/TestProject/5.0.0/FirstTry/dataset/tfrecords/_waring.json

Update

I just ran another conversion with a slightly different command and got different results; I edited the logs and steps in this topic accordingly. I added the ‘-r’ parameter, which is currently undocumented (!) (DetectNet_v2 - NVIDIA Docs). In the first run, the files did not end up in the directory I expected, because I assumed the ‘-o’ argument was a directory path rather than a filename. In the second run, the log name changed to ‘_waring.json’, as you can see in the logs. From the code, the results_dir (‘-r’) is optional and is where the status.json is created, while the ‘-o’ parameter is the path including the prefix of the final TFRecord filenames.

My findings

According to the code I found in the container, my second question is obsolete: the log_warning.json is the same file as the tfrecords_warning.json.

dataset_converter_lib.py:

    def _save_log_warnings(self):
        """Store out of bound bounding boxes to a json file."""
        if self.log_warning:
            logger.info("Writing the log_warning.json")
            with open(f"{self.output_filename}_warning.json", "w") as f:
                json.dump(self.log_warning, f, indent=2)
            logger.info("There were errors in the labels. Details are logged at"
                        " %s_waring.json", self.output_filename)

I think it would be good to remove the first info log, because it is misleading (the file name it announces is wrong, which is confusing), and to correct the file name in the final message (“waring” → “warning”).
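
For illustration, here is a minimal sketch of how _save_log_warnings could look with those two changes applied (assuming self.log_warning and self.output_filename behave as in the snippet above; the rest of the class is unchanged):

    def _save_log_warnings(self):
        """Store out of bound bounding boxes to a json file."""
        if self.log_warning:
            warning_file = f"{self.output_filename}_warning.json"
            with open(warning_file, "w") as f:
                json.dump(self.log_warning, f, indent=2)
            # Single, correctly spelled message pointing at the file that was actually written.
            logger.info("There were errors in the labels. Details are logged at %s",
                        warning_file)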

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5

Steps To Reproduce

First run:

docker run -v $CONFIG_DATASET_CONVERT:$CONFIG_DATASET_CONVERT -v $LOCAL_DIR/dataset/kitti:$LOCAL_DIR/dataset/kitti -v $LOCAL_DIR/dataset/tfrecords:$LOCAL_DIR/dataset/tfrecords --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 $DOCKER_IMAGE detectnet_v2 dataset_convert -d "$CONFIG_DATASET_CONVERT" -o "$LOCAL_DIR/dataset/tfrecords"

Second run:

docker run -v $CONFIG_DATASET_CONVERT:$CONFIG_DATASET_CONVERT -v $LOCAL_DIR/dataset/kitti:$LOCAL_DIR/dataset/kitti -v $LOCAL_DIR/dataset/tfrecords:$LOCAL_DIR/dataset/tfrecords --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm $DOCKER_IMAGE detectnet_v2 dataset_convert -d "$CONFIG_DATASET_CONVERT" -o "$LOCAL_DIR/dataset/tfrecords/tfrecords" -r "$LOCAL_DIR/dataset/tfrecords/"

→ The variables are all valid

Hi @nkaaf,
We request you to raise the concern on Issues · triton-inference-server/server · GitHub.

Thank you

Hi,
We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

# Path to your ONNX model (placeholder)
filename = "your_model.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.
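
For reference, a typical trtexec invocation with verbose logging could look like the following (model and engine file names are placeholders):

trtexec --onnx=your_model.onnx --saveEngine=your_model.engine --verbose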
Thanks!