Tao-deploy yolo_v4 inference KITTI .txt result mismatch with image plot

Hi, I am evaluating a trained YOLOv4 model outside of TAO, so I need the predicted labels in the KITTI .txt format.

When running the inference I get the images_annotated and labels. However, the labels do not contain the correct coordinates. I believe this is a bug.

Command I run:

!tao-deploy yolo_v4 inference \
 --gpu_index=0 \
 -i /workspace/tao-experiments/test/image \
 -e $YOLO4_BASEDIR/yolo_v4_eval_spec.txt \
 -m $YOLO4_BASEDIR/yolo4_rtx5000_fp32_bs1.engine \
 -r $YOLO4_BASEDIR/inference \
 --batch_size 1

Example COCO image 000000415109 from annotated_images (I write down the image coordinates of the (x1, y1) and (x2, y2) positions):

The 000000415109.txt output:
person 0.00 0 0.00 208.127 79.538 478.843 415.359 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.993
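
For reference, a minimal sketch of how a label line like the one above can be read, assuming the standard 16-field KITTI detection layout (class, truncation, occlusion, alpha, four bbox values, seven 3D fields, score). The names KittiDetection and parse_kitti_line are hypothetical helpers, not part of TAO:

```python
from typing import NamedTuple


class KittiDetection(NamedTuple):
    """2D part of one KITTI detection line (hypothetical helper type)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def parse_kitti_line(line: str) -> KittiDetection:
    """Parse one KITTI detection line with 16 whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 16:
        raise ValueError(f"expected 16 fields, got {len(fields)}")
    # Fields 4..7 hold the 2D box as left, top, right, bottom in pixels.
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])
    # The last field is the detection confidence.
    return KittiDetection(fields[0], x1, y1, x2, y2, float(fields[15]))


det = parse_kitti_line(
    "person 0.00 0 0.00 208.127 79.538 478.843 415.359 "
    "0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.993"
)
print(det)
```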

Analysis of the difference:

Notice the mismatch in coordinates.

What is going on? It looks like the coordinates in the .txt files are based on the model resolution (480x480 in my case). There also seems to be some letterboxing going on: the image is center-padded, and this transformation is not undone when the output is written to the .txt file.
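
To make the hypothesis concrete, here is a rough sketch of how such a letterbox could be undone. It assumes the image is resized with its aspect ratio preserved to fit the model input and then padded equally on both sides. That is only my guess at the preprocessing, not something confirmed from the tao-deploy source, and undo_letterbox is a hypothetical helper:

```python
def undo_letterbox(box, image_size, model_size=(480, 480)):
    """Map a box from letterboxed model space back to original image pixels.

    Assumption: aspect-preserving resize plus symmetric (centered) padding.
    box is (x1, y1, x2, y2); image_size and model_size are (width, height).
    """
    x1, y1, x2, y2 = box
    img_w, img_h = image_size
    model_w, model_h = model_size

    # Scale factor that fits the whole image inside the model input.
    scale = min(model_w / img_w, model_h / img_h)
    # Padding added on the left/right and top/bottom after resizing.
    pad_x = (model_w - img_w * scale) / 2.0
    pad_y = (model_h - img_h * scale) / 2.0

    def clamp(value, upper):
        # Keep the result inside the original image.
        return max(0.0, min(value, float(upper)))

    return (
        clamp((x1 - pad_x) / scale, img_w),
        clamp((y1 - pad_y) / scale, img_h),
        clamp((x2 - pad_x) / scale, img_w),
        clamp((y2 - pad_y) / scale, img_h),
    )


# Example with made-up numbers: a 640x480 image letterboxed into 480x480
# occupies rows 60..420, so that region maps back to the full image.
restored = undo_letterbox((0.0, 60.0, 480.0, 420.0), image_size=(640, 480))
print(restored)
```

The image size has to be read per image, since the test images have different resolutions.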

Can you take a look at the source code?

Please note that this issue seems specific to yolo_v4. It does not occur when using detectnet_v2.

Sorry for the late reply. How about tao yolo_v4 inference?

UPDATE 22/JUN/2023 - The test below is wrong (see next post)

Thank you for the reply, I have tested it:

!tao yolo_v4 inference \
 --gpu_index=0 \
 --threshold 0.2 \
 -i /workspace/tao-experiments/test_part/image \
 -e $YOLO4_BASEDIR/yolo_v4_train_resnet18_kitti.txt \
 -m $YOLO4_BASEDIR/yolo4_rtx5000_fp32_bs1.engine \
 -o $YOLO4_BASEDIR/inference_tao/images_annotated \
 -l $YOLO4_BASEDIR/inference_tao/labels

The kitti-labels output is definitely better using tao instead of tao-deploy:

As you can see, with tao the labels are a better match for the coordinates measured on the image. However, there is still a difference that cannot be explained by rounding:
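
One way to put a number on such a mismatch is to compare the box measured on the image with the box from the .txt label, using IoU and the largest per-coordinate difference. This is only a sketch; box_iou and max_abs_diff are hypothetical helpers and the boxes in the example are made up:

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def max_abs_diff(a, b):
    """Largest absolute difference over the four coordinates, in pixels."""
    return max(abs(p - q) for p, q in zip(a, b))


# Made-up example boxes: one read off the annotated image, one from the label.
measured = (100.0, 50.0, 300.0, 400.0)
from_label = (104.0, 50.0, 300.0, 392.0)
print(box_iou(measured, from_label), max_abs_diff(measured, from_label))
```

A difference of a pixel or less would be consistent with rounding; anything larger points to a coordinate transform that differs between the plot and the label.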

In this image you see in yellow the bounding box plotted by tao inference. I also plotted the detection based on the KITTI .txt label. You can see the difference.

Strangely this difference is worse on some images like this one:

Findings / conclusion
I was able to transform the wrong KITTI labels output by tao-deploy so that they perfectly align with the labels plotted on the image using this transformation (will post code).

However, I am not sure what to do with the output from tao. It is much better, but why is there still a difference?

Could you share the training spec file? I need to check the input width and height.
Also, your test images have different resolutions, right?

UPDATE 22/JUN/2023
There was a bug in my evaluation code. The output of tao inference is correct. The images_annotated align with the output KITTI .txt labels.

Conclusion - use tao inference and don’t use tao-deploy inference.

The image below shows the result, with perfect alignment between the annotation (yellow) and the label plotted from the KITTI .txt file.


Old answer (wrong)
Below is the training spec file. I use 480x480 model resolution. The images are from the COCO dataset and have different resolutions (max_width=640, max_height=480, keep_aspect=True).

What I don’t understand is why the annotated image seems to show the correct bounding box that the model predicted while the exported KITTI .txt label is different. Both represent the same thing: the predictions. Regardless of any config, the output annotated_image and its KITTI .txt should be consistent.

random_seed: 42
yolov4_config {
  big_anchor_shape: "[(158.62, 354.31),(249.36, 231.91),(294.31, 383.34)]"
  mid_anchor_shape: "[(58.12, 193.38),(149.85, 170.82),(96.37, 272.58)]"
  small_anchor_shape: "[(39.01, 111.43),(66.44, 66.55),(90.07, 123.66)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "resnet"
  nlayers: 18
  arch_conv_blocks: 2
  loss_loc_weight: 1.0
  loss_neg_obj_weights: 1.0
  loss_class_weights: 1.0
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 1
  num_epochs: 80
  enable_qat: false
  checkpoint_interval: 1
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  visualizer {
    enabled: true
    num_images: 1
  }
  early_stopping {
    monitor: "loss"
    patience: 10
  }
  #pretrain_model_path: "/workspace/yolo_v4/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
  resume_model_path: "/workspace/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_040.tlt"
}
eval_config {
  average_precision_mode: INTEGRATE
  batch_size: 1
  matching_iou_threshold: 0.45
  visualize_pr_curve: true
}
nms_config {
  confidence_threshold: 0.2
  clustering_iou_threshold: 0.45
  force_on_cpu: true
  top_k: 200
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 480
  output_height: 480
  output_channel: 3
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
}
dataset_config {
  data_sources: {
      tfrecords_path: "/workspace/yolo_v4/train/tfrecords/*"
      image_directory_path: "/workspace/yolo_v4/train"
  }
  include_difficult_in_training: true
  image_extension: "jpg"
  #image_extension: "png"
  target_class_mapping {
    key: "bicycle"
    value: "bicycle"
  }
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "motorbike"
    value: "motorbike"
  }
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "truck"
    value: "truck"
  }
  validation_data_sources {
    tfrecords_path: "/workspace/yolo_v4/val/tfrecords/*"
    image_directory_path: "/workspace/yolo_v4/val/"
  }
}
