TAO training - Visualise Inference after training shows 98% mAP, but after exporting the model to TensorRT the inference result is 0%

Our YOLOv4 model uses a pretrained CSPDarknet53 backbone. Training went well and we reached a mAP of 98% at the end with 10k images.

We have trained YOLOv4 models in the same manner in the past using the TAO Toolkit without any issues. With this one, however, everything is fine up to the Visualise Inference step. As soon as we convert the model to a TensorRT engine and run an inference test, there are zero detections.

We intend to use this model as an SGIE following the back-to-back detectors example; when that failed, we tested the model both as PGIE and as SGIE. We thought the problem might be in the conversion we did on our Xavier device, so we used the TAO Toolkit Jupyter notebook to generate an engine file on the server instead, but hit the same problem there. So it appears we are doing something wrong at step 10, Model Export.
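
For reference, here is a sketch of a standard tao-converter invocation for building such an engine from the exported .etlt on the Xavier; the key and file paths are placeholders, and the optimization-profile shapes simply match our 512x352 input:

tao-converter -k $KEY \
              -p Input,1x3x352x512,8x3x352x512,16x3x352x512 \
              -t int8 \
              -c /path/to/cal.bin \
              -e /path/to/trt.engine \
              /path/to/yolov4_cspdarknet53_epoch_030.etlt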

I have included our config file below:

random_seed: 42
yolov4_config {
  big_anchor_shape: "[(67.00, 63.00), (91.00, 82.00), (130.00, 118.00)]"
  mid_anchor_shape: "[(33.00, 31.00), (41.00, 40.00), (54.00, 49.00)]"
  small_anchor_shape: "[(12.00, 13.00), (18.00, 18.00), (25.00, 25.00)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "cspdarknet"
  nlayers: 53
  arch_conv_blocks: 2
  loss_loc_weight: 1.0
  loss_neg_obj_weights: 1.0
  loss_class_weights: 1.0
  label_smoothing: 0.1
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  enable_qat: false
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-6
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pruned_model_path: "/workspace/tao-experiments/yolo_v4/experiment_dir_pruned/yolov4_cspdarknet53_pruned.tlt"
}
eval_config {
  average_precision_mode: INTEGRATE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.005
  clustering_iou_threshold: 0.5
  top_k: 200
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 512
  output_height: 352
  output_channel: 3
  randomize_input_shape_period: 10
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}
dataset_config {
  data_sources: {
      tfrecords_path: "/workspace/tao-experiments/data/training/tfrecords/train*"
      image_directory_path: "/workspace/tao-experiments/data/training"
  }
  include_difficult_in_training: true
  image_extension: "png"
  target_class_mapping {
      key: "ear"
      value: "ear"
  }
  target_class_mapping {
      key: "eye"
      value: "eye"
  }
  target_class_mapping {
      key: "horn"
      value: "horn"
  }
  target_class_mapping {
      key: "nose"
      value: "nose"
  }
  validation_data_sources: {
      tfrecords_path: "/workspace/tao-experiments/data/val/tfrecords/val*"
      image_directory_path: "/workspace/tao-experiments/data/val"
  }
}

And here is the model export command:

!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet53_epoch_$EPOCH.tlt  \
                     -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet53_epoch_$EPOCH.etlt \
                     -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                     -k $KEY \
                     --cal_image_dir  $USER_EXPERIMENT_DIR/data/testing/image_2 \
                     --data_type int8 \
                     --batch_size 16 \
                     --batches 10 \
                     --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                     --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

Please use training images in the “--cal_image_dir”.

It produced the same result: the model is generated, but the inference result is still zero. I have included the retraining log below, followed by the model export log:

Retraining result:

epoch,AP_ear,AP_eye,AP_horn,AP_nose,loss,lr,mAP,validation_loss
1,nan,nan,nan,nan,66.43994,4.4799996e-05,nan,nan
2,nan,nan,nan,nan,61.093052,8.9199995e-05,nan,nan
3,nan,nan,nan,nan,59.752018,0.00013359998,nan,nan
4,nan,nan,nan,nan,59.622494,0.00017799999,nan,nan
5,nan,nan,nan,nan,59.923515,0.00022239999,nan,nan
6,nan,nan,nan,nan,61.727955,0.0002668,nan,nan
7,nan,nan,nan,nan,62.574867,0.00031119998,nan,nan
8,nan,nan,nan,nan,63.800163,0.00035559997,nan,nan
9,nan,nan,nan,nan,65.24933,0.00039999996,nan,nan
10,0.9696846530051635,0.9556112852087715,0.9599799102309143,0.9888244377668178,66.879715,0.00039890545,0.9685250715529168,32.295438995657044
11,nan,nan,nan,nan,65.9971,0.0003956339,nan,nan
12,nan,nan,nan,nan,66.920975,0.00039022107,nan,nan
13,nan,nan,nan,nan,65.28394,0.00038272637,nan,nan
14,nan,nan,nan,nan,65.12993,0.00037323186,nan,nan
15,nan,nan,nan,nan,65.106804,0.00036184158,nan,nan
16,nan,nan,nan,nan,64.46103,0.00034868033,nan,nan
17,nan,nan,nan,nan,63.460102,0.0003338923,nan,nan
18,nan,nan,nan,nan,62.281967,0.0003176395,nan,nan
19,nan,nan,nan,nan,62.55332,0.0003001,nan,nan
20,0.9852715489879238,0.9701419409850334,0.9829024320295642,0.9916476833281718,61.97091,0.00028146597,0.9824909013326732,29.51218525199003
21,nan,nan,nan,nan,61.486893,0.0002619416,nan,nan
22,nan,nan,nan,nan,61.17372,0.00024174075,nan,nan
23,nan,nan,nan,nan,59.919712,0.00022108479,nan,nan
24,nan,nan,nan,nan,59.418755,0.0002002,nan,nan
25,nan,nan,nan,nan,58.460793,0.00017931522,nan,nan
26,nan,nan,nan,nan,58.512672,0.00015865924,nan,nan
27,nan,nan,nan,nan,57.209843,0.0001384584,nan,nan
28,nan,nan,nan,nan,57.177284,0.000118934026,nan,nan
29,nan,nan,nan,nan,56.11211,0.00010030001,nan,nan
30,0.9921042377047276,0.9811760497695594,0.9858362450658296,0.9972089116136724,55.949642,8.276051e-05,0.9890813610384472,27.40327364160109
env: EPOCH=030

Here is the model export log:

2022-03-12 00:22:23,596 [INFO] root: Registry: ['nvcr.io']
2022-03-12 00:22:23,673 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-12 00:22:23,687 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2022-03-12 00:22:31,635 [INFO] root: Building exporter object.
2022-03-12 00:22:45,024 [INFO] root: Exporting the model.
2022-03-12 00:22:45,024 [INFO] root: Using input nodes: ['Input']
2022-03-12 00:22:45,024 [INFO] root: Using output nodes: ['BatchedNMS']
2022-03-12 00:22:45,024 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2022-03-12 00:22:45,025 [INFO] iva.common.export.keras_exporter: Using output nodes: ['BatchedNMS']
The ONNX operator number change on the optimization: 771 -> 363
2022-03-12 00:25:21,932 [INFO] keras2onnx: The ONNX operator number change on the optimization: 771 -> 363
2022-03-12 00:25:24,489 [INFO] iva.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.
100%|███████████████████████████████████████████| 10/10 [00:24<00:00,  2.50s/it]
2022-03-12 00:25:49,488 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
2022-03-12 00:25:49,489 [INFO] root: Calibration takes time especially if number of batches is large.
2022-03-12 00:26:51,733 [INFO] iva.common.export.base_calibrator: Saving calibration cache (size 11319) to /workspace/tao-experiments/yolo_v4/export/cal.bin
2022-03-12 00:28:59,469 [INFO] root: Export complete.
2022-03-12 00:28:59,470 [INFO] root: {
    "param_count": 45.272193,
    "size": 172.74199676513672
}
2022-03-12 00:29:01,952 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Here is the TRT engine generation log:

2022-03-12 00:29:30,529 [INFO] root: Registry: ['nvcr.io']
2022-03-12 00:29:30,607 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-12 00:29:30,622 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[INFO] [MemUsageChange] Init CUDA: CPU +253, GPU +0, now: CPU 259, GPU 482 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filejpN6xE
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[INFO] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 352, 512)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 352, 512) for input: Input
[INFO] Using optimization profile opt shape: (8, 3, 352, 512) for input: Input
[INFO] Using optimization profile max shape: (16, 3, 352, 512) for input: Input
[INFO] [MemUsageSnapshot] Builder begin: CPU 432 MiB, GPU 482 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 283) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 287) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 292) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 393) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 397) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 402) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 491) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 494) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 498) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 731) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 735) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +350, GPU +160, now: CPU 788, GPU 642 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +274, GPU +132, now: CPU 1062, GPU 774 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 4 output network tensors.
[INFO] Total Host Persistent Memory: 347008
[INFO] Total Device Persistent Memory: 93597696
[INFO] Total Scratch Memory: 15817472
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 93 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1526, GPU 1012 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1526, GPU 1022 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1526, GPU 1006 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1525, GPU 988 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 1519 MiB, GPU 988 MiB
2022-03-12 00:37:34,686 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The log above shows that “it may result in inaccurate results at inference.” Could you use all of the training images to run the export?
Please share the full command and the log once you have done so. Thanks.
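
For example, something along these lines should cover all of the roughly 10k training images during calibration, since 16 x 625 = 10,000. The “image_2” subfolder under the training directory is an assumption, so adjust it to wherever the training PNGs actually live, and it may help to delete the previously generated cal.bin and cal.tensorfile first so calibration is redone from the real images (the TensorRT log above notes that an existing calibration cache is reused otherwise):

!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet53_epoch_$EPOCH.tlt  \
                     -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet53_epoch_$EPOCH.etlt \
                     -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                     -k $KEY \
                     --cal_image_dir /workspace/tao-experiments/data/training/image_2 \
                     --data_type int8 \
                     --batch_size 16 \
                     --batches 625 \
                     --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                     --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile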

We had provided the training images folder path in that previous run. However, given the snippet you pointed out in our log, I now have my doubts. We will run it again and let you know.
