TAO training - Visualise Inference after training shows 98% mAP, but after exporting the model to TensorRT the inference result is 0%

Our YOLOv4 model uses a pretrained CSPDarknet53 backbone. Training went well and we reached a mAP of 98% at the end with 10k images.

We have trained YOLOv4 models in the same manner in the past using the TAO Toolkit without any issues. With this one, however, everything is fine up to the Visualise Inference step. As soon as we convert the model to a TensorRT engine and run an inference test, there are zero detections.

We intend to use this model as an SGIE following the back-to-back detectors example; when that failed, we tested the model both as PGIE and as SGIE. We thought the problem might be in the conversion we did on our Xavier device, so we used the TAO Toolkit Jupyter notebook to generate an engine file on the server instead, but hit the same problem there. So it appears we are doing something wrong at step 10, Model Export.
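
For reference, here is a sketch of a standard tao-converter invocation for building such an engine from the exported .etlt on the Xavier; the key and file paths are placeholders, and the optimization-profile shapes simply match our 512x352 input:

tao-converter -k $KEY \
              -p Input,1x3x352x512,8x3x352x512,16x3x352x512 \
              -t int8 \
              -c /path/to/cal.bin \
              -e /path/to/trt.engine \
              /path/to/yolov4_cspdarknet53_epoch_030.etlt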

I have included our config file below:

random_seed: 42
yolov4_config {
  big_anchor_shape: "[(67.00, 63.00), (91.00, 82.00), (130.00, 118.00)]"
  mid_anchor_shape: "[(33.00, 31.00), (41.00, 40.00), (54.00, 49.00)]"
  small_anchor_shape: "[(12.00, 13.00), (18.00, 18.00), (25.00, 25.00)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "cspdarknet"
  nlayers: 53
  arch_conv_blocks: 2
  loss_loc_weight: 1.0
  loss_neg_obj_weights: 1.0
  loss_class_weights: 1.0
  label_smoothing: 0.1
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  enable_qat: false
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-6
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pruned_model_path: "/workspace/tao-experiments/yolo_v4/experiment_dir_pruned/yolov4_cspdarknet53_pruned.tlt"
}
eval_config {
  average_precision_mode: INTEGRATE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.005
  clustering_iou_threshold: 0.5
  top_k: 200
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 512
  output_height: 352
  output_channel: 3
  randomize_input_shape_period: 10
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}
dataset_config {
  data_sources: {
      tfrecords_path: "/workspace/tao-experiments/data/training/tfrecords/train*"
      image_directory_path: "/workspace/tao-experiments/data/training"
  }
  include_difficult_in_training: true
  image_extension: "png"
  target_class_mapping {
      key: "ear"
      value: "ear"
  }
  target_class_mapping {
      key: "eye"
      value: "eye"
  }
  target_class_mapping {
      key: "horn"
      value: "horn"
  }
  target_class_mapping {
      key: "nose"
      value: "nose"
  }
  validation_data_sources: {
      tfrecords_path: "/workspace/tao-experiments/data/val/tfrecords/val*"
      image_directory_path: "/workspace/tao-experiments/data/val"
  }
}

And here is the model export command:

!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet53_epoch_$EPOCH.tlt  \
                     -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet53_epoch_$EPOCH.etlt \
                     -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                     -k $KEY \
                     --cal_image_dir  $USER_EXPERIMENT_DIR/data/testing/image_2 \
                     --data_type int8 \
                     --batch_size 16 \
                     --batches 10 \
                     --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                     --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

Please use training images in the “--cal_image_dir”.

It produced the same result: the model is generated, but the inference result is still zero. I have included the retraining log below, followed by the model export log:

Retraining result:

epoch,AP_ear,AP_eye,AP_horn,AP_nose,loss,lr,mAP,validation_loss
1,nan,nan,nan,nan,66.43994,4.4799996e-05,nan,nan
2,nan,nan,nan,nan,61.093052,8.9199995e-05,nan,nan
3,nan,nan,nan,nan,59.752018,0.00013359998,nan,nan
4,nan,nan,nan,nan,59.622494,0.00017799999,nan,nan
5,nan,nan,nan,nan,59.923515,0.00022239999,nan,nan
6,nan,nan,nan,nan,61.727955,0.0002668,nan,nan
7,nan,nan,nan,nan,62.574867,0.00031119998,nan,nan
8,nan,nan,nan,nan,63.800163,0.00035559997,nan,nan
9,nan,nan,nan,nan,65.24933,0.00039999996,nan,nan
10,0.9696846530051635,0.9556112852087715,0.9599799102309143,0.9888244377668178,66.879715,0.00039890545,0.9685250715529168,32.295438995657044
11,nan,nan,nan,nan,65.9971,0.0003956339,nan,nan
12,nan,nan,nan,nan,66.920975,0.00039022107,nan,nan
13,nan,nan,nan,nan,65.28394,0.00038272637,nan,nan
14,nan,nan,nan,nan,65.12993,0.00037323186,nan,nan
15,nan,nan,nan,nan,65.106804,0.00036184158,nan,nan
16,nan,nan,nan,nan,64.46103,0.00034868033,nan,nan
17,nan,nan,nan,nan,63.460102,0.0003338923,nan,nan
18,nan,nan,nan,nan,62.281967,0.0003176395,nan,nan
19,nan,nan,nan,nan,62.55332,0.0003001,nan,nan
20,0.9852715489879238,0.9701419409850334,0.9829024320295642,0.9916476833281718,61.97091,0.00028146597,0.9824909013326732,29.51218525199003
21,nan,nan,nan,nan,61.486893,0.0002619416,nan,nan
22,nan,nan,nan,nan,61.17372,0.00024174075,nan,nan
23,nan,nan,nan,nan,59.919712,0.00022108479,nan,nan
24,nan,nan,nan,nan,59.418755,0.0002002,nan,nan
25,nan,nan,nan,nan,58.460793,0.00017931522,nan,nan
26,nan,nan,nan,nan,58.512672,0.00015865924,nan,nan
27,nan,nan,nan,nan,57.209843,0.0001384584,nan,nan
28,nan,nan,nan,nan,57.177284,0.000118934026,nan,nan
29,nan,nan,nan,nan,56.11211,0.00010030001,nan,nan
30,0.9921042377047276,0.9811760497695594,0.9858362450658296,0.9972089116136724,55.949642,8.276051e-05,0.9890813610384472,27.40327364160109
env: EPOCH=030

Here is the model export log:

2022-03-12 00:22:23,596 [INFO] root: Registry: ['nvcr.io']
2022-03-12 00:22:23,673 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-12 00:22:23,687 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2022-03-12 00:22:31,635 [INFO] root: Building exporter object.
2022-03-12 00:22:45,024 [INFO] root: Exporting the model.
2022-03-12 00:22:45,024 [INFO] root: Using input nodes: ['Input']
2022-03-12 00:22:45,024 [INFO] root: Using output nodes: ['BatchedNMS']
2022-03-12 00:22:45,024 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2022-03-12 00:22:45,025 [INFO] iva.common.export.keras_exporter: Using output nodes: ['BatchedNMS']
The ONNX operator number change on the optimization: 771 -> 363
2022-03-12 00:25:21,932 [INFO] keras2onnx: The ONNX operator number change on the optimization: 771 -> 363
2022-03-12 00:25:24,489 [INFO] iva.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.
100%|███████████████████████████████████████████| 10/10 [00:24<00:00,  2.50s/it]
2022-03-12 00:25:49,488 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
2022-03-12 00:25:49,489 [INFO] root: Calibration takes time especially if number of batches is large.
2022-03-12 00:26:51,733 [INFO] iva.common.export.base_calibrator: Saving calibration cache (size 11319) to /workspace/tao-experiments/yolo_v4/export/cal.bin
2022-03-12 00:28:59,469 [INFO] root: Export complete.
2022-03-12 00:28:59,470 [INFO] root: {
    "param_count": 45.272193,
    "size": 172.74199676513672
}
2022-03-12 00:29:01,952 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Here is the TRT engine generation log:

2022-03-12 00:29:30,529 [INFO] root: Registry: ['nvcr.io']
2022-03-12 00:29:30,607 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-12 00:29:30,622 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[INFO] [MemUsageChange] Init CUDA: CPU +253, GPU +0, now: CPU 259, GPU 482 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filejpN6xE
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[INFO] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 352, 512)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 352, 512) for input: Input
[INFO] Using optimization profile opt shape: (8, 3, 352, 512) for input: Input
[INFO] Using optimization profile max shape: (16, 3, 352, 512) for input: Input
[INFO] [MemUsageSnapshot] Builder begin: CPU 432 MiB, GPU 482 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 283) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 287) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 292) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 393) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 397) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 402) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 491) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 494) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 498) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 731) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor (Unnamed Layer* 735) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +350, GPU +160, now: CPU 788, GPU 642 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +274, GPU +132, now: CPU 1062, GPU 774 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 4 output network tensors.
[INFO] Total Host Persistent Memory: 347008
[INFO] Total Device Persistent Memory: 93597696
[INFO] Total Scratch Memory: 15817472
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 93 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1526, GPU 1012 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1526, GPU 1022 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1526, GPU 1006 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1525, GPU 988 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 1519 MiB, GPU 988 MiB
2022-03-12 00:37:34,686 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The log above shows that “it may result in inaccurate results at inference.” Could you use all of the training images to run the export?
Please share the full command and the log once you have done so. Thanks.
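
For example, something along these lines should cover all of the roughly 10k training images during calibration, since 16 x 625 = 10,000. The “image_2” subfolder under the training directory is an assumption, so adjust it to wherever the training PNGs actually live, and it may help to delete the previously generated cal.bin and cal.tensorfile first so calibration is redone from the real images (the TensorRT log above notes that an existing calibration cache is reused otherwise):

!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet53_epoch_$EPOCH.tlt  \
                     -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet53_epoch_$EPOCH.etlt \
                     -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                     -k $KEY \
                     --cal_image_dir /workspace/tao-experiments/data/training/image_2 \
                     --data_type int8 \
                     --batch_size 16 \
                     --batches 625 \
                     --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                     --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile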

We had provided the training images folder path in that previous run. However, given the snippet you pointed out in our log, I now have my doubts. We will run it again and let you know.
