How to generate the correct engine with TensorRT for YOLOv3

Description

When using DeepStream with YOLOv3, after I’ve compiled the .etlt model in INT8, my YOLOv3 results are pretty random.

Environment

TensorRT Version: 8.5.2
GPU Type: AGX Xavier
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version: 5.1.1
JetPack Version: 5.1.1

Relevant Files

The files I use:

Steps To Reproduce

Greetings everyone,

I wanted to share my experience with training a YOLOv3 model using the TAO Toolkit. After successfully exporting it to an .etlt model, I proceeded to use tao-deploy to generate the cal.bin file for INT8 inference. While the model performed well in FP32 and FP16, I encountered significant issues with inference accuracy when using INT8.

Specifically, during INT8 inference, I noticed that no detections have a confidence score higher than 0.35. Lowering the confidence threshold results in what appears to be mostly noise or false detections.

I’ve attempted to troubleshoot the problem by exploring different approaches, such as using tao-converter on my Jetson, but I’ve been struggling to set the correct parameters, specifically the -p parameter.

I suspect that the issue lies with the generated calibration file, but despite investing several days into this, I’m feeling quite lost and unsure of what else to try. Any insights or suggestions would be greatly appreciated. Thank you in advance for your help!

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Could you share the ONNX model and the script, if you haven’t already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
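For reference, a minimal trtexec invocation looks like this (the model path is a placeholder; the --int8/--calib pair is only needed when exercising the INT8 path):

```shell
# Build and time an engine directly from the ONNX model, with verbose logging
trtexec --onnx=model.onnx --verbose

# To test the INT8 path with an existing calibration cache:
trtexec --onnx=model.onnx --int8 --calib=cal.bin --verbose
```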
Thanks!

Hello,

As stated, and as you can see in the shared files, the model I use is not an ONNX model but an .etlt exported from the TAO Toolkit.

Thanks for your answer

Hi,

We are moving this post to the Tao toolkit forum to get better help.

Thank you.

How did you generate the cal.bin? Did you keep the log from when you ran the command?
I suggest using the entire training set when running it. For example, if your training set contains 1000 images in total, you can set
--batches 1000
--batch_size 1
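For illustration, a gen_trt_engine invocation along those lines might look like the following sketch (the paths, encryption key, and spec file are placeholders; the flag names follow the TAO Deploy 4.0 yolo_v3 documentation as I understand it, so verify them against your version):

```shell
yolo_v3 gen_trt_engine \
  -m /workspace/yolo_v3/export/model.etlt \
  -k <encryption_key> \
  -e /workspace/yolo_v3/specs/spec.txt \
  --data_type int8 \
  --batches 1000 \
  --batch_size 1 \
  --cal_image_dir /workspace/data/training/images \
  --cal_cache_file /workspace/yolo_v3/export/cal.bin \
  --engine_file /workspace/yolo_v3/export/model.engine
```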

Yeah, I did that. Here is the log:

======================
=== TAO Toolkit Deploy ===

NVIDIA Release 4.0.0-Deploy (build 47705558)
TAO Toolkit Version 4.0.0

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:

INFO:root:The provided .etlt file is in ONNX format.
[07/21/2023-12:32:05] [TRT] [I] [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 351, GPU 992 (MiB)
[07/21/2023-12:32:06] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +443, GPU +116, now: CPU 848, GPU 1108 (MiB)
[07/21/2023-12:32:06] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in CUDA C++ Programming Guide
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Parsing ONNX model
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:List inputs:
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Input 0 → Input.
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:[3, 1280, 1280].
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:0.
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[07/21/2023-12:32:06] [TRT] [I] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[07/21/2023-12:32:06] [TRT] [I] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[07/21/2023-12:32:06] [TRT] [W] parsers/onnx/builtin_op_importers.cpp:5225: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[07/21/2023-12:32:06] [TRT] [I] Successfully created plugin: BatchedNMSDynamic_TRT
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Network Description
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Input ‘Input’ with shape (-1, 3, 1280, 1280) and dtype DataType.FLOAT
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Output ‘BatchedNMS’ with shape (-1, 1) and dtype DataType.INT32
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Output ‘BatchedNMS_1’ with shape (-1, 200, 4) and dtype DataType.FLOAT
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Output ‘BatchedNMS_2’ with shape (-1, 200) and dtype DataType.FLOAT
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Output ‘BatchedNMS_3’ with shape (-1, 200) and dtype DataType.FLOAT
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:dynamic batch size handling
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:Calibrating using ImageBatcher
INFO:nvidia_tao_deploy.cv.yolo_v3.engine_builder:[1, 3, 1280, 1280]
[07/21/2023-12:32:07] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +854, GPU +362, now: CPU 1763, GPU 1490 (MiB)
[07/21/2023-12:32:07] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1889, GPU 1548 (MiB)
[07/21/2023-12:32:07] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[07/21/2023-12:32:08] [TRT] [I] Total Activation Memory: 2956434432
[07/21/2023-12:32:08] [TRT] [I] Detected 1 inputs and 4 output network tensors.
[07/21/2023-12:32:08] [TRT] [I] Total Host Persistent Memory: 112528
[07/21/2023-12:32:08] [TRT] [I] Total Device Persistent Memory: 0
[07/21/2023-12:32:08] [TRT] [I] Total Scratch Memory: 16147968
[07/21/2023-12:32:08] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 8 MiB, GPU 60 MiB
[07/21/2023-12:32:08] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 152 steps to complete.
[07/21/2023-12:32:08] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 3.90092ms to assign 14 blocks to 152 nodes requiring 85787648 bytes.
[07/21/2023-12:32:08] [TRT] [I] Total Activation Memory: 85787648
[07/21/2023-12:32:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2467, GPU 1860 (MiB)
[07/21/2023-12:32:08] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 2468, GPU 1870 (MiB)
[07/21/2023-12:32:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2467, GPU 1846 (MiB)
[07/21/2023-12:32:08] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2467, GPU 1854 (MiB)
[07/21/2023-12:32:08] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +81, now: CPU 0, GPU 141 (MiB)
[07/21/2023-12:32:08] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in CUDA C++ Programming Guide
[07/21/2023-12:32:08] [TRT] [I] Starting Calibration.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 1 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 0 in 0.0707083 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 2 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 1 in 0.0706296 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 3 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 2 in 0.0706181 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 4 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 3 in 0.0702139 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 5 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 4 in 0.0701423 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 6 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 5 in 0.0710811 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 7 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 6 in 0.0711559 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 8 / 7139
[07/21/2023-12:32:08] [TRT] [I] Calibrated batch 7 in 0.0705914 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 9 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 8 in 0.0705384 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 10 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 9 in 0.0702 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 11 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 10 in 0.0703424 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 12 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 11 in 0.0700974 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 13 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 12 in 0.0711241 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 14 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 13 in 0.0702954 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 15 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 14 in 0.0720497 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 16 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 15 in 0.0711025 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 17 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 16 in 0.0714033 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 18 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 17 in 0.0718931 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 19 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 18 in 0.0700664 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 20 / 7139
[07/21/2023-12:32:09] [TRT] [I] Calibrated batch 19 in 0.0716921 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 21 / 7139
[07/21/2023-12:32:10] [TRT] [I] Calibrated batch 20 in 0.0725608 seconds.

[…]

INFO:nvidia_tao_deploy.engine.calibrator:Calibrating image 7139 / 7139
[07/21/2023-12:42:25] [TRT] [I] Calibrated batch 7138 in 0.0701983 seconds.
INFO:nvidia_tao_deploy.engine.calibrator:Finished calibration batches
[07/21/2023-12:42:31] [TRT] [I] Post Processing Calibration data in 6.0277 seconds.
[07/21/2023-12:42:31] [TRT] [I] Calibration completed in 624.323 seconds.
[07/21/2023-12:42:31] [TRT] [I] Writing Calibration Cache for calibrator: TRT-8501-EntropyCalibration2
INFO:nvidia_tao_deploy.engine.calibrator:Writing calibration cache data to: /workspace/yolo_v3/export/cal.bin
[07/21/2023-12:42:31] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 401) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[07/21/2023-12:42:31] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 405) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[07/21/2023-12:42:31] [TRT] [W] Missing scale and zero-point for tensor BatchedNMS, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[07/21/2023-12:42:31] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2898, GPU 1786 (MiB)
[07/21/2023-12:42:31] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2898, GPU 1794 (MiB)
[07/21/2023-12:42:31] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[07/21/2023-12:43:48] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[07/21/2023-12:47:50] [TRT] [I] Total Activation Memory: 2284631552
[07/21/2023-12:47:50] [TRT] [I] Detected 1 inputs and 4 output network tensors.
[07/21/2023-12:47:50] [TRT] [I] Total Host Persistent Memory: 140688
[07/21/2023-12:47:50] [TRT] [I] Total Device Persistent Memory: 0
[07/21/2023-12:47:50] [TRT] [I] Total Scratch Memory: 12922368
[07/21/2023-12:47:50] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 20 MiB, GPU 1113 MiB
[07/21/2023-12:47:50] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 70 steps to complete.
[07/21/2023-12:47:50] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 1.25284ms to assign 7 blocks to 70 nodes requiring 27386368 bytes.
[07/21/2023-12:47:50] [TRT] [I] Total Activation Memory: 27386368
[07/21/2023-12:47:51] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2938, GPU 1840 (MiB)
[07/21/2023-12:47:51] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2938, GPU 1848 (MiB)
[07/21/2023-12:47:51] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[07/21/2023-12:47:51] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[07/21/2023-12:47:51] [TRT] [W] Check verbose logs for the list of affected weights.
[07/21/2023-12:47:51] [TRT] [W] - 49 weights are affected by this issue: Detected subnormal FP16 values.
[07/21/2023-12:47:51] [TRT] [W] - 10 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[07/21/2023-12:47:51] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +21, now: CPU 14, GPU 21 (MiB)
INFO:root:Export finished successfully.
2023-07-21 12:47:51,621 [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Sending telemetry data.
2023-07-21 12:47:52,199 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Telemetry data couldn’t be sent, but the command ran successfully.
2023-07-21 12:47:52,199 [WARNING] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: [Error]: __init__() missing 4 required positional arguments: ‘code’, ‘msg’, ‘hdrs’, and ‘fp’
2023-07-21 12:47:52,199 [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto: Execution status: PASS

Comparing it with the calibration_file.json generated by the yolov3 export, I can see that the scale values in the cal.bin file generated by tao-deploy are stored as hexadecimal and appear to be mostly close to 0:

for example, in the calibration_file json I get:

“Input”: 151.003662109375

and in the cal.bin file I get:

Input: 4000890a

(0x4000890a with the bytes read in reverse order decodes to 1.31927e-32; note, however, that read as a standard big-endian IEEE-754 float it is ≈ 2.008, so the byte-order convention matters here)
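As a sanity check on that reading, the cache scales can be decoded in a couple of lines (a minimal sketch; decode_trt_scale is just a helper name introduced here, and the big-endian assumption should be checked against how the cache was written):

```python
import struct

def decode_trt_scale(hex_str: str) -> float:
    """Decode a calibration-cache scale, assuming big-endian IEEE-754 hex."""
    return struct.unpack(">f", bytes.fromhex(hex_str))[0]

# Read big-endian, the Input scale is an ordinary magnitude:
print(decode_trt_scale("4000890a"))  # ~2.0084
# Reversing the byte order is what produces the tiny 1.3e-32 value:
print(struct.unpack("<f", bytes.fromhex("4000890a"))[0])
```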

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Please download and try to run official notebook to check if it works.

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/4.0.2/zip -O getting_started_v4.0.2.zip
unzip -u getting_started_v4.0.2.zip -d ./getting_started_v4.0.2 && rm -rf getting_started_v4.0.2.zip && cd ./getting_started_v4.0.2

https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/tao-getting-started/version/4.0.2/files/notebooks/tao_launcher_starter_kit/yolo_v3/yolo_v3.ipynb

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.