Hi Everyone,
Hardware: Jetson Orin NX 16 GB, Jetpack 5.1.1.
TensorRT version: 8.5.2
I’m trying to utilize the DLA in my DeepStream application, so I compiled a DLA-enabled TensorRT engine with trtexec. When I build the engine with DLA+INT8, I get meaningless results, while the DLA+FP16 and GPU+INT8 combinations work perfectly. Even if this is fixed in a later TensorRT version, I would still need a solution for this specific version.
To compile the engines:
GPU+INT8
trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=model_gn.engine \
--exportProfile=model_gn.json \
--fp16 \
--int8 \
--calib=/workspace/engine-dev/yolov6n_bd_1088x1920.cache \
--useSpinWait \
--separateProfileRun > model_gn.log
DLA+INT8
trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=model_gn.engine \
--exportProfile=model_gn.json \
--fp16 \
--int8 \
--calib=/workspace/engine-dev/yolov6n_bd_1088x1920.cache \
--useDLACore=0 \
--allowGPUFallback \
--useSpinWait \
--separateProfileRun > model_gn.log
DLA+FP16
trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=model_gn.engine \
--exportProfile=model_gn.json \
--fp16 \
--useDLACore=0 \
--allowGPUFallback \
--useSpinWait \
--separateProfileRun > model_gn.log
Results I get (output images were attached for each configuration):
- GPU+INT8
- DLA+INT8
- DLA+FP16
Hi,
We want to reproduce this issue in our environment to check it further.
Could you share the model and the Python script you use to print out the results with us?
Thanks.
Hi AastaLLL,
Thanks for your prompt response. I’ve sent you the files to reproduce this issue.
Best regards
Hi,
Thanks, we will give it a try and let you know our findings.
Thanks.
Hi,
We want to test this issue with our latest software.
Are you able to generate the calibration cache with JetPack 6.1 & TensorRT 10.3 and share it with us?
Thanks.
Hi, I currently don’t have any devices with that version. I need to flash it first. I will let you know when it’s ready.
Hi @AastaLLL ,
Apparently, INT8 calibration has changed in TensorRT 10.3, so my scripts don’t work. I need to figure that out first.
This post describes the same issue I have. They solved it by using Model Optimizer; however, that approach doesn’t apply to my model (YOLOv6).
## Description
Hi,
I have been using the INT8 Entropy Calibrator 2 for INT8 quantization in Python and it’s been working well (TensorRT 10.0.1). The example of how I use the INT8 Entropy Calibrator 2 can be found in the official TRT GitHub repo ([TensorRT/samples/python/efficientdet/build_engine.py at release/10.0 · NVIDIA/TensorRT · GitHub](https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py))
The warning I’ve been getting starting with TensorRT 10.1 is that the INT8 Entropy Calibrator 2 implicit quantization has been deprecated and superseded by explicit quantization.
I’ve read the official document on the difference between the implicit and explicit quantization processes ([Developer Guide :: NVIDIA Deep Learning TensorRT Documentation](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#intro-quantization)) and they seem to work differently. The explicit quantization seems to expect a network to have QuantizeLayer and DequantizeLayer layers which my networks don’t. The implicit quantization can be used when those layers are not present in a network. Therefore, I am confused about how the implicit quantization can be superseded by the explicit quantization since they seem to work differently.
So, my question is **what needs to be modified in the standard INT8 Calibrator 2 quantization method ([TensorRT/samples/python/efficientdet/build_engine.py at release/10.0 · NVIDIA/TensorRT · GitHub](https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py)) for the deprecation warning not to show up** ? Or **what is the proper way to implement the INT8 Calibrator 2 implicit quantization now that the current one is deprecated ?** Couldn’t find any example using a newer TensorRT version (10.1 and up)
Thank you!
## Environment
**TensorRT Version**: 10.1
**NVIDIA GPU**: 3090
**Operating System**: Windows 10
**Python Version**: 3.9.19
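
For reference, the implicit-quantization pattern being deprecated there (an IInt8EntropyCalibrator2 subclass that feeds preprocessed batches to the builder, as in the linked efficientdet sample) looks roughly like the sketch below. The class name, cache filename, and the pycuda-based buffer handling are illustrative, not taken from this thread:

```python
import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 - creates the CUDA context
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed NCHW float32 batches to TensorRT and caches the scales."""

    def __init__(self, calib_batches, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(calib_batches)          # list of numpy arrays, all the same shape
        self.cache_file = cache_file
        self.batch_size = calib_batches[0].shape[0]
        self.device_input = cuda.mem_alloc(calib_batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                             # no more data -> calibration finishes
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse an existing cache so repeated builds can skip calibration
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator is then attached to the build via config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator = EntropyCalibrator(batches), which is exactly the code path that now raises the deprecation warning on TensorRT 10.1+.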
Is there documentation or a guide for calibrating models the old way in TensorRT 10.3?
Hi,
Did you try PTQ (post-training quantization)?
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d...
You should be able to get a quantized ONNX model directly with the link above.
Thanks.
Hi,
I’m not sure it will work with my model, because I use YOLOv6 and its documentation says PTQ degrades the model’s performance significantly (refer to: YOLOv6/tools/partial_quantization at main · meituan/YOLOv6 · GitHub).
I will try to make it work and let you know, but I’m currently having issues building the modelopt Docker image.
Is this still an issue that needs support? Are there any results that can be shared?
Hi,
Unfortunately, I haven’t even been able to reproduce the results without DLA or quantization on TensorRT 10.3 yet, as a lot of things have changed.
It is taking some time, since there are other ongoing projects I’m working on.
Hi,
Could you try to run the model with trtexec first?
The tool can quantize the model into INT8 automatically.
Although the accuracy might drop without the calibration file, it should give us some idea if this also occurs on TensorRT 10.
The outputs from FP16 (0.942…) are very different from the INT8 case (0.129…).
You can use --loadInputs to run the tool with predefined image data.
Hi,
trtexec --loadInputs expects the input file to be raw binary data.
In practice, you can save the binary data from a numpy array using array.tofile(file).
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tofile.html
For example, if the input is an image, you could use a python script like this:
import PIL.Image
import numpy as np
im = PIL.Image.open("input_image.jpg").resize((512, 512))
data = np.asarray(im, dtype=np.float32)
data.tofile("input_tensor.dat")
This will conver…
Thanks.
Hi AastaLLL,
Unfortunately, I still cannot obtain meaningful results with TensorRT 10.3 and FP16+GPU settings.
I run everything in the nvcr.io/nvidia/l4t-jetpack:r36.4.0 Docker container on an NVIDIA Jetson AGX Orin device with JetPack 6.1 [L4T 36.4.0].
Image to tensor code:
import sys
import PIL.Image
import numpy as np
if __name__ == '__main__':
    input_path = sys.argv[1]
    batch_size = int(sys.argv[2])

    # Load and resize image
    im = PIL.Image.open(input_path).resize((1920, 1088))
    data = np.asarray(im, dtype=np.float32)

    # Add batch dimension and repeat the image batch_size times
    data = np.expand_dims(data, axis=0)         # Add batch dimension
    data = np.repeat(data, batch_size, axis=0)  # Repeat to create batch

    # Convert from NHWC to NCHW format
    data = np.transpose(data, (0, 3, 1, 2))

    print(f"Final shape (NCHW format): {data.shape}")
    data.tofile("input_tensor.dat")
Build command:
# GPU+FP16
/usr/src/tensorrt/bin/trtexec --onnx=/workspace/yolov6n_mp_one_v0_2_0_1920_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=yolov6n_mp_one_v0_2_0_1920_bd_1088x1920_fp16_gpu.engine \
--fp16 \
--useSpinWait \
--separateProfileRun
Inference command:
/usr/src/tensorrt/bin/trtexec --loadEngine=/workspace/yolov6n_mp_one_v0_2_0_1920_bd_1088x1920_fp16_gpu.engine \
--fp16 \
--loadInputs='input:/workspace/data/input_tensor.dat' \
--useSpinWait \
--exportOutput='output.json' \
--separateProfileRun
Inference results:
Thousands of detections with scores higher than 0.99, which is quite unexpected. I don’t understand what is going on; I must be doing something wrong, or this model (yolov6n) doesn’t work with this version of TensorRT, but how is that possible?
The same ONNX file works perfectly with TensorRT 8.5.2.2. I also tried exporting the weights (.pt) file to ONNX in the same container, but the results are still not correct.
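
As a quick sanity check on the exported outputs, a script along these lines can count the suspiciously high scores. This is only a minimal sketch: it assumes --exportOutput writes a JSON list of objects with "name", "dimensions" and "values" keys, so adjust the key names if your file differs.

```python
import json
import numpy as np

# Load trtexec's --exportOutput dump and print per-tensor statistics
with open("output.json") as f:
    entries = json.load(f)

for entry in entries:
    values = np.asarray(entry["values"], dtype=np.float32).ravel()
    print(
        f'{entry["name"]} dims={entry["dimensions"]} '
        f"min={values.min():.4f} max={values.max():.4f} "
        f"count(values > 0.99) = {(values > 0.99).sum()}"
    )
```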
Hi,
Thanks for your testing.
Are you able to run the YOLO model with ONNXRuntime on JetPack 6.1?
You can find the ORT package in the below link:
https://pypi.jetson-ai-lab.dev/jp6/cu126
If yes, could you share the source/data/model so we can run inference with ONNXRuntime?
We want to check why the model doesn’t work with TensorRT.
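
For example, a minimal ONNXRuntime check could look roughly like the sketch below. The ONNX filename and the input tensor name "input" are carried over from your trtexec commands above; adjust them and the providers list to your setup.

```python
import numpy as np
import onnxruntime as ort

# Reuse the raw tensor that was fed to trtexec via --loadInputs
# (it was written as float32 with shape 2x3x1088x1920)
data = np.fromfile("input_tensor.dat", dtype=np.float32).reshape(2, 3, 1088, 1920)

# Prefer the CUDA execution provider and fall back to CPU if it is unavailable
session = ort.InferenceSession(
    "yolov6n_mp_one_v0_2_0_1920_bd_1088x1920.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

outputs = session.run(None, {"input": data})
for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, out.shape, float(out.min()), float(out.max()))
```

If the ONNXRuntime outputs look correct while the TensorRT 10.3 engine's do not, that would narrow the problem down to the engine build rather than the model or preprocessing.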
Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.