DLA+INT8 compiled engine doesn't produce meaningful results

Hi Everyone,

Hardware: Jetson Orin NX 16 GB, JetPack 5.1.1.
TensorRT version: 8.5.2

I’m trying to utilize DLA in my DeepStream application, so I compiled a DLA-enabled TensorRT engine with trtexec. When I build the engine with DLA+INT8, I get meaningless results, while the DLA+FP16 and GPU+INT8 combinations work perfectly. Even if this is fixed in a newer version of TensorRT, I would still need a solution for this specific version.

To compile the engines, I use the following trtexec commands (the INT8 builds load a calibration cache that I generate beforehand; see the sketch after the commands):

GPU+INT8

trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
        --shapes=input:2x3x1088x1920 \
        --saveEngine=model_gn.engine \
        --exportProfile=model_gn.json \
        --fp16 \
        --int8 \
        --calib=/workspace/engine-dev/yolov6n_bd_1088x1920.cache \
        --useSpinWait \
        --separateProfileRun > model_gn.log

DLA+INT8

trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
        --shapes=input:2x3x1088x1920 \
        --saveEngine=model_gn.engine \
        --exportProfile=model_gn.json \
        --fp16 \
        --int8 \
        --calib=/workspace/engine-dev/yolov6n_bd_1088x1920.cache \
        --useDLACore=0 \
        --allowGPUFallback \
        --useSpinWait \
        --separateProfileRun > model_gn.log

DLA+FP16

trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
        --shapes=input:2x3x1088x1920 \
        --saveEngine=model_gn.engine \
        --exportProfile=model_gn.json \
        --fp16 \
        --useDLACore=0 \
        --allowGPUFallback \
        --useSpinWait \
        --separateProfileRun > model_gn.log
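For context, the cache passed via --calib is generated by a separate calibration run beforehand; trtexec only loads it. A cache like this is typically produced with the TensorRT 8.5 Python API by attaching an IInt8EntropyCalibrator2 to the builder config. The sketch below is illustrative rather than my exact script; the batch files, preprocessing, and pycuda usage are placeholders.

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class YoloEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed NCHW float32 batches to the TensorRT builder."""

    def __init__(self, batch_files, batch_shape, cache_file):
        super().__init__()
        self.batch_files = batch_files      # list of .dat files, one batch each
        self.batch_shape = batch_shape      # e.g. (2, 3, 1088, 1920)
        self.cache_file = cache_file        # e.g. "yolov6n_bd_1088x1920.cache"
        self.index = 0
        self.device_input = cuda.mem_alloc(int(np.prod(batch_shape)) * 4)

    def get_batch_size(self):
        return self.batch_shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batch_files):
            return None                     # no more data: calibration finishes
        batch = np.fromfile(self.batch_files[self.index], dtype=np.float32)
        batch = np.ascontiguousarray(batch.reshape(self.batch_shape))
        cuda.memcpy_htod(self.device_input, batch)
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

During an INT8 build from Python, the calibrator is attached with config.int8_calibrator = YoloEntropyCalibrator(...), and the cache file it writes is what the trtexec commands above read through --calib.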

Results I get:

GPU+INT8: [result attachment]

DLA+INT8: [result attachment]

DLA+FP16: [result attachment]

Hi,

We want to reproduce this issue in our environment to check it further.
Could you share the model and the Python script you use to print out the results with us?

Thanks.

Hi AastaLLL,
Thanks for your prompt response. I’ve sent you the files to reproduce this issue.

Best regards

Hi,

Thanks, we will give it a try and let you know the results.
Thanks.

Hi,

We want to test this issue with our latest software.
Are you able to generate the calibration cache with JetPack 6.1 & TensorRT 10.3 and share it with us?

Thanks.

Hi, I currently don’t have any devices with that version. I need to flash it first. I will let you know when it’s ready.

Hi @AastaLLL,

Apparently, INT8 calibration has changed in TensorRT 10.3, so my scripts don’t work. I need to figure that out first.

This post describes the same issue I have. They solved it by using the Model Optimizer; however, that approach doesn’t apply to my model (YOLOv6).

Is there documentation or a guide for calibrating models the old way in TensorRT 10.3?

Hi,

Did you try PTQ quantization?

You should be able to get the quantized ONNX model directly with the link above.
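For reference, PTQ with the TensorRT Model Optimizer roughly follows the pattern below. This is only a sketch based on the modelopt.torch.quantization documentation; the config choice, the calibration loader, and the ONNX export details are assumptions that would need adapting to YOLOv6.

import torch
import modelopt.torch.quantization as mtq

def quantize_yolov6(model, calib_loader, onnx_path="yolov6n_int8_qdq.onnx"):
    """PTQ a PyTorch YOLOv6 model and export an ONNX file with Q/DQ nodes."""

    def forward_loop(m):
        # Run calibration batches through the model so the inserted
        # quantizers can collect activation statistics.
        for images in calib_loader:
            m(images.cuda())

    # INT8_DEFAULT_CFG inserts quantizers with default INT8 settings.
    quant_model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

    # Export with explicit Q/DQ nodes; TensorRT 10 can then build an INT8
    # engine without a separate calibration cache. The dummy input matches
    # the 2x3x1088x1920 shape used in the trtexec commands above; export
    # arguments may need adjusting for the YOLOv6 graph.
    torch.onnx.export(
        quant_model,
        torch.randn(2, 3, 1088, 1920).cuda(),
        onnx_path,
        input_names=["input"],
        opset_version=17,
    )
    return quant_model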
Thanks.

Hi,

I’m not sure it will work with my model: I use YOLOv6, and its documentation says that PTQ degrades the model’s performance significantly (refer to: YOLOv6/tools/partial_quantization at main · meituan/YOLOv6 · GitHub).

I will try to make it work and let you know, but I’m currently having issues building the modelopt Docker image.

Is this still an issue that needs support? Are there any results that can be shared?

Hi,
Unfortunately, I haven’t been able to reproduce my results on TensorRT 10.3 yet, even without DLA or quantization, as a lot of things have changed.

It is taking some time, as there are other ongoing projects I am working on.

Hi,

Could you try to run the model with trtexec first?
The tool can quantize the model into INT8 automatically.

Although the accuracy might drop without the calibration file, it should give us some idea of whether this also occurs on TensorRT 10.
The outputs from FP16 (0.942…) are very different from the INT8 case (0.129…).

You can use --loadInputs to run the tool with predefined image data.
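
To put numbers on that difference, the two --exportOutput files can be compared with a short script like this one. It is a sketch that assumes the exported JSON is a list of tensors, each carrying a flat "values" array; the layout may differ between TensorRT versions, so adjust the loader accordingly.

import json
import numpy as np

def load_trtexec_outputs(path):
    """Load output tensors from a trtexec --exportOutput JSON file."""
    with open(path) as f:
        entries = json.load(f)
    return {entry.get("name", str(i)): np.asarray(entry["values"], dtype=np.float32)
            for i, entry in enumerate(entries)}

fp16 = load_trtexec_outputs("output_fp16.json")
int8 = load_trtexec_outputs("output_int8.json")

# Compare the raw output values of the two engines tensor by tensor.
for name in fp16:
    a, b = fp16[name], int8[name]
    diff = np.abs(a - b)
    print(f"{name}: max |diff| = {diff.max():.4f}, mean |diff| = {diff.mean():.4f}")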

Thanks.

Hi AastaLLL,

Unfortunately, I still cannot obtain meaningful results with TensorRT 10.3 and FP16+GPU settings.

I run everything in the nvcr.io/nvidia/l4t-jetpack:r36.4.0 Docker container on an NVIDIA Jetson AGX Orin with JetPack 6.1 [L4T 36.4.0].

Image to tensor code:

import sys
import PIL.Image
import numpy as np

if __name__ == '__main__':
    input_path = sys.argv[1]
    batch_size = int(sys.argv[2])

    # Load and resize image
    im = PIL.Image.open(input_path).resize((1920, 1088))
    data = np.asarray(im, dtype=np.float32)
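    # Note: values stay in the raw 0-255 range; no scaling or mean subtraction is applied here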
    
    # Add batch dimension and repeat the image batch_size times
    data = np.expand_dims(data, axis=0)  # Add batch dimension
    data = np.repeat(data, batch_size, axis=0)  # Repeat to create batch
    
    # Convert from NHWC to NCHW format
    data = np.transpose(data, (0, 3, 1, 2))
    
    print(f"Final shape (NCHW format): {data.shape}")
    data.tofile("input_tensor.dat")
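
The script takes the image path and batch size as arguments (e.g. python3 image_to_tensor.py sample.jpg 2, where the script name is just a placeholder) and writes input_tensor.dat, which is the file passed to --loadInputs below.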

Build command:

# GPU+FP16
/usr/src/tensorrt/bin/trtexec --onnx=/workspace/yolov6n_mp_one_v0_2_0_1920_bd_1088x1920.onnx \
        --shapes=input:2x3x1088x1920 \
        --saveEngine=yolov6n_mp_one_v0_2_0_1920_bd_1088x1920_fp16_gpu.engine \
        --fp16 \
        --useSpinWait \
        --separateProfileRun

Inference command:


/usr/src/tensorrt/bin/trtexec --loadEngine=/workspace/yolov6n_mp_one_v0_2_0_1920_bd_1088x1920_fp16_gpu.engine \
                              --fp16 \
                              --loadInputs='input:/workspace/data/input_tensor.dat' \
                              --useSpinWait \
                              --exportOutput='output.json' \
                              --separateProfileRun

Inference results:

I get thousands of detections with scores higher than 0.99, which is quite unexpected. I don’t understand what is going on; I must be doing something wrong, or this model (YOLOv6n) doesn’t work with this version of TensorRT, but how is that possible?

The same ONNX file works perfectly with TensorRT 8.5.2.2. I also tried exporting the weights (.pt) file to ONNX in the same container, but the results are still not okay.

Hi,

Thanks for your testing.

Are you able to run the YOLO model with ONNXRuntime on JetPack 6.1?
You can find the ORT package at the link below:
https://pypi.jetson-ai-lab.dev/jp6/cu126

If yes, could you share the source, data, and model so we can run inference with ONNXRuntime?
We want to check why the model doesn’t work with TensorRT.
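
A minimal ONNXRuntime check could look like the sketch below. It assumes the CUDA-enabled ORT wheel from the link above is installed, reuses the input_tensor.dat dumped by the preprocessing script, and takes the ONNX file name and the "input" tensor name from the commands earlier in the thread.

import numpy as np
import onnxruntime as ort

# Reuse the tensor dumped by the preprocessing script (2x3x1088x1920 float32).
data = np.fromfile("input_tensor.dat", dtype=np.float32).reshape(2, 3, 1088, 1920)

# Prefer the CUDA provider if the GPU-enabled wheel is installed;
# otherwise ONNXRuntime falls back to the CPU provider.
session = ort.InferenceSession(
    "yolov6n_mp_one_v0_2_0_1920_bd_1088x1920.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

outputs = session.run(None, {"input": data})
for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, out.shape, out.reshape(-1)[:5])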

Thanks.
