Hi Everyone,
Hardware: Jetson Orin NX 16 GB, Jetpack 5.1.1.
TensorRT version: 8.5.2
I’m trying to utilize the DLA in my DeepStream application, so I compiled a DLA-enabled TensorRT engine with trtexec. When I build the engine with DLA+INT8, I get meaningless results, while the DLA+FP16 and GPU+INT8 combinations work perfectly. Even if this is fixed in a later TensorRT version, I would still need a solution for this specific version.
To compile the engines:
GPU+INT8
trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=model_gn.engine \
--exportProfile=model_gn.json \
--fp16 \
--int8 \
--calib=/workspace/engine-dev/yolov6n_bd_1088x1920.cache \
--useSpinWait \
--separateProfileRun > model_gn.log
DLA+INT8
trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=model_gn.engine \
--exportProfile=model_gn.json \
--fp16 \
--int8 \
--calib=/workspace/engine-dev/yolov6n_bd_1088x1920.cache \
--useDLACore=0 \
--allowGPUFallback \
--useSpinWait \
--separateProfileRun > model_gn.log
DLA+FP16
trtexec --onnx=/workspace/engine-dev/yolov6n_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=model_gn.engine \
--exportProfile=model_gn.json \
--fp16 \
--useDLACore=0 \
--allowGPUFallback \
--useSpinWait \
--separateProfileRun > model_gn.log
Results I get (output images were attached for each configuration):
- GPU+INT8
- DLA+INT8
- DLA+FP16
Hi,
We want to reproduce this issue in our environment to check it further.
Could you share the model and the Python script you use to print out the results with us?
Thanks.
Hi AastaLLL,
Thanks for your prompt response. I’ve sent you the files to reproduce this issue.
Best regards
Hi,
Thanks, we will give it a try and let you know our findings.
Thanks.
Hi,
We want to test this issue with our latest software.
Are you able to generate the calibration cache with JetPack 6.1 & TensorRT 10.3 and share it with us?
Thanks.
Hi, I currently don’t have any devices with that version. I need to flash it first. I will let you know when it’s ready.
Hi @AastaLLL ,
Apparently, INT8 calibration has changed in TensorRT 10.3, so my scripts don’t work. I need to figure that out first.
This post describes the same issue I have. They solved it by using Model Optimizer; however, that approach doesn’t apply to my model (YOLOv6).
## Description
Hi,
I have been using the INT8 Entropy Calibrator 2 for INT8 quantization in Python and it’s been working well (TensorRT 10.0.1). The example of how I use the INT8 Entropy Calibrator 2 can be found in the official TRT GitHub repo ([TensorRT/samples/python/efficientdet/build_engine.py at release/10.0 · NVIDIA/TensorRT · GitHub](https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py))
The warning I’ve been getting starting with TensorRT 10.1 is that the INT8 Entropy Calibrator 2 implicit quantization has been deprecated and superseded by explicit quantization.
I’ve read the official document on the difference between the implicit and explicit quantization processes ([Developer Guide :: NVIDIA Deep Learning TensorRT Documentation](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#intro-quantization)) and they seem to work differently. The explicit quantization seems to expect a network to have QuantizeLayer and DequantizeLayer layers which my networks don’t. The implicit quantization can be used when those layers are not present in a network. Therefore, I am confused about how the implicit quantization can be superseded by the explicit quantization since they seem to work differently.
So, my question is **what needs to be modified in the standard INT8 Calibrator 2 quantization method ([TensorRT/samples/python/efficientdet/build_engine.py at release/10.0 · NVIDIA/TensorRT · GitHub](https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py)) for the deprecation warning not to show up** ? Or **what is the proper way to implement the INT8 Calibrator 2 implicit quantization now that the current one is deprecated ?** Couldn’t find any example using a newer TensorRT version (10.1 and up)
Thank you!
## Environment
**TensorRT Version**: 10.1
**NVIDIA GPU**: 3090
**Operating System**: Windows 10
**Python Version**: 3.9.19
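
For reference, the implicit-quantization pattern being deprecated there (an IInt8EntropyCalibrator2 subclass that feeds preprocessed batches to the builder, as in the linked efficientdet sample) looks roughly like the sketch below. The class name, cache filename, and the pycuda-based buffer handling are illustrative, not taken from this thread:

```python
import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 - creates the CUDA context
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed NCHW float32 batches to TensorRT and caches the scales."""

    def __init__(self, calib_batches, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(calib_batches)          # list of numpy arrays, all the same shape
        self.cache_file = cache_file
        self.batch_size = calib_batches[0].shape[0]
        self.device_input = cuda.mem_alloc(calib_batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                             # no more data -> calibration finishes
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse an existing cache so repeated builds can skip calibration
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator is then attached to the build via config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator = EntropyCalibrator(batches), which is exactly the code path that now raises the deprecation warning on TensorRT 10.1+.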
Is there documentation or a guide for calibrating models the old way in TensorRT 10.3?
Hi,
Did you try PTQ (post-training quantization)?
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d...
You should be able to get a quantized ONNX model directly with the link above.
Thanks.
Hi,
I’m not sure it will work with my model, because I use YOLOv6 and its documentation says PTQ degrades the model’s performance significantly (refer to: YOLOv6/tools/partial_quantization at main · meituan/YOLOv6 · GitHub).
I will try to make it work and let you know, but I’m currently having issues building the modelopt Docker image.
Is this still an issue that needs support? Are there any results that can be shared?
Hi,
Unfortunately, I haven’t even been able to reproduce the results without DLA or quantization on TensorRT 10.3 yet, as a lot of things have changed.
It is taking some time, since there are other ongoing projects I’m working on.
Hi,
Could you try to run the model with trtexec first?
The tool can quantize the model into INT8 automatically.
Although the accuracy might drop without the calibration file, it should give us some idea if this also occurs on TensorRT 10.
The outputs from FP16 (0.942…) are very different from the INT8 case (0.129…).
You can use --loadInputs to run the tool with predefined image data.
Hi,
trtexec --loadInputs expects the input file to be raw binary data.
In practice, you can save the binary data from a numpy array using array.tofile(file).
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tofile.html
For example, if the input is an image, you could use a python script like this:
import PIL.Image
import numpy as np
im = PIL.Image.open("input_image.jpg").resize((512, 512))
data = np.asarray(im, dtype=np.float32)
data.tofile("input_tensor.dat")
This will conver…
Thanks.
Hi AastaLLL,
Unfortunately, I still cannot obtain meaningful results with TensorRT 10.3 and FP16+GPU settings.
I run everything in the nvcr.io/nvidia/l4t-jetpack:r36.4.0 Docker container on an NVIDIA Jetson AGX Orin device with JetPack 6.1 [L4T 36.4.0].
Image to tensor code:
import sys
import PIL.Image
import numpy as np
if __name__ == '__main__':
    input_path = sys.argv[1]
    batch_size = int(sys.argv[2])

    # Load and resize image
    im = PIL.Image.open(input_path).resize((1920, 1088))
    data = np.asarray(im, dtype=np.float32)

    # Add batch dimension and repeat the image batch_size times
    data = np.expand_dims(data, axis=0)         # Add batch dimension
    data = np.repeat(data, batch_size, axis=0)  # Repeat to create batch

    # Convert from NHWC to NCHW format
    data = np.transpose(data, (0, 3, 1, 2))

    print(f"Final shape (NCHW format): {data.shape}")
    data.tofile("input_tensor.dat")
Build command:
# GPU+FP16
/usr/src/tensorrt/bin/trtexec --onnx=/workspace/yolov6n_mp_one_v0_2_0_1920_bd_1088x1920.onnx \
--shapes=input:2x3x1088x1920 \
--saveEngine=yolov6n_mp_one_v0_2_0_1920_bd_1088x1920_fp16_gpu.engine \
--fp16 \
--useSpinWait \
--separateProfileRun
Inference command:
/usr/src/tensorrt/bin/trtexec --loadEngine=/workspace/yolov6n_mp_one_v0_2_0_1920_bd_1088x1920_fp16_gpu.engine \
--fp16 \
--loadInputs='input:/workspace/data/input_tensor.dat' \
--useSpinWait \
--exportOutput='output.json' \
--separateProfileRun
Inference results:
Thousands of detections with scores higher than 0.99, which is quite unexpected. I don’t understand what is going on; I must be doing something wrong, or this model (yolov6n) doesn’t work with this version of TensorRT, but how is that possible?
The same ONNX file works perfectly with TensorRT 8.5.2.2. I also tried exporting the weights (.pt) file to ONNX in the same container, but the results are still not correct.
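
As a quick sanity check on the exported outputs, a script along these lines can count the suspiciously high scores. This is only a minimal sketch: it assumes --exportOutput writes a JSON list of objects with "name", "dimensions" and "values" keys, so adjust the key names if your file differs.

```python
import json
import numpy as np

# Load trtexec's --exportOutput dump and print per-tensor statistics
with open("output.json") as f:
    entries = json.load(f)

for entry in entries:
    values = np.asarray(entry["values"], dtype=np.float32).ravel()
    print(
        f'{entry["name"]} dims={entry["dimensions"]} '
        f"min={values.min():.4f} max={values.max():.4f} "
        f"count(values > 0.99) = {(values > 0.99).sum()}"
    )
```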
Hi,
Thanks for your testing.
Are you able to run the YOLO model with ONNXRuntime on JetPack 6.1?
You can find the ORT package in the below link:
https://pypi.jetson-ai-lab.dev/jp6/cu126
If yes, could you share the source/data/model so we can run inference with ONNXRuntime?
We want to check why the model doesn’t work with TensorRT.
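
For example, a minimal ONNXRuntime check could look roughly like the sketch below. The ONNX filename and the input tensor name "input" are carried over from your trtexec commands above; adjust them and the providers list to your setup.

```python
import numpy as np
import onnxruntime as ort

# Reuse the raw tensor that was fed to trtexec via --loadInputs
# (it was written as float32 with shape 2x3x1088x1920)
data = np.fromfile("input_tensor.dat", dtype=np.float32).reshape(2, 3, 1088, 1920)

# Prefer the CUDA execution provider and fall back to CPU if it is unavailable
session = ort.InferenceSession(
    "yolov6n_mp_one_v0_2_0_1920_bd_1088x1920.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

outputs = session.run(None, {"input": data})
for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, out.shape, float(out.min()), float(out.max()))
```

If the ONNXRuntime outputs look correct while the TensorRT 10.3 engine's do not, that would narrow the problem down to the engine build rather than the model or preprocessing.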
Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.