Using an ONNX model in INT8 mode on Jetson AGX Orin

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) Jetson
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Yolo_v4
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
toolkit version : 5.2.0
• Training spec file(If have, please share here)
yolo_v4_train_resnet18.txt (3.3 KB)

Here is my situation: I trained a YOLOv4 model using the tao-getting-started notebook (version 5.0.0). The training went well and I exported the ONNX file.
My application runs on a Jetson AGX Orin and uses the DeepStream SDK to run real-time inference on a 1080p stream.

The model has an input size of 1280x736, but when I run it on the Jetson it struggles to keep up: GPU usage is at 100% and the output video stream is of very poor quality. I don't have this problem with other models I've trained with smaller input sizes. Since I'm currently running the model in FP32 mode, I want to see whether INT8 mode improves things.
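As a rough sanity check on the input size (a sketch, not a measurement), the pixel counts can be compared directly. The 960x544 size below is a hypothetical stand-in for "a smaller model", since the sizes of the other models are not given here, and the idea that per-frame cost grows roughly linearly with input pixels is an assumption that only approximately holds for convolutional detectors.

```python
def pixels(width: int, height: int) -> int:
    """Number of pixels in one frame or network input."""
    return width * height

model_input = pixels(1280, 736)    # input size from this post
smaller_input = pixels(960, 544)   # hypothetical smaller model, for comparison only
stream_frame = pixels(1920, 1080)  # 1080p source stream

# Assumption: per-frame cost of a convolutional detector grows
# roughly linearly with the number of input pixels.
ratio = model_input / smaller_input
print(f"model input: {model_input} px, 1080p frame: {stream_frame} px")
print(f"about {ratio:.2f}x the pixels of the hypothetical smaller input")
```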

So here are my questions:

1 - Is it normal for the Jetson to struggle with this model? It seems to run fine on my dGPU (RTX 3070).
2 - Following the notebook, after training, I used this code to export the model:

# tao <task> export will fail if .onnx already exists. So we clear the export folder before tao <task> export
!rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
# Generate .onnx file using tao container
!tao model yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.hdf5 \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.onnx \
                    -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
                    --target_opset 12 \
                    --gen_ds_config

and then this code to generate the calibration file for INT8 mode:

# To export in INT8 mode (generate calibration cache file). 
!tao deploy yolo_v4 gen_trt_engine -m $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.onnx \
                                   -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
                                   --cal_image_dir $DATA_DOWNLOAD_DIR/training/image_2 \
                                   --data_type int8 \
                                   --batch_size 8 \
                                   --batches 100 \
                                   --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                                   --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                                   --engine_file $USER_EXPERIMENT_DIR/export/trt.engine.int8 \
                                   --results_dir $USER_EXPERIMENT_DIR/export

Is this correct?

3 - How do I use it in DeepStream? All the examples I've seen use a .etlt model and a .txt calibration file, but I have a .onnx model and the following files: cal.bin, cal.tensorfile, trt.engine.int8.
I know that the engine file is supposed to be generated on the platform that will run the inference (for example, I used TrafficCamNet from the NVIDIA model zoo with the .etlt and INT8 calibration files to generate the engine file on my Jetson, and it works fine).

Any help would be welcome, thank you.

1 - No, that is not expected; inference on Jetson Orin should work.
Please try running with the official deepstream_tao_apps (GitHub: NVIDIA-AI-IOT/deepstream_tao_apps, sample apps that demonstrate how to deploy models trained with TAO on DeepStream).

2 - Yes.

3 - Run it with the same deepstream_tao_apps samples.
You can set the .onnx file in the config file; refer to deepstream_tao_apps/configs/nvinfer/yolov4_tao/pgie_yolov4_tao_config.txt on the master branch of NVIDIA-AI-IOT/deepstream_tao_apps.
Run in FP32 or FP16 first to check that it works. To run in INT8 mode, cal.bin is needed.
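To make the precision-related settings concrete, here is a minimal sketch that prints the relevant lines of an nvinfer `[property]` group for a given precision. The helper name `precision_lines` is hypothetical, and the file names are placeholders; the keys (`onnx-file`, `model-engine-file`, `network-mode`, `int8-calib-file`) are standard nvinfer keys, and the engine name follows the `<onnx>_b<batch>_gpu<id>_<precision>.engine` pattern DeepStream uses for engines it builds itself. A real config needs many more keys (labels, parser, preprocessing), so treat this as a fragment only.

```python
# Maps a precision name to the nvinfer network-mode value
# (0 = FP32, 1 = INT8, 2 = FP16).
NETWORK_MODE = {"fp32": 0, "int8": 1, "fp16": 2}

def precision_lines(onnx_file, precision, calib_file=None, batch=1, gpu=0):
    """Return the precision-related lines of an nvinfer [property] group."""
    if precision not in NETWORK_MODE:
        raise ValueError(f"unknown precision: {precision}")
    if precision == "int8" and not calib_file:
        raise ValueError("INT8 needs a calibration cache such as cal.bin")
    lines = [
        "[property]",
        f"onnx-file={onnx_file}",
        # Name under which DeepStream would save the engine it builds.
        f"model-engine-file={onnx_file}_b{batch}_gpu{gpu}_{precision}.engine",
        f"network-mode={NETWORK_MODE[precision]}",
    ]
    if precision == "int8":
        lines.append(f"int8-calib-file={calib_file}")
    return lines

# Placeholder file names for illustration.
print("\n".join(precision_lines("yolov4_resnet18_epoch_100.onnx", "int8", "cal.bin")))
```

Starting with `"fp16"` (no calibration file needed) and switching to `"int8"` only once boxes appear matches the order suggested above.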

Thank you for the answer. I was able to run my model in FP16, and this greatly reduced the load on the Jetson GPU! It no longer struggles to run in real time. To go even further, though, I will need to run several models in parallel on the Jetson, so I would really benefit from INT8 mode.
Unfortunately, when I run inference with the INT8 calibration file (generated with the code above), the engine file is generated successfully on the Jetson, but I don't get any boxes from the inference.
Here is the log from the first time the engine file was generated:

WARNING: Deserialize engine failed because file path: yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine open error
0:00:05.548265984 17739 0xaaaaf5a5ea40 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine failed
0:00:05.711011456 17739 0xaaaaf5a5ea40 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine failed, try rebuild
0:00:05.711072768 17739 0xaaaaf5a5ea40 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: builtin_op_importers.cpp:5243: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 199) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 203) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 209) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 314) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 318) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 323) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 417) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 420) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 424) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 681) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 685) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor BatchedNMS, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
0:17:45.780674914 17739 0xaaaaf5a5ea40 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT Input           3x736x1280      
1   OUTPUT kINT32 BatchedNMS      1               
2   OUTPUT kFLOAT BatchedNMS_1    200x4           
3   OUTPUT kFLOAT BatchedNMS_2    200             
4   OUTPUT kFLOAT BatchedNMS_3    200             

0:17:46.091419294 17739 0xaaaaf5a5ea40 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<nvinfer0> [UID 1]: Load new model:Inference.txt sucessfully
Pipeline is PREROLLING ...

Are those warnings normal?
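A log like this is easier to judge when summarized. The sketch below counts the two repeated TensorRT warning types and derives the engine build time from the GStreamer timestamps at the start of the nvinfer lines. The embedded `LOG` is an abbreviated excerpt of the log above ("..." marks text cut for brevity); the function names and counter keys are hypothetical.

```python
import re
from collections import Counter

# Abbreviated excerpt of the log above ("..." marks text cut for brevity).
LOG = """\
0:00:05.711072768 17739 0xaaaaf5a5ea40 INFO nvinfer ... Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Missing scale and zero-point for tensor (Unnamed Layer* 199) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor BatchedNMS, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
0:17:45.780674914 17739 0xaaaaf5a5ea40 INFO nvinfer ... serialize cuda engine to file: yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine successfully
"""

TIMESTAMP = re.compile(r"^(\d+):(\d{2}):(\d{2}\.\d+)\s")

def seconds(line):
    """Parse a leading h:mm:ss.nnnnnnnnn timestamp; return None if absent."""
    match = TIMESTAMP.match(line)
    if not match:
        return None
    hours, minutes, secs = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(secs)

def summarize(log):
    """Count warning types and compute the build duration in seconds."""
    counts = Counter()
    start = end = None
    for line in log.splitlines():
        if "Missing scale and zero-point" in line:
            counts["int8_fallback_tensors"] += 1
        elif "was clamped" in line:
            counts["int64_clamped"] += 1
        elif "Trying to create engine" in line:
            start = seconds(line)
        elif "serialize cuda engine" in line:
            end = seconds(line)
    build_s = end - start if start is not None and end is not None else None
    return counts, build_s

counts, build_s = summarize(LOG)
print(dict(counts), f"build took {build_s / 60:.1f} min")
```

Run on the full log rather than the excerpt, the same function would report every tensor that falls back to a non-INT8 implementation, which is a quick way to see how much of the network actually runs in INT8.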

The warnings are normal. Can you double-check that FP16 works well in the same environment?

Yes. In the same environment and with the same pipeline, I only change the inference config file from:

model-engine-file=yolo4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine
int8-calib-file=cal.bin
network-mode=1

to

#model-engine-file=yolo4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine
model-engine-file=yolo4_resnet18_epoch_100.onnx_b1_gpu0_fp16.engine
#int8-calib-file=cal.bin
network-mode=2

And it seems to work fine
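Because the switch is done by commenting lines in and out, it is easy to end up with a mismatched combination. The following is a minimal sketch of a consistency check, assuming the config has a `[property]` group and that engine names end in `_<precision>.engine` as in the file names above. `check_precision` is a hypothetical helper, and the sample config is deliberately inconsistent for illustration; it is not the config from this thread.

```python
import configparser
import re

MODE_NAME = {"0": "fp32", "1": "int8", "2": "fp16"}  # nvinfer network-mode values

def check_precision(config_text):
    """Return a list of precision-related problems in the [property] group."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    prop = parser["property"]
    problems = []

    mode = MODE_NAME.get(prop.get("network-mode", fallback="").strip())
    if mode is None:
        problems.append("network-mode is missing or not one of 0, 1, 2")

    # INT8 needs the calibration cache; a commented-out key counts as unset.
    if mode == "int8" and not prop.get("int8-calib-file", fallback="").strip():
        problems.append("network-mode=1 (INT8) but int8-calib-file is not set")

    # Compare the precision in the engine file name with the selected mode.
    engine = prop.get("model-engine-file", fallback="").strip()
    suffix = re.search(r"_(fp32|fp16|int8)\.engine$", engine)
    if mode and suffix and suffix.group(1) != mode:
        problems.append(
            f"engine name says {suffix.group(1)} but network-mode selects {mode}"
        )
    return problems

# Hypothetical, deliberately inconsistent config for illustration.
SAMPLE = """\
[property]
model-engine-file=yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine
#int8-calib-file=cal.bin
network-mode=2
"""
print(check_precision(SAMPLE))
```

A name mismatch matters because an existing engine file may be loaded as-is, in which case the precision actually used is the one the engine was built with rather than the one `network-mode` suggests.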

Can you upload the full config file?

Sure:
InferenceConfig.txt (875 Bytes)

Can you make the change below, let DeepStream generate a new engine, and check again?
Change

model-engine-file=yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8.engine

to

model-engine-file=yolov4_resnet18_epoch_100.onnx_b1_gpu0_int8_again.engine

I got basically the same result, except that this time the engine file was not saved for some reason (and it takes a very long time to generate). It ran the same as before: same warnings during generation, and still no boxes in the output.

That is because the engine file name is not set in the main config file.

As for cal.bin, have you verified INT8 inference using tao yolo_v4 inference?

I have not. I will try and get back to you.

So I ran tao yolo_v4 inference with the trt.engine.int8 file that was generated along with the calibration file, and it seems to work perfectly fine (I can see the bboxes). So that's really weird…

Edit: just to be clear, this was on my dGPU, where I trained the model, not on the Jetson Orin.

On a Jetson device, please follow the steps in the tao_deploy repository (GitHub: NVIDIA/tao_deploy, a package for deploying deep learning models from TAO Toolkit) to run tao-deploy.
Similar topic:
Tao-deploy on Orin AGX CLI Error - #15 by Morganh

You can generate the engine and cal.bin again directly on the Jetson device.
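Before regenerating, it may be worth looking at what the existing cal.bin contains, for example to compare the TensorRT version it was written with against the one on the Jetson. The sketch below assumes the usual TensorRT calibration cache layout: a header line of the form `TRT-<version>-<calibrator>` followed by `tensor_name: <hex>` lines, where the hex digits are the bit pattern of a float32 scale. That layout is an assumption about the file, not something confirmed in this thread, and the sample text is hypothetical rather than taken from the real cal.bin.

```python
import struct

def parse_calibration_cache(text):
    """Split a calibration cache into (header, {tensor_name: scale})."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, scales = lines[0], {}
    for ln in lines[1:]:
        name, sep, hex_value = ln.rpartition(": ")
        if not sep:
            continue  # not a "name: value" line, skip it
        # Assumed encoding: hex digits of a big-endian float32 scale.
        bits = int(hex_value, 16)
        scales[name] = struct.unpack(">f", struct.pack(">I", bits))[0]
    return header, scales

# Hypothetical two-line cache, not taken from the thread's cal.bin.
SAMPLE = "TRT-8502-EntropyCalibration2\nInput: 3c008912\n"
header, scales = parse_calibration_cache(SAMPLE)
print(header, scales)
```

For a real file, the text could be read with `open("cal.bin", encoding="utf-8").read()`; a header version that differs from the TensorRT version on the Jetson would be one more reason to regenerate the cache on the device.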

I will try that, thank you

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.