DeepStream falls back to FP16 for explicit-quantization models while building the engine file, saying that the calib file is not specified

Complete information of setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 8.0
• TensorRT Version: 10.9.0.34-1+cuda12.8
• NVIDIA GPU Driver Version (valid for GPU only): 570
• Issue Type( questions, new requirements, bugs): question

The ONNX model file used is yolov11s_qat_int8_672_dynamic.onnx, taken from deepstream_tools/yolo_deepstream (main branch) in the NVIDIA-AI-IOT/deepstream_tools GitHub repository.

In brief:

Creating the engine file with the TensorRT binary (trtexec) → an INT8 engine file is created.

When DeepStream creates the engine file (because no engine file is present) → it falls back to FP16, saying that the calib file is not specified. (This is an explicit-quantization model; please see the detailed explanation below.)

In detail:

Building the engine file with TensorRT → the INT8 engine file is created correctly. The command used is: /usr/src/tensorrt/bin/trtexec --onnx=yolov11s_qat_dynamic.onnx --int8 --fp16 --saveEngine=yolov11s_qat_dynamic.onnx_b1_gpu0_int8.engine

But if DeepStream creates the engine file (when no engine file is present), it sees network-mode=1 (INT8) in the config file, reports that the calib file is not specified, and falls back to FP16. This ONNX model does not come with a calib file, because it is explicitly quantized.

The config file is taken from deepstream_tools/yolo_deepstream/deepstream_yolo/config_infer_primary_yoloV11.txt (main branch of NVIDIA-AI-IOT/deepstream_tools on GitHub) and edited. It is uploaded here: config.txt (3.8 KB). I used the deepstream-test1 sample app to let DeepStream auto-create the engine file; the config file for the app is uploaded here: dstest1_config.yml.txt (1.1 KB) (renamed to .txt because yml is not a supported file type in forum posts).

Question: How do I set up DeepStream to generate an INT8 engine file in this case?

Reason for this question: If the engine file was accidentally not created before launching the DeepStream app, the app falls back to FP16 while creating the engine file. It was also observed in the past that the engine is then rebuilt on every launch of the app: DeepStream always looks first for the engine file path given in the config file, which is ...int8.engine, and because the file that was actually saved is ...fp16.engine, that path is never found and the same engine is built again.

Please help.

  1. If int8-calib-file is not set when network-mode is 1, nvinfer will create an FP16 engine instead. Please refer to this link for how to create the int8-calib-file.
  2. Regarding the ‘rebuild engine’ issue: if the configs are inconsistent with the generated engine, nvinfer will recreate the engine. Hence, please set model-engine-file to the path of the generated engine, and set network-mode to 2.
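
For anyone who wants to catch both conditions before launching the app, here is a minimal pre-launch check, as a sketch only. It assumes the nvinfer config is an INI-style file with a [property] section, as in the sample configs. The function name check_nvinfer_config and the SAMPLE values are made up for illustration, and the warnings only restate the two points above; they are not nvinfer's actual log messages.

```python
import configparser
from pathlib import Path


def check_nvinfer_config(config_text, base_dir="."):
    """Return a list of warnings for the [property] section of an nvinfer config.

    Hypothetical helper: it only mirrors the two conditions described in
    this thread and does not call DeepStream or TensorRT.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    prop = parser["property"]

    warnings = []

    # Point 1: network-mode=1 (INT8) without int8-calib-file is reported
    # in this thread to end in an FP16 fallback.
    if prop.get("network-mode") == "1" and not prop.get("int8-calib-file"):
        warnings.append("network-mode=1 but int8-calib-file is not set")

    # Point 2: if model-engine-file does not exist on disk, nvinfer has
    # nothing to load and will build an engine at launch.
    engine = prop.get("model-engine-file")
    if engine and not (Path(base_dir) / engine).is_file():
        warnings.append(f"model-engine-file not found: {engine}")

    return warnings


# Illustrative values only; replace with the contents of your own config.
SAMPLE = """
[property]
onnx-file=yolov11s_qat_int8_672_dynamic.onnx
model-engine-file=yolov11s_qat_dynamic.onnx_b1_gpu0_int8.engine
network-mode=1
"""

for warning in check_nvinfer_config(SAMPLE):
    print("WARNING:", warning)
```

Relative engine paths are resolved against base_dir here, which is an assumption; pass the directory of your config file if that matches your setup.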

Thanks for replying, fanzh. I still have four questions about point 1 of your answer. Please kindly answer them.

(And just a note on point 2 of your answer: the inconsistency happens only because DeepStream falls back to FP16 and therefore creates an engine file with a different name than the one given in the config file. I need an INT8 engine file, so I cannot switch to network-mode=2 for FP16 and change model-engine-file accordingly.)

Questions 1 and 2:

The documentation you linked for creating the int8-calib file generates it for a yolov8 model, which is not explicitly quantized (i.e. not QAT-trained with Q/DQ nodes). What I have is a QAT explicit-quantization model, as linked above, which carries all the calibration data inside the ONNX file itself, as answered in this post (quoted below).

Usually QAT exports an ONNX file with “QuantizeLinear & DequantizeLinear” nodes inside, and those nodes contain the quantization scales. TensorRT can directly convert it to an int8/fp8 engine with those scales. (That is called explicit quantization.)

Q1) Does this mean that, even though TensorRT was able to create an INT8 engine file from the same ONNX file without a calibration file, DeepStream requires a calibration file?

Q2) If I create a calibration file using the documentation in your reply, will it undo the accuracy gained by QAT? As per my understanding, the usual procedure is: train model → PTQ (this adds Q/DQ nodes and does calibration) → QAT (this is training for 10 epochs). So having a QAT ONNX file already means that calibration of the Q/DQ nodes is done, and that training has been done on top of it.

Note: I have checked that TensorRT created the INT8 engine properly for that ONNX file by running the following command and checking the output, which showed proper INT8 nodes: /usr/src/tensorrt/bin/trtexec --loadEngine=yolov11s_qat_dynamic.onnx_b1_gpu0_int8.engine --dumpLayerInfo --profilingVerbosity=detailed
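
A complementary check on the ONNX side, before any engine is built, is to look for Q/DQ operators in the graph. The sketch below assumes the optional onnx Python package for reading the file. The helper names are made up for illustration, and treating "both operator types present" as a sign of explicit quantization is a heuristic, not a rule taken from TensorRT or DeepStream. It also looks only at top-level graph nodes.

```python
from collections import Counter

# ONNX operator names for the Q/DQ pair.
QDQ_OPS = ("QuantizeLinear", "DequantizeLinear")


def count_qdq_ops(op_types):
    """Count Q/DQ operators in an iterable of ONNX op_type strings."""
    counts = Counter(op for op in op_types if op in QDQ_OPS)
    return {op: counts[op] for op in QDQ_OPS}


def has_qdq_nodes(op_types):
    """Heuristic: True if both QuantizeLinear and DequantizeLinear appear."""
    counts = count_qdq_ops(op_types)
    return all(counts[op] > 0 for op in QDQ_OPS)


def onnx_op_types(path):
    """Read the op types of the top-level graph nodes from an ONNX file."""
    import onnx  # imported lazily so the helpers above work without it

    model = onnx.load(path)
    return [node.op_type for node in model.graph.node]
```

For example, has_qdq_nodes(onnx_op_types("yolov11s_qat_int8_672_dynamic.onnx")) would be expected to return True for a QAT export like the one described above, and False for a plain FP32 export.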

Questions 3 and 4: These questions are about generating the INT8 calibration file with the documentation linked in your answer. The document appears to describe PTQ (post-training quantization), since it calibrates an Ultralytics-trained model that is not QAT-trained.

In step 7, environment variables are set, and a calibration text file containing the paths to 1000 calibration images is exported.

In step 8, a calib.table file is referenced directly in the model’s config file, although this calib.table file has not been generated yet.

Q3) Does this mean that, when the app is launched, DeepStream will first create the calib.table file and then use it to create the INT8 engine file?

Q4) Can this calib.table file be reused on other machines? (The engine file will always be recreated on each machine itself.)

No. For example, if batch-size differs from that of the engine, nvinfer will recreate the engine. In your case, if you set model-engine-file to the path of the engine that trtexec generated, nvinfer will load the engine directly.

Yes, you don’t need to create an INT8 calibration file for the QAT model. As written in the topic in your last comment, TensorRT can directly convert it to an int8/fp8 engine with that scale. (That is called explicit quantization.)

Yes, the calib.table can be used on other machines.

Thanks for these answers, fanzh.

For Q3, I have tested it, and the calib.table file was generated. Just for completeness: as your link points to Jetson, I used DeepStream-Yolo/docs/INT8Calibration.md (master branch of marcoslucianops/DeepStream-Yolo on GitHub), which is the original repo referenced in that documentation.

Last set of related questions please.

I understand and tested this.

So it means that, for QAT explicit-quantization models, I must use trtexec to create the engine file and can’t let deepstream-app auto-create the INT8 engine file, as there will not be any calibration file. Am I right?

Is this a feature not yet implemented in DeepStream? For all the models for which I have a calib file, I can let DeepStream create the INT8 engine file. But explicit-quantization models do not have a calib file, and for these I must rely on trtexec. Will this feature be added to DeepStream, i.e. when creating the engine file, will it detect that the model is explicitly quantized and not ask for a calib file for the INT8 engine?

There are two methods to load the QAT explicit-quantization engine in DeepStream.

  1. Generate the INT8 engine with trtexec, then make DeepStream load the engine directly without regenerating it, as shown in the doc.
  2. Set the path of the QAT explicit-quantization model in nvinfer’s config; deepstream-app will create an engine, which will be an INT8 engine regardless of the engine name.
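
Method 1 can be scripted so that the engine is never missing at launch. Below is a rough sketch that reuses only the trtexec flags already shown in this thread (--onnx, --int8, --fp16, --saveEngine). The function names and the injectable run parameter are made up for illustration. Whether extra flags (for example for dynamic shapes or batch size) are needed so that the engine matches the nvinfer config is left open here.

```python
import subprocess
from pathlib import Path

# Path as used in this thread; adjust if trtexec lives elsewhere.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_command(onnx_path, engine_path):
    """Build the trtexec command line used earlier in this thread."""
    return [
        TRTEXEC,
        f"--onnx={onnx_path}",
        "--int8",
        "--fp16",
        f"--saveEngine={engine_path}",
    ]


def ensure_engine(onnx_path, engine_path, run=subprocess.run):
    """Build the engine with trtexec only if the engine file is missing.

    Returns True if a build was started, False if the file already existed.
    The `run` parameter exists only so the call can be replaced in tests.
    """
    if Path(engine_path).is_file():
        return False
    run(trtexec_command(onnx_path, engine_path), check=True)
    return True
```

Calling ensure_engine("yolov11s_qat_dynamic.onnx", "yolov11s_qat_dynamic.onnx_b1_gpu0_int8.engine") before starting the app, with model-engine-file in the nvinfer config pointing at the same path, would match method 1. Keep in mind the earlier reply that a mismatch such as a different batch-size still makes nvinfer rebuild the engine.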

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks.