I want to output quantized ONNX models with TAO Toolkit


I want to quantize a model trained with DetectNet_v2 in TAO Toolkit and export it in ONNX format.
I tried running the command "tao model detectnet_v2 export" after creating calibration.tensor with reference to the following page, but the parameters of the exported ONNX model do not appear to be quantized.
Am I doing something wrong?

TAO Toolkit version: 5.1.0
Network: DetectNet_v2 (pretrained model: mobilenet_v2)

Thank you.

tao model detectnet_v2 export \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/mobilenet_detector_retrained.hdf5 \
    -e $USER_SPECS_DIR/detectnet_v2_mobilenet_retrain.txt \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/mobilenet_detector.onnx \
    --onnx_route tf2onnx \
    --data_type int8 \
    --batches 10 \
    --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
    --cal_data_file $USER_EXPERIMENT_DIR/calibration.tensor

Did you train a model with QAT enabled?
Refer to DetectNet_v2 - NVIDIA Docs

To enable QAT during training, simply set the enable_qat parameter to be true in the training_config field of the corresponding spec file of each of the supported networks.
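For example, a minimal training_config fragment with QAT enabled would look like this (the batch size and epoch count are illustrative values, not a recommendation):

```
training_config {
  # other training parameters (learning rate, optimizer, etc.) go here
  batch_size_per_gpu: 4
  num_epochs: 120
  enable_qat: true
}
```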

I understand that when converting to a TensorRT engine we can choose between PTQ and QAT. Does this mean we need to train with QAT when exporting in ONNX format?
If so, is a calibration file generated during training, and should we specify it with the --cal_cache_file option?

Yes, you need to train the model with QAT enabled.

Yes, you should specify it with the --cal_cache_file option.

Refer to Improving INT8 Accuracy Using Quantization Aware Training and the NVIDIA TAO Toolkit | NVIDIA Technical Blog

The export command parses the model graph looking for quantized nodes and peels them out to generate an .etlt model file, along with a corresponding calibration_cache file that contains dynamic range scale factors for the intermediate activation tensors. The etlt_model file and the calibration_cache file can be consumed by the converter to generate a low precision (8-bit) TensorRT engine or used directly with the DeepStream SDK.

tao export detectnet_v2
           -m $experiment_dir_pruned_qat/weights/$model_file_string.tlt \
           -o $output_model_path \
           -k $KEY \
           --data_type int8 \
           --batch_size N \
           --cal_cache_file $calibration_cache_file \
           --engine_file $engine_file_path
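The "dynamic range scale factors" mentioned in the quote are, in essence, per-tensor scales that map float activation values into the INT8 range. A rough illustration of the idea (my own sketch of symmetric per-tensor quantization, not TAO's or TensorRT's actual implementation):

```python
def int8_scale(max_abs):
    """Symmetric per-tensor INT8 scale: real_value ~= int8_value * scale."""
    return max_abs / 127.0

def quantize(x, scale):
    """Quantize a float to the int8 range [-128, 127]."""
    return max(-128, min(127, round(x / scale)))

scale = int8_scale(6.0)        # activation dynamic range observed as [-6, 6]
print(quantize(3.0, scale))    # 64
print(quantize(-10.0, scale))  # saturates: clamped to -128
```

With QAT, these ranges are learned during training (and stored in the calibration cache), instead of being estimated afterwards from calibration data as in PTQ.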

Referring to the linked technical blog, I performed QAT training with TAO and exported the model.
Although calibration_qat.json was created, the ONNX model exported at the same time was not quantized when I checked its parameters.
Is it not possible for TAO to output an ONNX model whose parameters are quantized to INT8?

I ran the following command.

tao model detectnet_v2 export \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/mobilenet_detector_retrained.hdf5 \
    -e $USER_SPECS_DIR/detectnet_v2_mobilenet_retrain.txt \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/mobilenet_detector_qat.onnx \
    --cal_json_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.json \
    --data_type int8 \
    --batch_size 4 \
    --onnx_route tf2onnx
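To check whether an exported ONNX model actually contains quantization nodes, one option is to scan the graph's op types for the standard ONNX quantization operators. A minimal sketch (assumes the `onnx` Python package; the op-type list in the example is hypothetical):

```python
# Standard ONNX quantization operators to look for in the graph.
QUANT_OPS = {"QuantizeLinear", "DequantizeLinear", "QLinearConv", "QLinearMatMul"}

def quant_ops_present(op_types):
    """Return the quantization-related op types found among the graph's nodes."""
    return QUANT_OPS.intersection(op_types)

# In practice you would obtain the op types from a real model via:
#   import onnx
#   op_types = [n.op_type for n in onnx.load("model.onnx").graph.node]
print(quant_ops_present(["Conv", "Relu", "QuantizeLinear", "DequantizeLinear"]))
print(quant_ops_present(["Conv", "Relu"]))  # empty set -> no quantized nodes
```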

Can you do a quick experiment?
Just run training for 1 epoch with QAT enabled, without pruning or retraining.
Then export this model to check whether it is quantized.

I tried what you suggested, but it seems that QAT only takes effect on retraining, even though I set enable_qat: true.
Is there any condition other than enable_qat: true in the spec file?
Sorry for asking about such basic details.

Without pruning/retraining, the first training should work with QAT enabled. To enable QAT during training, simply set the enable_qat parameter to true in the training_config field of the training spec file. Could you double check?

The contents of training_config are set as follows.
I think I have set enable_qat correctly…

training_config {
  batch_size_per_gpu: 4
  num_epochs: 1
  enable_qat: True
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-07
      max_learning_rate: 5e-05
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
}

I will also share the spec file we are using just in case. Any other issues?
detectnet_v2_mobilenet_train.txt (5.9 KB)

Can you share the full training log?

I will share the training log.
training_log.txt (92.5 KB)

According to tao_tensorflow1_backend/nvidia_tao_tf1/cv/detectnet_v2/model/detectnet_model.py at main · NVIDIA/tao_tensorflow1_backend · GitHub, for the “mobilenet_v1” and “mobilenet_v2” networks the Keras model is not converted to a quantized Keras model.

Does this mean that QAT cannot be performed with the MobileNet architectures, and that we need to use an architecture such as ResNet instead to perform QAT?

Yes, that’s right.