I want to quantize a model trained with Detectnet_v2 in TAO Toolkit and export it in ONNX format.
I tried to run the command “tao model detectnet_v2 export” after creating calibration.tensor with reference to the following page, but the parameters of the exported ONNX model do not appear to be quantized.
Am I doing something wrong?
TAO Toolkit version: 5.1.0
Network: Detectnet_v2 (pretrained model: mobilenet_v2)
To enable QAT during training, simply set the enable_qat parameter to be true in the training_config field of the corresponding spec file of each of the supported networks.
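As a minimal sketch, the relevant block of a DetectNet_v2 training spec looks like the following (the fields other than enable_qat are illustrative; keep whatever your existing spec already has):
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  enable_qat: true
  # ... keep the rest of your existing training_config (learning rate, regularizer, optimizer, etc.)
}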
I understand that when converting to a TensorRT engine we can choose between PTQ and QAT. Does this mean that we need to train with QAT when exporting in ONNX format?
If so, is a calibration file generated during training and should we specify it in the --cal_cache_file option?
The export command parses the model graph looking for quantized nodes and peels them out to generate an .etlt model file, along with a corresponding calibration_cache file that contains dynamic range scale factors for the intermediate activation tensors. The etlt_model file and the calibration_cache file can be consumed by the converter to generate a low precision (8-bit) TensorRT engine or used directly with the DeepStream SDK.
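As a rough sketch of that last step for TAO 5.x (the sub-task and option names below are my recollection of the tao-deploy documentation, and the paths are placeholders, so please verify against the docs for your exact version), the exported model and the calibration file would be consumed along these lines:
tao deploy detectnet_v2 gen_trt_engine
-m /path/to/exported_model.onnx
-e /path/to/training_spec.txt
--data_type int8
--cal_json_file /path/to/calibration_qat.json
--engine_file /path/to/model_int8.engine
In other words, the exported model itself stays in floating point; the 8-bit precision is applied when the TensorRT engine is built using the recorded scale factors.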
Referring to the Technical Blog you introduced, I performed QAT training with TAO and exported the model.
Although calibration_qat.json was created, when I checked the parameters of the ONNX model that was exported at the same time, they were not quantized.
Is it not possible for TAO to export an ONNX model whose parameters are quantized to the INT8 type?
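For reference, I checked the parameters with a short script along these lines (a minimal sketch using the onnx Python package; the file name matches my export output below):
import onnx

# Load the exported model and print the data type of every weight tensor.
# In my case every initializer is reported as FLOAT, not INT8.
model = onnx.load("mobilenet_detector_qat.onnx")
for init in model.graph.initializer:
    print(init.name, onnx.TensorProto.DataType.Name(init.data_type))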
I ran the following command.
tao model detectnet_v2 export
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/mobilenet_detector_retrained.hdf5
-e $USER_SPECS_DIR/detectnet_v2_mobilenet_retrain.txt
-o $USER_EXPERIMENT_DIR/experiment_dir_final/mobilenet_detector_qat.onnx
--cal_json_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.json
--data_type int8
--batch_size 4
--gen_ds_config
--onnx_route tf2onnx
--verbose
I tried what you suggested, but it seems that QAT only takes effect during retraining, even when I set “enable_qat=true”.
Is there any condition other than “enable_qat=true” in the spec file?
Sorry for the basic questions.
Even without pruning/retraining, the first training should work with QAT enabled. To enable QAT during training, simply set the enable_qat parameter to true in the training_config field of the training spec file. Could you double-check?
Does this mean that QAT cannot be performed with the mobilenet architecture and that we need to use an architecture such as resnet, for example, to perform QAT?
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks