I trained a Mask R-CNN model on 314 images and tested it on 77 images using the TAO framework. It performs well both quantitatively and qualitatively. However, the int8 model I generated performs significantly worse than the fp32 and fp16 versions.
I generated two calibration files by passing two custom image directories to the tao export command. In the first directory, I combined the images from both the train and test sets. In the second directory, all the images from the first directory were flipped horizontally to double the amount of calibration data.
Each calibration file was then used to generate an int8 engine file with tao converter. Both engines performed worse than the fp32 and fp16 versions. Below are the commands used:
Calibration file generation:
%env NUM_STEP=13000
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment/export_int
!tao mask_rcnn export -m $USER_EXPERIMENT_DIR/experiment/model.step-$NUM_STEP.tlt \
-k $KEY \
-o $USER_EXPERIMENT_DIR/experiment/model.step-$NUM_STEP.etlt \
-e $SPECS_DIR/mask-rcnn_train_resnet50.txt \
--batch_size 1 \
--gpu_index 0 \
--data_type int8 \
--cal_image_dir $USER_EXPERIMENT_DIR/data/v1_clean/train-cal/images \
--batches 381 \
--cal_cache_file $USER_EXPERIMENT_DIR/experiment/export_int/maskrcnnv.cal \
--cal_data_file $USER_EXPERIMENT_DIR/experiment/export_int/maskrcnn.tensorfile
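As a sanity check on the export command above, the number of calibration images can be compared against --batch_size times --batches. This is a minimal Python sketch, assuming the export step needs at least batch_size × batches images in --cal_image_dir and that the images are .jpg, .jpeg, or .png files directly inside that directory. The helper names count_calibration_images and max_calibration_batches are hypothetical and not part of TAO.

```python
from pathlib import Path

# Assumed calibration image formats; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_calibration_images(cal_image_dir):
    """Count image files directly inside cal_image_dir (non-recursive)."""
    return sum(
        1
        for path in Path(cal_image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def max_calibration_batches(num_images, batch_size):
    """Largest --batches value such that batch_size * batches <= num_images."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return num_images // batch_size
```

With the 314 training and 77 test images combined (391 in total) and --batch_size 1, max_calibration_batches(391, 1) returns 391, so --batches 381 fits within the first directory under that assumption.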
Engine file generation:
!tao converter -k $KEY \
-d 3,832,1344 \
-o generate_detections,mask_fcn_logits/BiasAdd \
-c /workspace/tao-experiments/mask_rcnn/experiment/export_int/maskrcnnv.cal \
-e $USER_EXPERIMENT_DIR/experiment/export_int/trt.int8.engine \
-b 1 \
-m 1 \
-t int8 \
-i nchw \
-s \
$USER_EXPERIMENT_DIR/experiment/model.step-$NUM_STEP.etlt
Inference file generation:
!tao mask_rcnn inference -i $DATA_DOWNLOAD_DIR/v1_clean/test/images \
-o $USER_EXPERIMENT_DIR/experiment/test_predicted_images_int8 \
-e $SPECS_DIR/mask-rcnn_train_resnet50.txt \
-m $USER_EXPERIMENT_DIR/experiment/export_int/trt.int8.engine \
-l $USER_EXPERIMENT_DIR/experiment/annotated_labels \
-c $SPECS_DIR/abels.txt \
-t 0.2 \
-k $KEY \
--include_mask
Other Information:
• Hardware - NVIDIA GeForce RTX 2080 Ti
• Network Type - Mask_rcnn
• toolkit_version - 3.22.02
• Training spec file:
seed: 123
use_amp: False
warmup_steps: 1000
checkpoint: "/workspace/tao-experiments/mask_rcnn/pretrained_resnet50/pretrained_instance_segmentation_vresnet50/resnet50.hdf5"
learning_rate_steps: "[10000, 15000, 20000]"
learning_rate_decay_levels: "[0.1, 0.01, 0.001]"
total_steps: 25000
train_batch_size: 2
eval_batch_size: 4
num_steps_per_eval: 1000
momentum: 0.9
l2_weight_decay: 0.0001
warmup_learning_rate: 0.0001
init_learning_rate: 0.01
data_config{
image_size: "(832, 1344)" # “(height -1080, width-1920)”
augment_input_data: True
eval_samples: 20
training_file_pattern: "/workspace/tao-experiments/TF_data/train*.tfrecord"
validation_file_pattern: "/workspace/tao-experiments/TF_data/val*.tfrecord"
val_json_file: "/workspace/tao-experiments/data/v1_clean/test/annotations/instances_default.json"
# dataset specific parameters
num_classes: 5
skip_crowd_during_training: True
}
maskrcnn_config {
nlayers: 50
arch: "resnet"
freeze_bn: True
freeze_blocks: "[0,1]"
gt_mask_size: 112
# Region Proposal Network
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_min_size: 0.
# Proposal layer.
batch_size_per_im: 512
fg_fraction: 0.25
fg_thresh: 0.5
bg_thresh_hi: 0.5
bg_thresh_lo: 0.
# Faster-RCNN heads.
fast_rcnn_mlp_head_dim: 1024
bbox_reg_weights: "(10., 10., 5., 5.)"
# Mask-RCNN heads.
include_mask: True
mrcnn_resolution: 28
# training
train_rpn_pre_nms_topn: 2000
train_rpn_post_nms_topn: 1000
train_rpn_nms_threshold: 0.7
# evaluation
test_detections_per_image: 100
test_nms: 0.5
test_rpn_pre_nms_topn: 1000
test_rpn_post_nms_topn: 1000
test_rpn_nms_thresh: 0.7
# model architecture
min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
anchor_scale: 8
# localization loss
rpn_box_loss_weight: 1.0
fast_rcnn_box_loss_weight: 1.0
mrcnn_weight_loss_mask: 1.0
}
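The image_size in the spec above has to agree with the -d 3,832,1344 argument passed to tao converter. This is a minimal Python sketch of that check, assuming the spec stores the size as the string "(height, width)" and assuming each dimension should be a multiple of 2^max_level (64 for max_level: 6). The helper names parse_image_size, converter_dims, and is_multiple_of_stride are hypothetical and not part of TAO.

```python
import re


def parse_image_size(spec_text):
    """Extract (height, width) from a line like: image_size: "(832, 1344)"."""
    match = re.search(r'image_size:\s*"\(\s*(\d+)\s*,\s*(\d+)\s*\)"', spec_text)
    if match is None:
        raise ValueError("image_size not found in spec")
    return int(match.group(1)), int(match.group(2))


def converter_dims(height, width, channels=3):
    """Format the dimensions as the C,H,W string passed to the -d flag."""
    return f"{channels},{height},{width}"


def is_multiple_of_stride(height, width, max_level):
    """Assumption: both dimensions should be divisible by 2**max_level."""
    stride = 2 ** max_level
    return height % stride == 0 and width % stride == 0


# Two lines copied from the spec above, used here as sample input.
spec = 'image_size: "(832, 1344)"\nmax_level: 6'
height, width = parse_image_size(spec)
print(converter_dims(height, width))                      # 3,832,1344
print(is_multiple_of_stride(height, width, max_level=6))  # True
```

Both 832 (64 × 13) and 1344 (64 × 21) pass the divisibility check, and the formatted string matches the -d value used in the converter command.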
How can I get int8 results that are comparable to those of the fp32 and fp16 models?