Questions about exporting a TLT model and image sizes

Hi,
I have a question about exporting a .tlt model to a .etlt model. During export a calibration file named cal.bin is generated. Should I run this command on the target computer where the model will be deployed, or is all GPU optimization done during the conversion from .etlt to a TRT model?
My second question: I trained faster_rcnn with resnet18 on a custom dataset whose images had different sizes. Everything during training was fine and the final mAP (after retraining after pruning) was around 0.83, but after exporting and converting to TRT the mAP dropped to around 0.50, which is a huge difference (it dropped for both the int8 and fp16 exports). I then tried my custom dataset with all images resized to the same size, and the mAP after export did not drop.

As far as I understand, the TRT model should not accept image sizes different from the one passed in the -d argument when calling tlt-converter, yet inference and evaluation on the TRT engine worked even though my original dataset had images of different sizes. Also, from what I see, all images are resized to the size given in the config during preprocessing, so why does it work correctly only if the images are all the same size before preprocessing? It would be really helpful to get an explanation of exactly how image resizing works during all the steps; as far as I can tell, the preprocessed images are used only during the training phase, while both export and convert use the original ones.
Everything was done using the faster_rcnn.ipynb notebook from the examples folder included in the workspace folder of the Docker container pulled with:
docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
My graphics card is a Tesla V100.
Best regards

Which command?

See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/open_model_architectures.html#fasterrcnn

FasterRCNN

  • Input size: C * W * H (where C = 1 or 3; W >= 160; H >= 160)
  • Image format : JPG, JPEG, PNG
  • Label format : KITTI detection

Note

The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
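
If it helps, below is a minimal offline-resize sketch along the lines of that note: it resizes every image to one training resolution and scales the KITTI bounding boxes by the same factors. The paths and the 1920x1080 target are placeholders for your dataset, not part of the official tooling.

import os
from PIL import Image

TARGET_W, TARGET_H = 1920, 1080              # placeholder training resolution
IMG_DIR, LBL_DIR = "data/images", "data/labels"
OUT_IMG, OUT_LBL = "data/images_resized", "data/labels_resized"
os.makedirs(OUT_IMG, exist_ok=True)
os.makedirs(OUT_LBL, exist_ok=True)

for name in os.listdir(IMG_DIR):
    stem = os.path.splitext(name)[0]
    img = Image.open(os.path.join(IMG_DIR, name))
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(os.path.join(OUT_IMG, name))

    lines = []
    with open(os.path.join(LBL_DIR, stem + ".txt")) as f:
        for line in f:
            parts = line.split()
            # KITTI bbox fields xmin, ymin, xmax, ymax sit at columns 4-7.
            parts[4] = f"{float(parts[4]) * sx:.2f}"
            parts[5] = f"{float(parts[5]) * sy:.2f}"
            parts[6] = f"{float(parts[6]) * sx:.2f}"
            parts[7] = f"{float(parts[7]) * sy:.2f}"
            lines.append(" ".join(parts))
    with open(os.path.join(OUT_LBL, stem + ".txt"), "w") as f:
        f.write("\n".join(lines) + "\n")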

!if [ -f $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_int8.etlt ]; then rm $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_int8.etlt; fi
!faster_rcnn export --gpu_index $GPU_INDEX -m $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch4.tlt  \
                    -o $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_int8.etlt \
                    -e $SPECS_DIR/default_spec_resnet18_retrain_spec.txt \
                    -k $KEY \
                    --data_type int8 \
                    --batch_size 8 \
                    --batches 10 \
                    --cal_cache_file $USER_EXPERIMENT_DIR/data/faster_rcnn/cal.bin

Yes, but it is written there that the train tool does not support this, yet I ran training with images of multiple resolutions and got a resulting mAP of 0.84; only after the export to TRT did the mAP drop to 0.5. So it is somehow supported (I mean, it does not throw any errors or anything like that). And if we have to resize all images offline, can this part of the augmentation config be dropped?
augmentation_config {
  preprocessing {
    output_image_width: 1920
    output_image_height: 1080
    output_image_channel: 3
  }
}

There is no need to run this command on the target device. Just copy the .etlt file and the cal.bin file to the target device.

In TLT 3.0, for the detectnet_v2 or faster_rcnn networks, images and labels need to be resized offline. If the output image height and width of the preprocessing block don't match the dimensions of the input image, the dataloader either pads with zeros or crops to fit the output resolution. It does not resize the input images and labels to fit. See DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation.
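
To make that behavior concrete, here is a small illustration (my own sketch, not TLT's actual dataloader code) of padding with zeros versus cropping to the output resolution:

import numpy as np

def pad_or_crop(img, out_h, out_w):
    # Copy the overlapping region; anything outside stays zero (padding),
    # anything beyond the output size is dropped (cropping). No resizing.
    out = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
    ch, cw = min(img.shape[0], out_h), min(img.shape[1], out_w)
    out[:ch, :cw] = img[:ch, :cw]
    return out

small = np.ones((720, 1280, 3), np.uint8)    # zero-padded up to 1080x1920
large = np.ones((2160, 3840, 3), np.uint8)   # cropped down to 1080x1920
print(pad_or_crop(small, 1080, 1920).shape)  # (1080, 1920, 3)
print(pad_or_crop(large, 1080, 1920).shape)  # (1080, 1920, 3)

Either way, the objects' pixel positions no longer match what the network saw during training, which may explain the mAP drop you observed with mixed-resolution data.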

For “but only after export to TRT the mAP dropped down to 0.5”, which command did you run?

!CUDA_VISIBLE_DEVICES=$GPU_INDEX tlt-converter -k $KEY  \
           -d 3,1080,1920 \
           -o NMS \
           -c $USER_EXPERIMENT_DIR/data/faster_rcnn/cal.bin \
           -e $USER_EXPERIMENT_DIR/data/faster_rcnn/trt.int8.engine \
           -b 8 \
           -m 4 \
           -t int8 \
           -i nchw \
           $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_int8.etlt

Thanks for helping.

Please also run evaluation with the fp32 and fp16 TensorRT engines and check their mAP results.
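
For reference, a fp16 build would look something like the command below; this is only a sketch based on your int8 command, assuming the other flags stay the same. The calibration cache (-c) is only used for int8, and to my knowledge omitting -t makes tlt-converter default to fp32.

!CUDA_VISIBLE_DEVICES=$GPU_INDEX tlt-converter -k $KEY  \
           -d 3,1080,1920 \
           -o NMS \
           -e $USER_EXPERIMENT_DIR/data/faster_rcnn/trt.fp16.engine \
           -b 8 \
           -m 4 \
           -t fp16 \
           -i nchw \
           $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_int8.etlt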