• Hardware: A5000
• Network Type: Mask_rcnn
• TLT Version: tao 3.22.05
I tested the FPS of my pruned MaskRCNN model, and it was lower than that of the unpruned model. The test was done on an A5000 machine; the GPU compute time increased from ~20 ms to ~28 ms, and the engine file was generated with batch size 1. I have done similar experiments with different models trained on the same version of the TAO Toolkit and have not noticed this behaviour before. The pruned model is roughly 60% of the original model's size, and my assumption here is that the fewer the parameters, the lower the inference time should be.
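For reference, the FPS numbers come from trtexec runs against the already-built engines; a minimal sketch of that kind of run (the engine file names are placeholders, and the warm-up/duration values are just reasonable defaults, not the exact ones I used):

```
# Benchmark an existing engine; trtexec prints throughput (qps) and
# GPU compute time statistics at the end of the run.
trtexec --loadEngine=unpruned_b1_int8.engine --warmUp=500 --duration=60 --avgRuns=100
trtexec --loadEngine=pruned_b1_int8.engine   --warmUp=500 --duration=60 --avgRuns=100
```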
trtexec log attached: (upload://kCCmJw6q6n9fTphYd5hkLmP3U6S.txt) (9.2 KB). Same trend: the unpruned model is faster.
Another interesting thing I noticed is that the FP16 engine files are slightly faster than the corresponding INT8 engine files, by around 1 FPS.
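For that precision comparison, both engines are built from the same exported model; a sketch assuming the ONNX-style trtexec workflow shown later in the thread (file names are placeholders; the shapes are the 1x3x832x1344 ones from my command):

```
# FP16 build: no calibration cache needed.
trtexec --onnx=model.onnx --fp16 \
        --minShapes=Input:1x3x832x1344 --optShapes=Input:1x3x832x1344 --maxShapes=Input:1x3x832x1344 \
        --saveEngine=model_b1_fp16.engine

# INT8 build: uses the calibration cache exported alongside the model.
trtexec --onnx=model.onnx --int8 --calib=model.cal \
        --minShapes=Input:1x3x832x1344 --optShapes=Input:1x3x832x1344 --maxShapes=Input:1x3x832x1344 \
        --saveEngine=model_b1_int8.engine
```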
I have another observation: I ran a different training with only the backbone changed from resnet50 to resnet18, and the model was not pruned. I hoped for a higher FPS from the INT8 engine file, but interestingly this model was slower than the original unpruned resnet50 model by around 12 FPS (a per-layer profiling sketch follows below). Here is the trtexec log: coco_2017_maskrcnn_02_09_24_v10_step_660000.txt (8.4 KB)
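To see where the time actually goes in a case like this, trtexec can dump per-layer timings; a minimal sketch, assuming an already-built engine (the engine path is a placeholder):

```
# Run an extra profiling pass (--separateProfileRun) so profiling overhead
# does not distort the throughput numbers, then dump per-layer timings.
trtexec --loadEngine=resnet18_b1_int8.engine --dumpProfile --separateProfileRun
```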
Let’s treat the original problem as the higher priority; we can solve this issue afterwards.
I appreciate that you did the training and FPS testing on your end. We are currently trying to add MaskRCNN to our set of models for future deployments. As mentioned in other forum posts from me and my team, the tfrecords generated in 5.x have issues, so we have refrained from using the latest 5.x TAO containers and still stick to the 3.22.05 containers, which have served us well so far. Kindly help us understand whether a model implementation difference between the 3.x and 5.x containers causes this difference in results; I cannot think of any other reason for this behaviour.
[10/09/2024-05:40:18] [I] [TRT] Model version: 0
[10/09/2024-05:40:18] [I] [TRT] Doc string:
[10/09/2024-05:40:18] [I] [TRT] ----------------------------------------------------------------
[10/09/2024-05:40:18] [I] Finished parsing network model. Parse time: 0.15425
[10/09/2024-05:40:18] [E] Cannot find input tensor with name "Input" in the network inputs! Please make sure the input tensor names are correct.
[10/09/2024-05:40:18] [E] Network And Config setup failed
[10/09/2024-05:40:18] [E] Building engine failed
[10/09/2024-05:40:18] [E] Failed to create engine from model or file.
[10/09/2024-05:40:18] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=/workspace/tao-experiments/output/etlt/coco_2017_maskrcnn_02_09_24_step_660000_pruned_1_step_660000.onnx --calib=/workspace/tao-experiments/output/etlt/coco_2017_maskrcnn_02_09_24_step_660000_pruned_1_step_660000.cal --int8 --saveEngine=/workspace/tao-experiments/output/etlt/coco_2017_maskrcnn_02_09_24_step_660000_pruned_1_step_660000.onnx_b1_gpu0_int8.engine --maxShapes=Input:1x3x832x1344 --minShapes=Input:1x3x832x1344 --optShapes=Input:1x3x832x1344
You can find the command in the last line of the error log. I checked the MaskRCNN - NVIDIA Docs documentation, which gives that name for the input layer, and used it in the command.
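To double-check which input tensor name is actually baked into the exported file, the graph inputs can be listed directly; a quick sketch, assuming the onnx Python package is available and the file really is ONNX (the path is taken from the failing command above):

```
# Print the input tensor names recorded in the graph; if "Input" is not among
# them, the names in --minShapes/--optShapes/--maxShapes must be changed.
python -c "import onnx; m = onnx.load('/workspace/tao-experiments/output/etlt/coco_2017_maskrcnn_02_09_24_step_660000_pruned_1_step_660000.onnx'); print([i.name for i in m.graph.input])"
```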
I am running the command in the tao5.x:tf1.15 docker container.
Currently, mask_rcnn can only be exported to a .uff file, not an .onnx file.
Please use mask_rcnn export xxx to generate the TensorRT engine. You can find the command in 20241007_mask_rcnn_forum_307832.txt (543.0 KB)
BTW, you can also use the command mentioned in TRTEXEC with Mask RCNN - NVIDIA Docs to generate the TensorRT engine.
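For reference, a .uff model goes to trtexec through the --uff/--uffInput flags instead of --onnx; a rough sketch only, with the model and cache paths as placeholders and the --output names to be verified against the linked doc (generate_detections and mask_fcn_logits/BiasAdd are the usual TAO MaskRCNN output blobs):

```
# UFF uses the implicit-batch path, so --maxBatch replaces the --*Shapes flags.
# Verify the --output names against the "TRTEXEC with Mask RCNN" doc page.
trtexec --uff=model.uff \
        --uffInput=Input,3,832,1344 \
        --output=generate_detections,mask_fcn_logits/BiasAdd \
        --maxBatch=1 \
        --int8 --calib=model.cal \
        --saveEngine=model_b1_int8.engine
```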