I am using the TAO framework to train a Mask R-CNN model. My objective is to reduce inference latency. I am already using a ResNet-10 backbone and FP16 precision, and I have reduced the number of proposals and outputs. After export, the TensorRT engine generates very good predictions, corresponding to high AP values (above 80%).
To reduce latency further, I changed the number of anchors generated by the model by changing the list of possible aspect ratios. By default, the model uses 3 different ratios: aspect_ratios: “[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]”. Keeping only one ratio in my configuration file, “[(1.0, 1.0)]”, I get almost the same metrics during training and evaluation (AP still above 80%). However, after exporting the model and using the engine file instead of the model.tlt, I get very bad qualitative results (AP would probably be below 30%). This is unfortunate, since removing these aspect ratios reduces latency by approximately 20%.
To reproduce this issue, I am using the mask_rcnn Docker image provided in nvidia-tao version 0.1.19. I don’t know if it is relevant here, but my GPU and drivers are the following: GPU Type: RTX 3060, Nvidia Driver Version: 495.29.05, CUDA Version: 11.5.
The only difference between the working configuration and the failing one is the line defining the anchor aspect ratios in the configuration file: aspect_ratios: “[(1.0, 1.0), (3.0, 0.3), (0.3, 3.0)]” is replaced by aspect_ratios: “[(1.0, 1.0)]”.
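For reference, here is a sketch of the relevant section of the spec file. The surrounding field names and values are taken from the TAO MaskRCNN sample specs and may differ slightly from my actual files; only the aspect_ratios line changes between the two experiments:

```
maskrcnn_config {
  # ... other fields unchanged ...
  min_level: 2
  max_level: 6
  num_scales: 1
  aspect_ratios: "[(1.0, 1.0), (3.0, 0.3), (0.3, 3.0)]"  # works after export
  # aspect_ratios: "[(1.0, 1.0)]"                        # fails after export
  anchor_scale: 8
}
```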
Is it possible to export a TensorRT engine using a different number of aspect ratios than the 3 provided in the default configuration?
Thank you for your reply. I already considered pruning, and I can improve inference speed by about 10% with stable AP. Another option to further reduce latency is INT8 inference, but I could not export an engine with satisfactory inference speed; my guess is that some of the Mask R-CNN layers are not available in INT8 precision. I could also reduce the input size, but then AP decreases significantly.
I observed about a 20% latency difference between an empty/random image generated by trtexec and inference on “real” images. This can be explained by the number of proposals to filter out with this kind of architecture, which is why I tried to reduce the number of anchor ratios.
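To give an order of magnitude, here is a small back-of-the-envelope sketch (not TAO code; it assumes one scale per level and FPN levels 2–6 on a 512×512 input, matching the default MaskRCNN spec layout) counting the anchors that post-processing has to score and filter:

```python
# Rough anchor count for an FPN-style detector: each feature-map cell
# carries (num_scales * num_aspect_ratios) anchors.
def total_anchors(image_size, min_level=2, max_level=6,
                  num_scales=1, num_aspect_ratios=3):
    total = 0
    for level in range(min_level, max_level + 1):
        feat = image_size // (2 ** level)  # feature-map side length at this level
        total += feat * feat * num_scales * num_aspect_ratios
    return total

three = total_anchors(512, num_aspect_ratios=3)
one = total_anchors(512, num_aspect_ratios=1)
print(three, one)  # → 65472 21824: a 3x reduction in anchors to process
```

Every per-anchor step (scoring, box decoding, NMS input) scales with this count, which is consistent with the latency gain I observed when dropping two of the three ratios.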
To be honest, I checked and reproduced the results many times to make sure it was not a configuration problem. With “[(1.0, 1.0)]”, the training/evaluation AP measured by “tao mask_rcnn train / evaluate” is very good, similar to the one obtained with “[(1.0, 1.0), (3.0, 0.3), (0.3, 3.0)]” (87% and 88% AP75 respectively). However, the engine exported from the model with fewer ratios generates very bad predictions, unusable in practice (I would estimate AP below 30). I am not able to measure AP using the engine directly: “tao mask_rcnn evaluate” outputs the error “The pruned model must be retrained first”, even though I did not prune the model.
That is not expected. Usually the .tlt model should have similar AP to the .trt engine.
Can you set a lower threshold and retry?
Also, how did you check the AP of the .trt engine? Can you share the full command and full log?
That’s not expected. Please share the full command and full log as well.
where NUM_STEP is the number of training iterations and EXP_NAME is the base name of the configuration file used for the experiment. Please find attached the configuration files for the two models (the only difference is the aspect_ratios values), as well as the full log obtained when evaluating the model with only one aspect_ratio. resnet18.txt (1.9 KB) resnet18-b.txt (1.9 KB) resnet18-b-log.txt (379.1 KB)
However, in the case of the second model, the .engine outputs bad predictions. I can reproduce this behaviour by changing data_type to fp32, by changing the batch size, and by setting other aspect_ratios values. The only way to get correct predictions is to set exactly 3 aspect ratios.
Finally, to measure AP with the generated engines, I use the following command:
The -k parameter is documented as unnecessary, but omitting it generates an error. I have also attached the full output of this command, which is the same for every engine I tested. The output is the same if I replace the .engine with the .etlt model, and as said earlier I did not prune these models at all. eval_engine.txt (1.6 KB)
In the early tests, I used trtexec with the following command:
trtexec --loadEngine=resnet18.engine --batch=2
Now I measure inference directly with TensorRT in my project:
// execute() is synchronous, so wall-clock timing around it captures the
// full inference latency (assumes using namespace std::chrono, as above)
auto start = high_resolution_clock::now();
bool status = mContext->execute(batchSize, mBuffer->getDeviceBindings().data());
auto stop = high_resolution_clock::now();
auto latencyMs = duration_cast<milliseconds>(stop - start).count();
I measure the same GPU latency with both methods, but in my own code I have to feed the engine with empty images, since inference is slower on real images. With the engine generated with fewer anchor ratios, the measured time decreases from about 26 ms to 20 ms. However, I cannot use the engine outputs, since the predictions are no longer accurate.
I am unfortunately using a private dataset. However, it would help if you could check whether you can reproduce this behavior on your side. If changing the number of aspect ratios on your side does not impact the engine accuracy after export, there may be something I am missing.
I ran tao inference and looked at the generated images and masks. Since there is at most one correct segmentation mask per image, and there are about 3 objects per image on average, I am quite sure that the corresponding AP is below 30. But the exact value is not important here; I only wanted to emphasize that I am confident in the qualitative and quantitative difference between the two models.
Thank you for your help. I will give it a try when I have some time. If I understand correctly, I should directly modify the hard-coded aspect_ratios in tlt_mrcnn_config? Does this mean that the value set in the training config file is not correctly loaded/adjusted at export time?
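For anyone finding this thread later: I don’t know the actual TAO internals, but the symptom would be consistent with the export path reading a hard-coded default instead of the spec value. A purely hypothetical sketch of that bug pattern (none of these names are the real tlt_mrcnn_config API):

```python
# Hypothetical illustration of the suspected bug pattern (NOT real TAO code):
# training honours the spec, but export falls back to a module-level default.
DEFAULT_ASPECT_RATIOS = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]

def anchors_for_export(spec):
    # Bug pattern: the spec value is ignored, so the exported engine decodes
    # boxes against 3 ratios while the weights were trained with only 1.
    return DEFAULT_ASPECT_RATIOS

spec = {"aspect_ratios": [(1.0, 1.0)]}
trained = spec["aspect_ratios"]
exported = anchors_for_export(spec)
print(len(trained), len(exported))  # → 1 3: anchor mismatch between train and export
```

That kind of mismatch would explain why training/evaluation AP looks fine while the exported engine produces garbage, and why exactly 3 ratios is the only working setting.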