Random Bounding Box in FasterRCNN etlt model in Xavier 30W Mode

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson Xavier
• DeepStream Version DeepStream 5.0
• JetPack Version (valid for Jetson only) Jetpack 4.4
• TensorRT Version TensorRT 7.0
• NVIDIA GPU Driver Version (valid for GPU only)

I have retrained the FasterRCNN etlt model and written the config file for it.
The model is running in INT8 precision mode, but the problem also happens in FP32 and FP16 precision modes.
Originally, random bounding boxes appeared in MaxN mode, the same as those seen in the 30W mode picture. I then compiled the TensorRT OSS plugin, and the problem was resolved in MaxN mode. However, when I change the power mode to 30W, the problem comes back. Do you know why this happens in 30W mode? I tried increasing the bounding box display threshold, but it did not help.

For your reference, I am using the custom plugin provided in https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps for the etlt inference.

In 30W Mode,

In MaxN Mode,


Besides, the instructions at https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/tree/master/TRT-OSS/Jetson say that we should replace the library as follows: [ pwd/out/libnvinfer_plugin.so.7.m.n /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.x.y]

The files compiled from OSS are [libnvinfer_plugin.so, libnvinfer_plugin.so.7.0.0, libnvinfer_plugin.so.7.0.0.1], while the original plugins in /usr/lib/aarch64-linux-gnu/ are [libnvinfer_plugin.so, libnvinfer_plugin.so.7, libnvinfer_plugin.so.7.1.0]. Some of the file names do not match the format you gave. Could you also give a more detailed description of this step?

Thanks.

Hi @cpchiu,
libnvinfer_plugin.so and libnvinfer_plugin.so.7.0.0 are link files. You only need to replace the real file, e.g. libnvinfer_plugin.so.7.1.0; that is, replace libnvinfer_plugin.so.7.1.0 with libnvinfer_plugin.so.7.0.0.1.

Could you check whether you did the above replacement? If not, please replace the file and test again.
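
To see which file in a directory is the real library and which are only links, a minimal Python sketch is given below. It is an illustration, not part of the official instructions: the helper names `resolve_link_chain` and `build_demo_layout` are hypothetical, and the demo builds a throwaway directory that assumes the link layout described in this thread (libnvinfer_plugin.so -> libnvinfer_plugin.so.7 -> libnvinfer_plugin.so.7.1.0) instead of touching /usr/lib/aarch64-linux-gnu/.

```python
import os
import tempfile


def resolve_link_chain(path):
    """Return every path visited from `path` down to the real file."""
    chain = [path]
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative link targets are resolved against the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


def build_demo_layout(root):
    """Create an ASSUMED copy of the link layout described in the thread."""
    real = os.path.join(root, "libnvinfer_plugin.so.7.1.0")
    with open(real, "wb") as f:
        f.write(b"placeholder, not a real library")
    os.symlink("libnvinfer_plugin.so.7.1.0",
               os.path.join(root, "libnvinfer_plugin.so.7"))
    os.symlink("libnvinfer_plugin.so.7",
               os.path.join(root, "libnvinfer_plugin.so"))
    return os.path.join(root, "libnvinfer_plugin.so")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        entry = build_demo_layout(root)
        for p in resolve_link_chain(entry):
            kind = "link" if os.path.islink(p) else "real file"
            print(f"{os.path.basename(p)}: {kind}")
```

On a real system, calling `resolve_link_chain("/usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so")` should end at the file that actually needs to be replaced; it is worth backing that file up first.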

Thanks!

@mchi I have replaced

  1. libnvinfer_plugin.so with the compiled libnvinfer_plugin.so
  2. libnvinfer_plugin.so.7 with the compiled libnvinfer_plugin.so.7.0.0
  3. libnvinfer_plugin.so.7.1.0 with the compiled libnvinfer_plugin.so.7.0.0.1

Would this be fine? The problem happened even with replacements 1, 2 and 3 above. I have not yet tested with replacement 3 alone. But why would the power mode lead to this problem?

Hi @cpchiu,
Sorry for the late reply!

Could you share your model with us so that we can try it?

Thanks!

Hi @mchi,
Thank you for your reply and help. I have uploaded the documents (config files) and models (both FP16 and INT8) to https://github.com/KelvinCPChiu/FasterRCNN_etlt/tree/master. I have checked again with FP16 precision mode and 30W power mode. With a batch size of 6, the bounding boxes are displayed properly. If the batch size is changed to 7, the bounding boxes become far more erratic than in the figure above.

Thanks!

@mchi I have tested the same network with a similar configuration on Jetson Xavier NX. The same problem happens there.

What does “similar configuration” refer to?

@mchi On Jetson Xavier AGX with the FP16 model and 30W power mode, the problem happens with batch sizes of 7 and 8. There is no problem with a batch size of 6. On NX with FP16 and 15W power mode, a batch size of 6 already shows the problem. “Similar configuration” here means that only the batch-size in the configuration file was changed. Thank you very much for your help and reply!

@cpchiu
I uploaded the files extracted from the stream after inference, for batch size 6 and batch size 8, in FP16 mode and 30W power mode on Xavier, using the model you posted at
https://github.com/KelvinCPChiu/FasterRCNN_etlt/tree/master
Please see the first picture in each folder. Do you mean an issue like the one shown in the first picture? If yes, both batch size 6 and batch size 8 have the issue.

@amycao Thank you very much for your reply and help. Batch size 6 has no problem on my Xavier AGX. I have uploaded the videos to the link below. The filename format is [Precision Mode]_B[Batch Size]__[Power mode in Xavier].mp4.

https://astri-my.sharepoint.com/:f:/g/personal/kelvinchiu_astri_org/EpQRLerZgiFDimf5I3_RtDsBAJeQjaffSNqWG0ZwdPQUgA

Where is the folder you mentioned?

Hi @cpchiu
If you use video output, i.e. without the “-d” option, is this issue still reproducible?

When the batch size increases or the power mode changes to 3, the time of one inference also increases.
So the cause may be that inference cannot process the decoded frames in time.
However, we still need to look further into why this issue happens.
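
The timing argument above can be sketched as a quick back-of-the-envelope check. This is only an illustration of the hypothesis, not a confirmed diagnosis: the function names are hypothetical, the latency figures in the usage lines are made-up placeholders, and it assumes each batch takes a fixed time and that every stream delivers frames at a constant rate.

```python
def sustained_fps(batch_size, batch_latency_ms):
    """Frames per second the model can process if one batch takes batch_latency_ms."""
    return batch_size * 1000.0 / batch_latency_ms


def can_keep_up(batch_size, batch_latency_ms, num_streams, stream_fps):
    """True if inference throughput is at least the rate at which frames arrive."""
    incoming_fps = num_streams * stream_fps
    return sustained_fps(batch_size, batch_latency_ms) >= incoming_fps


if __name__ == "__main__":
    # Placeholder numbers only; measure the real per-batch latency on the device.
    print(can_keep_up(batch_size=6, batch_latency_ms=20, num_streams=6, stream_fps=30))
    print(can_keep_up(batch_size=8, batch_latency_ms=100, num_streams=8, stream_fps=30))
```

If the measured per-batch latency in the lower power mode pushes the sustained rate below the incoming frame rate, that would be consistent with the explanation above, although it would not by itself explain why the boxes become random.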

Thanks!

Oh, it seems the upload exceeded the size limit.
I have put the files in Dropbox; please let me know if you can access them.


@mchi Where should I add the -d option? The deepstream-app does not have this option. Do you mean disabling the OSD or the video display? Besides, this problem also happens on the local display, so it should originate in or before the OSD in the GStreamer pipeline. Thanks.

@amycao Thanks, but I could not access it. The following error messages popped up: “The folder ‘/tmpbs8/tmpbs8’ doesn’t exist.” and “The folder ‘/tmpbs6/tmpbs6’ doesn’t exist.”

How about this


@amycao Thanks. I have checked the pictures you shared. The result looks fine to me because it is consistent across the batch for the same video. Could you tell me which power mode and precision mode you used?

Could you also check the link I shared in post number 10? It better describes the problem I encountered, which I have no idea how to solve.

Please see inline.

Yes. That is the link I mentioned.

This one, fp16_b8_30W.mp4, right?

Yes. fp16_b8_30W and fp16_b8_MaxN are problematic compared with fp16_b6_30W and fp16_b6_MaxN.
int8_b8_30W is problematic compared with int8_b8_MaxN. You can see random abnormal bounding boxes flying around.
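
For anyone comparing the uploaded clips, the results reported in this thread can be tabulated with a small Python sketch. The parser and its names (`CLIP_NAME`, `parse_clip_name`, `summarize`) are hypothetical helpers; the name pattern is inferred from the clip names quoted above, and the status values simply restate what was reported here.

```python
import re

# Pattern inferred from names such as fp16_b8_30W.mp4; one or more underscores
# are accepted before the power mode.
CLIP_NAME = re.compile(
    r"^(?P<precision>[a-z0-9]+)_b(?P<batch>\d+)_+(?P<power>[a-z0-9]+)(?:\.mp4)?$",
    re.IGNORECASE,
)


def parse_clip_name(name):
    """Split a clip name into precision, batch size and power mode."""
    match = CLIP_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised clip name: {name!r}")
    return {
        "precision": match["precision"].lower(),
        "batch_size": int(match["batch"]),
        "power_mode": match["power"],
    }


# Status of each clip as reported in this thread.
REPORTED = {
    "fp16_b6_30W": "ok",
    "fp16_b6_MaxN": "ok",
    "fp16_b8_30W": "random boxes",
    "fp16_b8_MaxN": "random boxes",
    "int8_b8_30W": "random boxes",
    "int8_b8_MaxN": "ok",
}


def summarize(reported):
    """Return sorted (precision, batch_size, power_mode, status) rows."""
    rows = []
    for name, status in reported.items():
        info = parse_clip_name(name)
        rows.append((info["precision"], info["batch_size"], info["power_mode"], status))
    return sorted(rows)


if __name__ == "__main__":
    for row in summarize(REPORTED):
        print(*row, sep="\t")
```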