TensorRT EfficientNMS plugin FP16 inconsistent (but valid) results

kambrozyna · January 24, 2023, 3:34pm

Description

Hello!

I encountered an issue with the EfficientNMS plugin in FP16 mode, and I’m wondering whether it should be treated as a bug or whether it is expected and can be explained.

Running inference multiple times on the same image in float16 mode very often produces slightly different output, one or a few pixels off. Example:
1st run [104 x 75 from (142, 156)] vs 2nd run [105 x 73 from (141, 158)], with the same confidence score.
Also, the order of sorted bounding boxes with the same score sometimes differs from the previous iteration.
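
To make this kind of discrepancy concrete, here is a minimal Python sketch of a comparison that accepts both effects described above: boxes that are a few pixels off, and a different order among boxes with the same score. The function name `detections_match`, the `(x, y, w, h, score)` tuple layout, the `pixel_tol` parameter, and the score value 0.9 are assumptions made for illustration and are not part of TensorRT. It also assumes that "[104 x 75 from (142, 156)]" means width x height from the top-left corner. The matching is greedy, so it is a rough check rather than an optimal assignment.

```python
from collections import defaultdict


def detections_match(run_a, run_b, pixel_tol=2):
    """Return True if two runs hold the same detections up to pixel_tol.

    Each detection is a hypothetical (x, y, w, h, score) tuple. The order
    of detections that share a score is ignored.
    """
    if len(run_a) != len(run_b):
        return False

    # Group the second run's boxes by score so ordering among ties is ignored.
    by_score = defaultdict(list)
    for x, y, w, h, score in run_b:
        by_score[score].append((x, y, w, h))

    for x, y, w, h, score in run_a:
        candidates = by_score[score]
        for i, other in enumerate(candidates):
            # A box matches if every coordinate is within the tolerance.
            if all(abs(p - q) <= pixel_tol for p, q in zip((x, y, w, h), other)):
                del candidates[i]  # each box may be matched only once
                break
        else:
            return False  # no unmatched box with this score is close enough
    return True


# The two runs from the example above, with an assumed score of 0.9.
first = [(142, 156, 104, 75, 0.9)]
second = [(141, 158, 105, 73, 0.9)]
print(detections_match(first, second))               # largest difference is 2 px
print(detections_match(first, second, pixel_tol=1))  # stricter tolerance
```

A check like this only works around the symptom in a test plan; it does not explain why the plugin output varies between runs.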

I consider the detections valid, as they are pretty close. However, this is a problem for my testing plan: I assumed that re-using the same TensorRT plan file would always give deterministic results, which has been the case until now.
Are there any ideas how to explain this behavior?
If it’s a bug, I’ll try to prepare a reproduction using publicly available models/data.

Isolation:

the issue occurs when re-using the same TRT plan
using FP32: the issue does NOT occur
using FP16 with plugins around EfficientNMS forced to FP32: the issue does NOT occur
confirmed that the NMS plugin gets the same input data in each iteration and sometimes produces slightly different output (dumped and compared raw bytes)
not all images are “problematic”, mostly those with many detections
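
The raw-byte comparison mentioned above could look roughly like the following Python sketch. It assumes the dumped buffer is a flat array of little-endian IEEE 754 half-precision (FP16) values; the actual layout of the plugin's output buffers is not stated in this post. The function name `diff_fp16_dumps` is hypothetical.

```python
import struct


def diff_fp16_dumps(ref_bytes, new_bytes):
    """List (index, ref_value, new_value) for FP16 elements that differ bitwise.

    Assumes both buffers are flat arrays of little-endian FP16 values.
    """
    if len(ref_bytes) != len(new_bytes) or len(ref_bytes) % 2 != 0:
        raise ValueError("buffers must have equal, even length")

    # "<e" is the struct format for a little-endian half-precision float.
    ref = [v for (v,) in struct.iter_unpack("<e", ref_bytes)]
    new = [v for (v,) in struct.iter_unpack("<e", new_bytes)]

    diffs = []
    for i, (a, b) in enumerate(zip(ref, new)):
        # Compare the raw 2-byte patterns so that NaN payloads are handled too.
        if ref_bytes[2 * i:2 * i + 2] != new_bytes[2 * i:2 * i + 2]:
            diffs.append((i, a, b))
    return diffs


# Illustrative buffers: one element differs between the two dumps.
ref_dump = struct.pack("<4e", 142.0, 156.0, 104.0, 75.0)
new_dump = struct.pack("<4e", 142.0, 158.0, 104.0, 75.0)
print(diff_fp16_dumps(ref_dump, new_dump))
```

Running the same function on the dumped input buffers is one way to confirm that the plugin received identical data in each iteration.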

Thank you!

Environment

TensorRT Version: 8.4 - 8.5.1.7 (previous versions don’t work because of another issue)
GPU Type: RTX2070
Nvidia Driver Version: 525.60.13
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): baremetal

Relevant Files

Steps To Reproduce

spolisetty · January 25, 2023, 5:51pm

Hi,

We are checking on this issue internally.
Could you please share a minimal repro model/script with us for better debugging?

Thank you.

kambrozyna · January 27, 2023, 11:47am

Thank you for the reply.
I will try to create a minimal setup to reproduce this issue based on publicly available data. Please allow me some time, as I cannot share our project’s code/models here.

kambrozyna · February 22, 2023, 9:59am

Hi,

I created a small app to show the problem. What it does:

creates the EfficientNMS plugin
configures the plugin based on the provided sample input
runs the plugin’s enqueue() once on the sample data and saves the output for later comparison
runs enqueue() 10 more times and compares each output with the first run
if an output differs from the 1st run, it dumps the buffer content to a file
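
The control flow of the steps above can be sketched in Python as follows. This is not the code from the attached archive: `run_once` is a hypothetical callable standing in for one enqueue() call that returns the output buffer as bytes, and `check_repeatability`, `repeats`, and `dump_dir` are names chosen for illustration.

```python
from pathlib import Path


def check_repeatability(run_once, repeats=10, dump_dir=None):
    """Run once for a reference, then `repeats` more times, and compare.

    Returns the 1-based indices of the repeated runs whose output differs
    from the reference. Mismatching buffers are written to dump_dir if given.
    """
    reference = run_once()  # first run, kept for comparison
    mismatched_runs = []
    for i in range(1, repeats + 1):
        output = run_once()
        if output != reference:  # byte-for-byte comparison
            mismatched_runs.append(i)
            if dump_dir is not None:
                Path(dump_dir, f"run_{i:02d}.bin").write_bytes(output)
    return mismatched_runs
```

With a deterministic `run_once`, the returned list is empty; any entry identifies a run whose raw output bytes differ from the first run.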

In my case, the boxes output differs on almost every run.

Thanks!

repro_fp16_nms_issue.tar.gz (613.8 KB)

kambrozyna · March 30, 2023, 11:36am

@spolisetty Hi! Did you have a chance to check the issue internally? Thanks
Kamil