TRT C++ inference - Same input generates different outputs [FP16]

Description

The same input generates different outputs from run to run. I am using a model converted from ONNX to a TensorRT FP16 engine with the trtexec tool (example command below). Given the same input, the output is close to what I get from the original ONNX model, except that it varies with each run. The model receives an image and outputs an embedding of 512 floats.
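
The conversion command looked roughly like this (the file names here are placeholders, not the actual ones I used):

    trtexec --onnx=model.onnx --fp16 --saveEngine=model.trt
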
I then compute the Frobenius norm of the embedding values, which gives me a value that is meaningful for the problem I am trying to solve with the model (a sketch of this computation follows the list below). Output examples:

norm:28.58157
norm:28.57674
norm:28.57919
norm:28.57674
norm:28.58157
norm:28.57201
norm:28.57674
norm:28.58157
norm:28.57674
norm:28.58157
norm:28.57674
norm:28.57201
norm:28.57674
norm:28.57919
norm:28.58393
norm:28.58157
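
For reference, the norm above is computed along these lines (a minimal sketch; the function name and the way I obtain the output buffer are simplified here):

    #include <cmath>
    #include <vector>

    // Frobenius norm of the 512-float output embedding. For a flat vector
    // this reduces to the Euclidean (L2) norm: sqrt of the sum of squares.
    float embeddingNorm(const std::vector<float>& embedding) {
        double sumSquares = 0.0;  // accumulate in double to limit rounding error
        for (float v : embedding) {
            sumSquares += static_cast<double>(v) * v;
        }
        return static_cast<float>(std::sqrt(sumSquares));
    }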

Is this expected behavior? Shouldn't the output be identical for the same input, even if it is less precise than the original model's?
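
One thing I am aware of: floating-point addition is not associative, so sums over the same values can differ if the accumulation order changes between runs. A standalone C++ snippet (nothing to do with TensorRT or my code) illustrating the effect:

    #include <cstdio>

    int main() {
        // Summing the same three values in two different orders gives
        // different float results: (a + b) cancels exactly, while (b + c)
        // loses c to rounding because 1.0f is below the ulp of 1e8f.
        float a = 1e8f, b = -1e8f, c = 1.0f;
        float leftToRight = (a + b) + c;  // == 1.0f
        float rightToLeft = a + (b + c);  // == 0.0f
        std::printf("%f vs %f\n", leftToRight, rightToLeft);
        return 0;
    }

Even so, I expected the same serialized engine running on the same input to execute the same kernels in the same order, and therefore produce bit-identical outputs.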

Environment

TensorRT Version: TensorRT-8.4.1.5
GPU Type: NVIDIA GeForce RTX 2080
Nvidia Driver Version: 512.95
CUDA Version: 11.1
CUDNN Version: 8.2.1
Operating System + Version: Windows 10

Hi,

Could you please share the ONNX model and a script to reproduce the issue so we can debug it further?

Thanks