Hi,
I have successfully run several models with TensorRT using the FP16 data type, but I've found that the inference output differs across GPU architectures (a T4 GPU and an A2000 laptop GPU). This was somewhat expected from what I have read in other threads.
So my question is: what is the best strategy to reduce these differences across GPU architectures as much as possible? Any suggestions on best practices for achieving consistent inference results across GPUs would be greatly appreciated.
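For reference, this is roughly how I build the FP16 engine on each machine (a minimal sketch; the ONNX path, plan path, and otherwise default builder config are placeholders for my actual setup, and the engine is rebuilt separately on each GPU since plans are not portable across architectures):

```python
# Minimal sketch of the engine build; "model.onnx" and "model_fp16.plan"
# are placeholders for my actual files.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

# Builder config: FP16 enabled, everything else left at defaults
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

serialized_engine = builder.build_serialized_network(network, config)
with open("model_fp16.plan", "wb") as f:
    f.write(serialized_engine)
```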
Environment
TensorRT Version: 8.6.1
NVIDIA Driver Version: 525
CUDA Version: 11.7
CUDNN Version: 8.5.0
Operating System + Version: Ubuntu 20.04
Container: nvcr.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04