TLT: different results on T4 and Jetson NX

Hi, we are working on an Azure instance with a T4 card and on a Jetson NX.
We use TLT v3 to train, evaluate, and run inference with our INT8 Mask R-CNN model.
On the T4 the results seem perfect (CUDA 11.2, TensorRT 7.2). On the Jetson we convert the model to an engine with tlt-converter (the one available for our JetPack version: JetPack 4.5, CUDA 10.2, TensorRT 7.1.3), and we lose detections and precision.

Could the different versions cause this loss of precision? How can we go further? Should we create a Docker container on the T4 with versions matched to the Jetson NX, or update the Jetson NX to a newer version of TensorRT (no GA release is available, I think)?
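
To make the version mismatch easy to check on both machines, here is a minimal sketch in Python. The helper names (parse_major_minor, same_major_minor, local_tensorrt_version) are hypothetical and made up for this post. The only thing taken from TensorRT is the `tensorrt.__version__` attribute, and that part assumes the TensorRT Python package is installed on the device. It only compares version numbers; it does not explain the accuracy drop by itself.

```python
def parse_major_minor(version):
    """Return (major, minor) as ints from a dotted version string like '7.1.3'."""
    parts = version.split(".")[:2]
    return tuple(int(part) for part in parts)


def same_major_minor(version_a, version_b):
    """True when both version strings share the same major.minor pair."""
    return parse_major_minor(version_a) == parse_major_minor(version_b)


def local_tensorrt_version():
    """Return the installed TensorRT version string, or None if not importable."""
    try:
        import tensorrt  # assumption: TensorRT Python bindings are installed
    except ImportError:
        return None
    return tensorrt.__version__


if __name__ == "__main__":
    # Versions reported in this post: T4 side and Jetson NX side.
    t4_version = "7.2"
    jetson_version = "7.1.3"
    print("Local TensorRT:", local_tensorrt_version())
    print("T4 and Jetson match on major.minor:",
          same_major_minor(t4_version, jetson_version))
```

Running it on each machine prints the local TensorRT version, so the two environments can be compared side by side.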

Thanks.

PS: Of course I have read other threads looking for a solution, but nothing seems to work.