Can't get TLT-trained model to work on DeepStream - Jetson (NX)

My colleague and I have been trying for days to get a TLT 3.0 trained DetectNet_v2 model working on our Xavier NXs using DeepStream 5.1. The model runs (with a warning, see further in the text), but produces not a single detection.

Let me break down the pipeline:
Training machine: x86 - RTX 3070, using TLT 3.0 (tlt-streamanalytics:v3.0_dp_py3).
If I recall correctly, the TensorRT version in the dockerfile is 7.2.x.
After too many failed attempts, we went back to basics and ran the DetectNet_v2 example with the KITTI dataset.
None of the params were changed, and we used the non-QAT retrain, export and INT8 conversion.

We copied the resnet18.etlt and calibration.bin to our Xavier NXs (multiple) running JetPack 4.5 (TensorRT 7.1.3 and DeepStream 5.1) and ran the following command:

./tlt-converter /resnet18_detector_qat.etlt -k MyAPIkey -c /calibration_qat.bin -o output_cov/Sigmoid,output_bbox/BiasAdd -d 3,768,1024 -m 64 -i nchw -t int8 -e /resnet18_detector.trt -b 4 -w 1000000000
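(Not from the original post, but a sanity check we could have added here: trtexec ships with TensorRT on JetPack, typically under /usr/src/tensorrt/bin, and can confirm the generated engine at least deserializes and runs on the NX independently of DeepStream. The engine path below is an assumption matching the -e argument above.)

```
# Hypothetical sanity check on the Jetson; path of trtexec and of the
# engine file are assumptions based on the tlt-converter command above.
/usr/src/tensorrt/bin/trtexec --loadEngine=/resnet18_detector.trt
```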

After it finished, we copied the .trt engine file and the label map into our runtime folders and modified the PGIE config file like this:

[property]
gie-unique-id=1
gpu-id=0
enable-dbscan=true
net-scale-factor=0.0039215697906911373
#model-file=models/primary/resnet10.caffemodel
#proto-file=models/primary/resnet10.prototxt
model-engine-file=models/primary/resnet18_detector.trt
labelfile-path=models/primary/labels.txt
#int8-calib-file=models/primary/cal_trt.bin
force-implicit-batch-dim=1
batch-size=12
network-mode=1
process-mode=1
model-color-format=0
num-detected-classes=3
interval=0
#output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1
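One thing worth double-checking in the config above (my assumption, based on the DeepStream 5.x nvinfer documentation): enable-dbscan is deprecated there in favour of the cluster-mode property, so the clustering settings may only take effect with something like:

```
[property]
# DeepStream 5.x nvinfer: cluster-mode=1 selects DBSCAN
# (replaces the deprecated enable-dbscan=true)
cluster-mode=1
```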

In fact, we reproduced everything as in this YouTube video:

with the exception that we had to change #output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid to output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd to get our DeepStream app to start, while the guy in the video didn't change these params.

The warning at startup of our Deepstream app is:

WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

Which seems normal to me when an engine is built on x86 but deployed on aarch64.
I've read something about building a custom nvinfer parser, but the guy in the video didn't do that, so is it not necessary for DetectNet_v2?

Any help would be greatly appreciated!!

See DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation

output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd

Also, if you want to run inference on Jetson devices, please generate the TRT engine directly on the Jetson device.
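As an alternative to running tlt-converter by hand, nvinfer can also build the engine on the device directly from the .etlt. A sketch of the relevant PGIE properties (property names are from the DeepStream 5.x nvinfer documentation; the paths and key below are placeholders, not values from this thread):

```
[property]
# Let DeepStream generate the engine on the Jetson itself on first run
# (paths and key are placeholders)
tlt-encoded-model=models/primary/resnet18_detector.etlt
tlt-model-key=MyAPIkey
int8-calib-file=models/primary/calibration.bin
```

On first start DeepStream builds and caches a matching engine for that device, which avoids any engine/device mismatch.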

Last, DetectNet_v2 does not need a custom nvinfer parser; see DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation
and YOLOv3 — Transfer Learning Toolkit 3.0 documentation for comparison.