Why does a smaller backbone lead to DeepStream building a larger engine and a slower model?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson AGX Xavier
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: 10.2
• Issue Type (questions, new requirements, bugs): question

I trained two YOLOv3 models using NVIDIA's TLT; both are pruned and retrained. The main difference between them is the backbone network used. I then checked the performance of the engines created by DeepStream using trtexec; these are the results:

model: yolo_mobilenet_v2_epoch_052.etlt (3.9 MB)
engine: yolo_mobilenet_v2_epoch_052.etlt_b2_gpu0_fp16.engine (14.8 MB)
batch size: 2
input size: 1152x1440
Jetson AGX (boosted clock): mean: 23.2028 ms (~86.1388 FPS) | F16

model: yolo_resnet10_epoch_033.etlt (4.5 MB)
engine: yolo_resnet10_epoch_033.etlt_b2_gpu0_fp16.engine (8.0 MB)
batch size: 2
input size: 1152x1440
Jetson AGX (boosted clock): mean: 17.6101 ms (~113.512 FPS) | F16
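The throughput figures above are essentially batch size divided by mean batch latency. As a sanity check, a minimal sketch (nothing DeepStream-specific, just the arithmetic; the small gap to trtexec's reported FPS comes from how trtexec aggregates its timing statistics):

```python
def fps(mean_latency_ms: float, batch_size: int) -> float:
    """Images per second given the mean per-batch latency in milliseconds."""
    return batch_size / (mean_latency_ms / 1000.0)

print(round(fps(23.2028, 2), 1))  # mobilenet_v2 engine, ≈ 86 img/s
print(round(fps(17.6101, 2), 1))  # resnet10 engine, ≈ 114 img/s
```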

As expected, the model with the mobilenet_v2 backbone produces a smaller .etlt file. But why is the engine generated by DeepStream smaller for the model with the resnet10 backbone, and why does that engine also run faster?

Hi @hyperlight,
Model file size is not directly related to inference performance, and the size of the serialized engine (which also reflects TensorRT's per-layer tactic and format choices for this GPU) does not have to track the size of the .etlt file.

You can use the trtexec commands below to profile the two engines and investigate why their performance differs.

$ /usr/src/tensorrt/bin/trtexec --batch=2 --dumpProfile --useSpinWait --loadEngine=yolo_mobilenet_v2_epoch_052.etlt_b2_gpu0_fp16.engine
$ /usr/src/tensorrt/bin/trtexec --batch=2 --dumpProfile --useSpinWait --loadEngine=yolo_resnet10_epoch_033.etlt_b2_gpu0_fp16.engine
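trtexec can also write the per-layer profile to JSON via --exportProfile=profile.json, which is easier to compare across the two engines than the console dump. A small sketch that ranks layers by time (the timeMs/name field names match the trtexec JSON format I have seen, but verify against the output of your TensorRT version):

```python
import json

def top_layers(profile, n=10):
    """Return the n most time-consuming layers from a trtexec
    --exportProfile JSON dump (a list of per-layer dicts)."""
    # Skip any leading summary entry (e.g. {"count": ...}) that has no timing.
    layers = [e for e in profile if "timeMs" in e]
    return sorted(layers, key=lambda e: e["timeMs"], reverse=True)[:n]

if __name__ == "__main__":
    # profile.json produced by: trtexec ... --exportProfile=profile.json
    with open("profile.json") as f:
        for layer in top_layers(json.load(f)):
            print(f'{layer["timeMs"]:10.3f} ms  {layer["name"]}')
```

Running this against both engines side by side should show which layers account for the mobilenet_v2 engine's extra latency.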