YoloV3 + deepstream-app initialization time

itaowazard · November 11, 2019, 12:03pm

Hi,
I recently got a brand-new NVIDIA Xavier 16GB (thanks to NVIDIA's pricing update ^_^).
Now, straight to the point.
INPUT:
NVIDIA Xavier 16GB, CUDA 10.0, JetPack 4.2.2, DeepStream 4.0.1
YoloV3 weights & config file downloaded with the prebuild.sh shell script

GOAL
Check YoloV3 performance (compared to the Jetson TX2) using the default deepstream-app utility from the DeepStream SDK.

RESULT
It takes almost 5-6 MINUTES to build the TensorRT engine (at least, the log hangs at the engine-building message during loading). After that, the MPEG stream set in the cfg file plays at the correct speed (about 18 FPS) with correct detections. Please note that all performance settings and utilities are set to MAX performance.
So the question is: why does building the YoloV3 network engine take that much time? (Is this expected?)
YoloV2 builds much faster (about 2 minutes, but it also has fewer layers).

Although the Jetson TX2 is roughly 4-6 times slower at YoloV3 video inference (according to my tests), it took almost the same time to build the network (6-8 minutes).

Any suggestions?

AastaLLL · November 12, 2019, 6:11am

Hi,

The problem here is the long building time of TensorRT engine.

In the building stage, TensorRT evaluates each candidate kernel's runtime and picks a fast one.
This evaluation goes through every layer, so the building time is expected to be longer for a model with more layers.
Also, both Xavier and TX2 need to evaluate all the possible kernels, so the building time will be similar.
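
To illustrate why the build time grows with the layer count, here is a toy Python sketch of "time every candidate, keep the fastest". It is not TensorRT's actual implementation; `pick_fastest`, `build_plan`, and the two candidates are hypothetical names made up for this example.

```python
import time


def pick_fastest(candidates, x, repeats=5):
    """Time every candidate on input x and return (name, avg_seconds) of the fastest."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(x)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name, best_time


def build_plan(layers, x):
    """Choose one implementation per layer and count how many candidates were timed."""
    plan, timings_run = [], 0
    for candidates in layers:
        name, _ = pick_fastest(candidates, x)
        plan.append(name)
        timings_run += len(candidates)
    return plan, timings_run


# Hypothetical candidates: two ways to compute the same sum of squares.
CANDIDATES = {
    "generator": lambda xs: sum(v * v for v in xs),
    "map": lambda xs: sum(map(lambda v: v * v, xs)),
}

if __name__ == "__main__":
    data = list(range(1000))
    for n_layers in (10, 100):
        plan, timings_run = build_plan([CANDIDATES] * n_layers, data)
        print(n_layers, "layers ->", timings_run, "candidates timed")
```

The number of timing runs is the layer count times the candidates per layer, and every candidate has to be run at least once whichever device it runs on. That matches the point above that a deeper network builds more slowly.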

Please note that engine building is a one-time job.
From the second run onward, you can launch TensorRT directly with the serialized engine.
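
The build-once, reuse-later pattern can be sketched in plain Python as below. This is only an illustration of the caching idea: `load_or_build` and `fake_slow_build` are hypothetical names, and the placeholder bytes stand in for a real serialized engine.

```python
import os
import tempfile


def load_or_build(engine_path, build_fn):
    """Return (engine_bytes, was_built); build only when no cached file exists."""
    if os.path.exists(engine_path):
        with open(engine_path, "rb") as f:
            return f.read(), False

    engine_bytes = build_fn()  # the slow step, paid only on the first run

    # Write to a temporary file first, then rename, so an interrupted
    # build does not leave a truncated engine file behind.
    tmp_path = engine_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(engine_bytes)
    os.replace(tmp_path, engine_path)
    return engine_bytes, True


def fake_slow_build():
    """Hypothetical stand-in for the minutes-long engine build."""
    return b"serialized-engine-placeholder"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.engine")
    _, built = load_or_build(path, fake_slow_build)
    print("first run built engine:", built)
    _, built = load_or_build(path, fake_slow_build)
    print("second run built engine:", built)
```

With deepstream-app, the equivalent should be pointing the `model-engine-file` property in the nvinfer config at the engine file generated on the first run (please check the exact file name in your own setup). Note that a serialized engine is tied to the GPU and TensorRT version it was built with, so an engine built on TX2 should not be copied to Xavier.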

Thanks.

itaowazard · November 12, 2019, 7:47am

Hi AastaLLL,
Thanks for the fast answer.

I see. It is a pity that the performance boost available on Xavier cannot shorten the initialization time.