buildCudaEngine is very slow

On my TX2 board, I use TensorRT to accelerate my SSD detection model. I have converted my TensorFlow model to UFF and run inference with TensorRT. Everything works well except that loading the model takes about 3 to 5 minutes. I have checked and found that the function buildCudaEngine(*network) takes those 3 to 5 minutes. How can I solve this problem?

OS: JetPack 3.2 or JetPack 3.3
TensorRT version: 3.0 or 4.0



buildCudaEngine may take time since TensorRT is optimizing the kernels, taking the model and GPU architecture into account.

To save building time, you can serialize the compiled engine (PLAN) and relaunch TensorRT from the PLAN.

Example of PLAN serialization and deserialization:
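A minimal sketch of the serialize/deserialize flow, assuming the TensorRT 3.x/4.x C++ API and that you already have an ILogger instance (the `gLogger` name, file path, and helper function names below are illustrative, not from the original post):

```cpp
#include <fstream>
#include <vector>
#include "NvInfer.h"

// Serialize: run once after the slow buildCudaEngine() call.
void savePlan(nvinfer1::ICudaEngine* engine, const char* path)
{
    nvinfer1::IHostMemory* plan = engine->serialize();
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
    plan->destroy();
}

// Deserialize: on later launches, skip buildCudaEngine() entirely.
nvinfer1::ICudaEngine* loadPlan(nvinfer1::ILogger& logger, const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    const size_t size = in.tellg();
    in.seekg(0);
    std::vector<char> blob(size);
    in.read(blob.data(), size);

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    // Third argument is the plugin factory; pass nullptr if there are
    // no custom layers.
    return runtime->deserializeCudaEngine(blob.data(), size, nullptr);
}
```

Note that a PLAN is specific to the GPU and TensorRT version it was built with, so you need to regenerate it after upgrading JetPack or moving to a different device.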

Moving to PX2 for better support coverage