buildCudaEngine is very slow

On my TX2 board, I use TensorRT to accelerate my SSD detection model. I have converted my TensorFlow model to UFF and run inference with TensorRT. Everything works well except that loading the model takes about 3 to 5 minutes. I have checked and found that the function buildCudaEngine(*network) costs 3 to 5 minutes. How can I solve this problem?

board: TX2
OS: JetPack 3.2 or JetPack 3.3
TensorRT version: 3.0 or 4.0

Thanks

Hello,

buildCudaEngine may take time since TensorRT is optimizing kernels for your specific model and GPU architecture.

To save build time, you can serialize the compiled engine (PLAN) and relaunch TensorRT from the PLAN:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#wkflowdiagram

Example of PLAN serialization and deserialization:
https://github.com/dusty-nv/jetson-inference/blob/master/tensorNet.cpp#L213
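Below is a minimal sketch of that serialize/deserialize flow using the TensorRT C++ API (as in 3.x/4.x). The helper names (saveEngine, loadEngine) and the file path are placeholders for illustration, not part of the jetson-inference code linked above.

```cpp
#include <fstream>
#include <iostream>
#include <vector>
#include "NvInfer.h"

using namespace nvinfer1;

// Minimal logger required by the TensorRT builder and runtime.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

// After the one-time (slow) buildCudaEngine() call, write the PLAN to disk.
void saveEngine(ICudaEngine* engine, const char* path)
{
    IHostMemory* plan = engine->serialize();
    std::ofstream file(path, std::ios::binary);
    file.write(reinterpret_cast<const char*>(plan->data()), plan->size());
    plan->destroy();
}

// On later launches, skip the builder entirely and deserialize the PLAN.
ICudaEngine* loadEngine(const char* path)
{
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    size_t size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> blob(size);
    file.read(blob.data(), size);

    IRuntime* runtime = createInferRuntime(gLogger);
    return runtime->deserializeCudaEngine(blob.data(), size, nullptr);
}
```

On the first run you build normally and call saveEngine(); on every later run you call loadEngine() instead of buildCudaEngine(), which avoids the 3 to 5 minute optimization step. Note that a PLAN is specific to the TensorRT version and GPU it was built on, so it is not portable between the TX2 and other devices.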

Moving to PX2 for better support coverage