I work on an NVIDIA NX with C++.
I found that it takes too long to run the functions (createInferRuntime, deserializeCudaEngine) the first time.
Is there any way to make it faster?
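For reference, here is roughly the pattern I am timing (a minimal sketch; the engine file path is a placeholder, and on TensorRT 8+ the destroy() calls can be replaced with delete):

```cpp
#include <NvInfer.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <vector>

// Minimal logger required by the TensorRT API.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Read the serialized engine from disk (path is a placeholder).
    std::ifstream file("model.engine", std::ios::binary | std::ios::ate);
    const auto size = static_cast<size_t>(file.tellg());
    file.seekg(0);
    std::vector<char> blob(size);
    file.read(blob.data(), size);

    using Clock = std::chrono::steady_clock;

    auto t0 = Clock::now();
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    auto t1 = Clock::now();
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto t2 = Clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::cout << "createInferRuntime:    " << ms(t0, t1) << " ms\n";
    std::cout << "deserializeCudaEngine: " << ms(t1, t2) << " ms\n";

    // ... create an execution context and run inference ...

    engine->destroy();   // use `delete engine;` on TensorRT 8+
    runtime->destroy();  // use `delete runtime;` on TensorRT 8+
    return 0;
}
```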
Hi @fanyj233,
Please refer to the following doc: Best Practices For TensorRT Performance.
We recommend you provide more details of the issue and a reproducible model/script.
Thank you.
Hi @spolisetty @fanyj233
I've been struggling with this problem for a long time. TensorRT still takes 8-10 s to deserialize the engine and create the context. I found some previous discussions, but the replies there don't really solve my problem. I think this is clearly a performance issue for TensorRT in real-time use cases.
ref: nvinfer1::ICudaEngine deserializeCudaEngine takes 40-60 sec
ref: TensorRT Caching mechanism not very fast. deserializeCudaEngine takes some time
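For what it's worth, part of the first-run cost seems to come from CUDA context initialization rather than from deserialization itself. One mitigation I have seen suggested is to trigger CUDA initialization on a background thread at startup so it overlaps other work. A sketch, not a full fix (the deserialization time itself remains, and the device index is an assumption):

```cpp
#include <cuda_runtime_api.h>
#include <thread>

int main() {
    // Start CUDA context creation immediately; on Jetson-class devices
    // this is often a large share of the first-call latency.
    std::thread cudaWarmup([] {
        cudaSetDevice(0);   // the integrated GPU on an NX
        cudaFree(nullptr);  // harmless call that forces context init
    });

    // ... unrelated application startup (config, file I/O, etc.) ...

    cudaWarmup.join();  // CUDA context is ready before TensorRT runs
    // Now call createInferRuntime / deserializeCudaEngine as usual.
    return 0;
}
```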