I’m using TensorRT to speed up inference of a deep learning model, and we really care about the delay between starting the program and getting the first inference result.
In our experiments, `nvinfer1::createInferRuntime` and `deserializeCudaEngine` take over 3 seconds, which is extremely long compared to the inference time of only about 20 ms.
This is the profiling for the first model:
This is the profiling for the second model (same model, but a different TensorRT instance):
So I suspect much of the time in `createInferRuntime` and `deserializeCudaEngine` is spent preparing the TensorRT runtime context. Is there any way to speed up initialization? We need the first result sooner, for example within 2 seconds.
Thanks for your response.
I was already getting correct results from TensorRT before this; now I’m looking for a faster way to initialize our program, but `createInferRuntime` and `deserializeCudaEngine` still take too much time.
Is there any way to speed up the initialization?