Hello,
I’m using TensorRT to speed up a deep learning model during inference, and we care a great deal about the delay between starting the program and getting the first inference result.
In our experiments, nvinfer1::createInferRuntime and deserializeCudaEngine take over 3 seconds combined, which is extremely long compared to the inference time of only about 20 ms.
This is the profiling for the first model:
createInferRuntime: 1300ms
deserializeCudaEngine: 2300ms
This is the profiling for the second model (same model, but a different TensorRT instance):
createInferRuntime: 0.5ms
deserializeCudaEngine: 140ms
So I suspect that much of the time in createInferRuntime and deserializeCudaEngine is spent preparing the TensorRT runtime context. Is there any way to speed up the initialization? We need the first result within a shorter time, for example under 2 seconds.
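For reference, this is roughly how I measure the two calls; a minimal sketch, assuming the serialized engine sits in a file such as model.engine (a placeholder path):

```cpp
#include <NvInferRuntime.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <vector>

// Minimal logger required by the TensorRT runtime API.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    using Clock = std::chrono::steady_clock;

    // Load the serialized engine from disk ("model.engine" is a placeholder name).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    // Time createInferRuntime (~1300 ms observed for the first instance).
    auto t0 = Clock::now();
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    auto t1 = Clock::now();

    // Time deserializeCudaEngine (~2300 ms observed for the first instance).
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto t2 = Clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::cout << "createInferRuntime:    " << ms(t0, t1) << " ms\n";
    std::cout << "deserializeCudaEngine: " << ms(t1, t2) << " ms\n";

    // destroy() is deprecated in TensorRT 8; delete is the documented cleanup.
    delete engine;
    delete runtime;
    return 0;
}
```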
Hi,
Thanks for your response.
I was already getting correct results from TensorRT before this; now I’m looking for a faster way to initialize our program, but createInferRuntime and deserializeCudaEngine take a lot of time.
Is there any way to speed up the initialization?
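One direction I’m considering (not yet verified to help) is to start the runtime creation on a background thread as soon as the process launches, so the one-time CUDA/runtime setup overlaps with the rest of our startup work instead of sitting on the critical path to the first inference. A rough sketch of that idea (startRuntimeAsync is just an illustrative helper name):

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>
#include <future>

// Kick off CUDA context creation and TensorRT runtime construction early,
// on a background thread, so the heavy initialization overlaps with other
// startup work.
std::future<nvinfer1::IRuntime*> startRuntimeAsync(nvinfer1::ILogger& logger) {
    return std::async(std::launch::async, [&logger] {
        cudaFree(nullptr);  // common warm-up idiom: forces CUDA context init now
        return nvinfer1::createInferRuntime(logger);
    });
}

// Later, once the engine blob is loaded:
//   nvinfer1::IRuntime* runtime = runtimeFuture.get();
//   auto* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
```

Would this kind of overlap be expected to help, or is there a more direct way to reduce the initialization time itself?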
Hi,
According to our experiments, the time spent in createInferRuntime (1300 ms) and deserializeCudaEngine (2300 ms) is independent of the TensorRT version (8.0.1.6 and 8.4.1.5).
Thanks!