TensorRT nvinfer1::ICudaEngine deserializeCudaEngine is slow

Description

Hello,
I’m using TensorRT to speed up a deep learning model during inference. We really care about the delay between “starting the program” and “getting the first inference result”.

According to our experiments, “nvinfer1::createInferRuntime” and “deserializeCudaEngine” take too much time, over 3 seconds in total, which is extremely long compared to the inference itself, which only takes about 20 ms.
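For reference, the timings are measured roughly like this (a minimal sketch, not our exact code; the Logger class and the "model.engine" path are placeholders):

```cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

#include "NvInfer.h"

// Minimal logger required by createInferRuntime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    using Clock = std::chrono::steady_clock;

    // Load the serialized engine ("model.engine" is a placeholder path).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    auto t0 = Clock::now();
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    auto t1 = Clock::now();
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto t2 = Clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::cout << "createInferRuntime:    " << ms(t0, t1) << " ms\n"
              << "deserializeCudaEngine: " << ms(t1, t2) << " ms\n";
    return 0;
}
```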

This is the profiling for the first model:

createInferRuntime: 1300ms

deserializeCudaEngine: 2300ms

This is the profiling for the second model (same model, but a different TensorRT instance):

createInferRuntime: 0.5ms

deserializeCudaEngine: 140ms

So I guess much of the time in createInferRuntime and deserializeCudaEngine is spent on TensorRT runtime context preparation. Is there any way to speed up the initialization? We need to get the first result in a shorter time, for example, within 2 seconds.
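To separate the CUDA context cost from the TensorRT part, the context can be forced to initialize first and timed on its own; a rough sketch (cudaFree(0) is just the usual trick to trigger context creation, not our production code):

```cpp
#include <chrono>
#include <iostream>

#include <cuda_runtime_api.h>

int main()
{
    using Clock = std::chrono::steady_clock;

    auto t0 = Clock::now();
    cudaSetDevice(0);
    cudaFree(0); // first CUDA runtime call creates the context
    auto t1 = Clock::now();

    std::cout << "CUDA context init: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n";

    // ...then time createInferRuntime / deserializeCudaEngine as above...
    return 0;
}
```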

Thanks!

Environment

TensorRT Version: 8.0.1.6
Type: Xavier NX
Jetpack Version: 4.6 [L4T 32.6.1]
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32

Relevant Files


Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson-related forum.

Thanks!

Hi,
Thanks for your response.
I was already getting correct results from TensorRT before this; now I’m looking for a faster way to initialize our program, but TensorRT spends a lot of time in createInferRuntime and deserializeCudaEngine.
Is there any way to speed up the initialization?

Thanks!

Hi,

Could you please let us know how big the model is?
Also, please try the latest TensorRT version and let us know if you still face this issue.

Thank you.

Hi,

According to our experiments, the time that deserializeCudaEngine takes on its first call has no relevance to the model size.

This is the profiling for two models (first TensorRT instance):
deserializeCudaEngine: 2300 ms (model size: 26 MB)
deserializeCudaEngine: 2300 ms (model size: 11 MB)

This is the profiling for two models (second TensorRT instance):
deserializeCudaEngine: 140 ms (model size: 26 MB)
deserializeCudaEngine: 68 ms (model size: 11 MB)

Do you have the same problem?

Thanks!

Thank you for sharing.
Also, please try the latest TensorRT version and let us know if you still face this issue, so we can provide better guidance.

Hi,
According to our experiments, the time spent in createInferRuntime (1300 ms) and deserializeCudaEngine (2300 ms) is independent of the TensorRT version (8.0.1.6 and 8.4.1.5).
Thanks!