nvinfer1::ICudaEngine deserializeCudaEngine takes 40-60 sec

Hi,

The first time after start-up of a specific program, loading the GIE model with deserializeCudaEngine takes almost a minute.

After that, if I restart the program (keeping the TX2 on), it takes less than 1 sec.

Also, on a different program with the same model and params it does not happen (even after start-up).

Any ideas what to do?

Thanks!

https://devtalk.nvidia.com/default/topic/1024533/gpu-accelerated-libraries/tensorrt-caching-mechanism-not-very-fast-deserializecudaengine-takes-some-time-/

Hi,

Please find here for information of TensorRT:
http://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#wkflowdiagram

A typical workflow is:
Neural Network -> Optimize by TensorRT -> PLAN -> Inference with TensorRT

If the PLAN is not available in your environment, TensorRT needs to compile the given model (Caffe or UFF) into a PLAN first.
This step can take minutes, since TensorRT optimizes the network for the specific GPU architecture.

Once the PLAN is created, you can reuse it for inference next time.
In this workflow, only deserialization time is needed to launch a TensorRT engine.

Please let us know if we have answered your question correctly.
Thanks.

Hi,

Thank you for responding, however, it does not answer my question.

Let me try to rephrase using better terms:

When I wrote “loading the GIE model” I meant loading the serialized PLAN.

I have optimized my model offline and saved the serialized PLAN to disk.
I then run an application that loads this PLAN and calls nvinfer1::IRuntime::deserializeCudaEngine; this call alone takes 40-60 sec.
If I close the application and run it again, it takes about 1 sec, meaning there is a difference between the first-ever call (after turning on the system) and all subsequent calls.
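To be concrete, here is a minimal sketch of the loading path I am describing, timing only the deserialization call. The file name "model.plan" and the Logger class are placeholders; the TensorRT calls are the standard IRuntime API of the version shipped with JetPack:

```cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <vector>
#include "NvInfer.h"

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
} gLogger;

int main() {
    // Read the serialized PLAN produced offline ("model.plan" is a placeholder).
    std::ifstream file("model.plan", std::ios::binary | std::ios::ate);
    if (!file) { std::cerr << "cannot open PLAN file\n"; return 1; }
    const size_t size = static_cast<size_t>(file.tellg());
    file.seekg(0, std::ios::beg);
    std::vector<char> plan(size);
    file.read(plan.data(), size);

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);

    // Time only the deserializeCudaEngine call itself.
    const auto t0 = std::chrono::steady_clock::now();
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr);
    const auto t1 = std::chrono::steady_clock::now();

    std::cout << "deserializeCudaEngine took "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";

    engine->destroy();
    runtime->destroy();
    return 0;
}
```

With this measurement, only the block between t0 and t1 shows the 40-60 sec on the first run after boot.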

Thanks!

Hi,

Could you do a simple experiment to help us know more about your issue?
1. Run application (1st)
2. Run application (2nd)
3. rm -rf ~/.nv and run application (3rd)
Please share the execution time for deserializing the engine with us.

Please remember that a TensorRT PLAN is not portable between different GPU devices.
TensorRT optimizes the PLAN for the specific GPU architecture, so it should not be used across platforms.

Thanks.

Hi,

I did the experiment; deleting ~/.nv had no influence on the execution time:

  1. 1st run: about 40 sec
  2. 2nd run: about 1 sec
  3. 3rd run: about 1 sec

Regarding your comment, the PLAN is generated on the same device it is loaded on.

Thanks

Hi,

We want to reproduce this issue internally.

Could you check whether this issue can be reproduced with our native models?
/usr/src/tensorrt/data/googlenet/googlenet.prototxt
/usr/src/tensorrt/data/mnist/mnist.prototxt
And with the official samples located at ‘/usr/src/tensorrt/samples/’?

Thanks.

Hi,

I ran a few samples.
The processing time of the 1st run and all other runs is about the same.

Thanks

Hi,

The reason is JIT compiling.

On the first launch, the application needs to JIT-compile the PTX code for the device.
This takes some time and causes a delay before execution.

Thanks.
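For reference, the CUDA JIT cache that stores the compiled kernels can be tuned via documented CUDA environment variables. This is a hedged sketch; whether it helps on a particular Jetson/driver combination depends on what is actually being JIT-compiled:

```shell
# Keep the JIT cache enabled (0 is the default).
export CUDA_CACHE_DISABLE=0
# Enlarge the cache so compiled kernels are not evicted between runs
# (the default size is small on older drivers); value is in bytes.
export CUDA_CACHE_MAXSIZE=2147483648
# Optionally relocate the cache (the default is ~/.nv/ComputeCache).
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache"
```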

Hi,

Is there anything I can do to resolve this delay?

Thanks

Hi,

Apart from the first launch, you should be fine; the JIT compilation cost is paid only once per boot.
Thanks.
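One common way to apply this advice is to pay the one-time cost at boot with a dummy warm-up run, so the real workload never sees the delay. The application path and its command-line flags below are placeholders, not real options:

```shell
# Hypothetical crontab entry: run the inference app once at boot so the
# one-time JIT cost is paid before the real workload starts.
@reboot /path/to/app --engine /path/to/model.plan --warmup-only
```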