Slow first inference and very slow inference when running two models

Description

I converted two models with TF-TRT to TRT FP32 and TRT FP16, and I see a good speedup in inference time.
Having said that, I have two problems:

  1. The first inference takes a long time (about 30 s for one model and 90 s for the other), which is too long for my application. Is this a known TensorRT behavior?
    The time is spent specifically in this line:
    pred = infer(batch)['tf.math.sigmoid']

    Is it possible to serialize a model in such a way that this cost is avoided, given that after

    model = tf.saved_model.load(model_path, tags=[tag_constants.SERVING])
    infer = model.signatures['serving_default']

    TRT apparently still has to perform some optimizations before the first inference? (See the first sketch after this list.)

  2. When I run the two models together in the same loop (predict with one, then predict with the other), just to check whether using two models together runs slowly, I see very slow inference times for both models.
    Some background: my application predicts on an image with the first model and then runs a few predictions on the first model’s outputs with the second model (see the second sketch after this list for the simplified loop).
    Doing that with the two TF-TRT models resulted in a dramatic increase in inference time.
    Any ideas on why this happens and how I should approach it (other than creating a new architecture that performs both stages in a single model)?
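
For reference, this is roughly how I convert and save the models (the paths, input shape, and input function below are placeholders, not my exact code). Would calling converter.build() with a representative input before converter.save() move the engine building to conversion time and cut the first-inference delay?

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

SAVED_MODEL_DIR = "saved_model"        # placeholder path
OUTPUT_DIR = "saved_model_trt_fp16"    # placeholder path

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    precision_mode="FP16",
)
converter.convert()

# Feed one or more representative batches so the TRT engines are built
# here, at conversion time, instead of lazily on the first inference.
def representative_input_fn():
    yield (np.random.random((1, 512, 512, 3)).astype(np.float32),)

converter.build(input_fn=representative_input_fn)
converter.save(OUTPUT_DIR)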
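
And this is, simplified, the two-model loop I used for the check above (paths, shapes, and the cropping step are placeholders standing in for my real pipeline):

import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

model_a = tf.saved_model.load("model_a_trt_fp16", tags=[tag_constants.SERVING])  # placeholder path
model_b = tf.saved_model.load("model_b_trt_fp16", tags=[tag_constants.SERVING])  # placeholder path
infer_a = model_a.signatures['serving_default']
infer_b = model_b.signatures['serving_default']

image = tf.constant(np.random.random((1, 512, 512, 3)).astype(np.float32))  # placeholder input

for _ in range(100):
    # first stage: predict on the full image
    mask = infer_a(image)['tf.math.sigmoid']
    # second stage: a few predictions on regions taken from the first model's output
    crops = [mask[:, :256, :256, :], mask[:, 256:, 256:, :]]  # stand-in for the real cropping logic
    for crop in crops:
        _ = infer_b(crop)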

Environment

TensorRT Version: 8.2.5.1
GPU Type: RTX 3060 (Laptop)
Nvidia Driver Version: 515
CUDA Version: 11.7 (reported by nvcc --version)
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.9.1
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:22.06-tf2-py3

Hi,

Please share the model, script, profiler, and performance output (if not already shared) so that we can help you better.

Alternatively, you can try running your model with the trtexec command.

While measuring model performance, make sure you consider the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead (see the timing sketch after the links below).
Please refer to the following links for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy
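
For example, a simple timing loop along the following lines (the model path, input shape, and iteration counts are placeholders) skips a few warm-up iterations and then measures only the inference calls:

import time
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

model = tf.saved_model.load("saved_model_trt_fp16", tags=[tag_constants.SERVING])  # placeholder path
infer = model.signatures['serving_default']

batch = tf.constant(np.random.random((1, 512, 512, 3)).astype(np.float32))  # placeholder input

# Warm up so engine building / initialization is not counted.
for _ in range(10):
    infer(batch)

n = 100
start = time.perf_counter()
for _ in range(n):
    out = infer(batch)
# pull the outputs back to the host to make sure all pending work has finished
_ = [t.numpy() for t in out.values()]
elapsed = time.perf_counter() - start

print(f"mean latency: {elapsed / n * 1000:.2f} ms, throughput: {n / elapsed:.1f} inferences/s")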

Thanks!

My application has a visual display of the results, and using the two TRT models is visibly slower than using the original models, so this is not just a matter of measuring inference time correctly. Besides, when I run simple predictions with the two models together outside of my application, as mentioned above, only the prediction time is measured, with no pre- or post-processing included.

Hi,

We will get back to you on your queries.
If possible, please share a minimal repro script/model so we can try it on our end for better debugging.

Thank you.