Using TensorRT for model inference, does a stable QPS (Queries Per Second) have a significant impact on the prediction response time (RT)?

navyzhou189 · October 28, 2024, 2:14am

Description

Scenario 1:

QPS is consistently stable at 300.
99th percentile response time (99RT) is 3ms.
Batch size is 32.

Scenario 2:

QPS fluctuates between 100 and 300.
99th percentile response time (99RT) is 10ms.
Batch size is 32.

Why is the 99th percentile response time (99RT) higher when the QPS is below 300?
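For reference, 99RT is the latency that 99% of requests complete within. It is sensitive to a small number of slow requests. The post does not say how 99RT was computed, so the sketch below assumes the nearest-rank percentile over per-request latencies in milliseconds. The `percentile` helper and the sample values are hypothetical and for illustration only:

```python
def percentile(latencies_ms, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = -(-pct * len(ordered) // 100)
    return ordered[rank - 1]

# Hypothetical run: 98 requests at 3 ms and 2 slow requests at 10 ms.
samples = [3.0] * 98 + [10.0] * 2
print(percentile(samples, 99))  # prints 10.0
```

In this hypothetical run, 2 slow requests out of 100 are enough to raise the p99 from 3 ms to 10 ms, even though the typical request is unchanged.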

Environment

TensorRT Version: 8.6
GPU Type: NVIDIA L20
Nvidia Driver Version: 535.161.08
CUDA Version: 12.1
CUDNN Version: 8.9.7
Operating System + Version: Linux
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

AakankshaS · November 29, 2024, 7:16am

Hi @navyzhou189 ,
Can you please elaborate on your issue?

Thanks

