Description
Scenario 1:
- QPS is consistently stable at 300.
- 99th percentile response time (99RT) is 3ms.
- Batch size is 32.
Scenario 2:
- QPS fluctuates between 100 and 300.
- 99th percentile response time (99RT) is 10ms.
- Batch size is 32.
Why is the 99th percentile response time (99RT) higher when the QPS is below 300?
Environment
TensorRT Version: 8.6
GPU Type: NVIDIA L20
Nvidia Driver Version: 535.161.08
CUDA Version: 12.1
CUDNN Version: 8.9.7
Operating System + Version: Linux
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):