Description
A clear and concise description of the bug or issue.
Environment
TensorRT Version: 8.4.1
GPU Type: RTX 3080
Nvidia Driver Version: 535.129.03
CUDA Version: 12.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.1
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
We served a model built with dynamic batching through Triton, and compared inference throughput using 1 GPU versus 3 GPUs.
There is a small difference between the two, but it is nowhere near the roughly 3x speedup we expected. What could be the reason?
Adjusting the instance count does not change anything either.
Below is the config.pbtxt used for the 3-GPU case.
name: "yolov8_16batch_dynamic"
platform: "tensorrt_plan"
max_batch_size: 0
input: [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ -1, 3, 640, 640 ]
  }
]
output: [
  {
    name: "boxes"
    data_type: TYPE_FP32
    dims: [ -1, 8400, 4 ]
  },
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ -1, 8400, 1 ]
  },
  {
    name: "classes"
    data_type: TYPE_FP32
    dims: [ -1, 8400, 1 ]
  }
]
instance_group: [
  {
    kind: KIND_GPU
    count: 8
    gpus: [ 1 ]
  },
  {
    kind: KIND_GPU
    count: 8
    gpus: [ 2 ]
  },
  {
    kind: KIND_GPU
    count: 8
    gpus: [ 3 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100000
}
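One thing worth checking, assuming the variable first dimension of "input" is meant to be the batch dimension: Triton's dynamic batcher only takes effect when max_batch_size is greater than 0. With max_batch_size: 0, incoming requests are never merged into server-side batches, so the dynamic_batching block has no effect (newer Triton versions reject such a config at load time). Also, a max_queue_delay_microseconds of 100000 allows up to 100 ms of queueing latency per request. A sketch of the batching-related fields with an explicit batch limit (the preferred_batch_size and delay values here are illustrative, not taken from the original issue):

```
max_batch_size: 16
input: [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ 3, 640, 640 ]  # batch dimension is implicit when max_batch_size > 0
  }
]
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 1000  # keep queueing delay small relative to inference time
}
```

Note that when max_batch_size > 0, the leading -1 should also be dropped from each output's dims, and the TensorRT plan itself must have been built with a dynamic batch dimension covering the configured range.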