Description
A clear and concise description of the bug or issue.
Environment
TensorRT Version: 8.4.1
GPU Type: RTX 3080
Nvidia Driver Version: 535.129.03
CUDA Version: 12.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.1
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
We served a model built with dynamic batching through Triton, and compared inference throughput using 1 GPU versus 3 GPUs.
There is a small difference between the two, but it is nowhere near the roughly 3x speedup we expected. What could be the reason?
Adjusting the instance count does not change anything either.
Below is the config.pbtxt used for the 3-GPU case.
name: "yolov8_16batch_dynamic"
platform: "tensorrt_plan"
max_batch_size: 0
input: [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ -1, 3, 640, 640 ]
  }
]
output: [
  {
    name: "boxes"
    data_type: TYPE_FP32
    dims: [ -1, 8400, 4 ]
  },
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ -1, 8400, 1 ]
  },
  {
    name: "classes"
    data_type: TYPE_FP32
    dims: [ -1, 8400, 1 ]
  }
]
instance_group: [
  {
    kind: KIND_GPU
    count: 8
    gpus: [ 1 ]
  },
  {
    kind: KIND_GPU
    count: 8
    gpus: [ 2 ]
  },
  {
    kind: KIND_GPU
    count: 8
    gpus: [ 3 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100000
}
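One thing worth checking, assuming the variable first dimension of "input" is meant to be the batch dimension: Triton's dynamic batcher only takes effect when max_batch_size is greater than 0. With max_batch_size: 0, incoming requests are never merged into server-side batches, so the dynamic_batching block has no effect (newer Triton versions reject such a config at load time). Also, a max_queue_delay_microseconds of 100000 allows up to 100 ms of queueing latency per request. A sketch of the batching-related fields with an explicit batch limit (the preferred_batch_size and delay values here are illustrative, not taken from the original issue):

```
max_batch_size: 16
input: [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ 3, 640, 640 ]  # batch dimension is implicit when max_batch_size > 0
  }
]
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 1000  # keep queueing delay small relative to inference time
}
```

Note that when max_batch_size > 0, the leading -1 should also be dropped from each output's dims, and the TensorRT plan itself must have been built with a dynamic batch dimension covering the configured range.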