Hello, I am running TensorRT Inference Server with the following config.pbtxt:
name: "my_model"
platform: "tensorrt_plan"
max_batch_size: 10
input [
{
name: "input_images"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 1376, 800 ]
}
]
output [
{
name: "feature_fusion/Conv_7/Sigmoid"
data_type: TYPE_FP32
dims: [ 344, 200, 1]
}
]
instance_group [
{
kind: KIND_GPU,
count: 1
}
]
When I run
build/perf_client -m my_model -d -c10 -l2000 -p1000 -b1 -v
to test performance under concurrent requests, I get the following result:
Request concurrency: 10
Pass [1] throughput: 35 infer/sec. Avg latency: 278346 usec (std 37016 usec)
Pass [2] throughput: 34 infer/sec. Avg latency: 289869 usec (std 11219 usec)
Pass [3] throughput: 35 infer/sec. Avg latency: 282968 usec (std 8233 usec)
Client:
Request count: 35
Throughput: 35 infer/sec
Avg latency: 282968 usec (standard deviation 8233 usec)
Avg HTTP time: 281752 usec (send 8178 usec + response wait 272619 usec + receive 955 usec)
Server:
Request count: 46
Avg request latency: 196643 usec (overhead 973 usec + queue 167606 usec + compute 28064 usec)
Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, 21 infer/sec, latency 47625 usec
Concurrency: 2, 36 infer/sec, latency 55028 usec
Concurrency: 3, 37 infer/sec, latency 82313 usec
Concurrency: 4, 36 infer/sec, latency 110229 usec
Concurrency: 5, 38 infer/sec, latency 135002 usec
Concurrency: 6, 35 infer/sec, latency 170737 usec
Concurrency: 7, 36 infer/sec, latency 198551 usec
Concurrency: 8, 35 infer/sec, latency 230402 usec
Concurrency: 9, 36 infer/sec, latency 251738 usec
Concurrency: 10, 35 infer/sec, latency 282968 usec
Once the request concurrency goes above 2, throughput stays flat at about 35 infer/sec while the average latency grows roughly linearly with concurrency. Is this normal? How can I reduce the latency?
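As a sanity check on the numbers above, here is a small Python sketch (my own, not part of perf_client) that compares each measured latency with concurrency / throughput. That ratio is what Little's law would predict if the server were saturated and the extra requests were simply waiting in the queue. The function name `littles_law_latency_usec` is hypothetical; all figures are copied from the perf_client output above.

```python
# (concurrency, infer/sec, avg latency in usec), copied from the perf_client summary.
measurements = [
    (1, 21, 47625),
    (2, 36, 55028),
    (3, 37, 82313),
    (4, 36, 110229),
    (5, 38, 135002),
    (6, 35, 170737),
    (7, 36, 198551),
    (8, 35, 230402),
    (9, 36, 251738),
    (10, 35, 282968),
]


def littles_law_latency_usec(concurrency, throughput_per_sec):
    """Latency implied by Little's law: concurrency / throughput, in usec."""
    return concurrency / throughput_per_sec * 1_000_000


# Ratio of measured latency to the Little's law prediction for each row.
ratios = []
for concurrency, throughput, measured in measurements:
    predicted = littles_law_latency_usec(concurrency, throughput)
    ratios.append(measured / predicted)
    print(f"c={concurrency:2d}  measured={measured:7d} usec  "
          f"predicted={predicted:9.0f} usec  ratio={ratios[-1]:.2f}")

# Server-side breakdown at concurrency 10, copied from the "Server:" section.
overhead_usec, queue_usec, compute_usec = 973, 167606, 28064
total_usec = overhead_usec + queue_usec + compute_usec
queue_share = queue_usec / total_usec
print(f"server total={total_usec} usec, queue share={queue_share:.0%}")
```

If the ratios stay close to 1 and most of the server-side time is queue time rather than compute time, that would suggest the single model instance is already saturated at batch size 1, so each additional concurrent request mostly adds waiting time.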
Linux distro and version:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
Other environment details:
Container image: nvcr.io/nvidia/tensorrtserver 18.11-py3
GPU type: Tesla V100
NVIDIA driver version: 410.48 (as reported by nvidia-smi)
CUDA version: 9.0
CUDNN version: 7.3.0