A100 GPU inference performance is lower than expected


Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:58:10 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Version: 20.10.0
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: eeddea2
Built: Tue Dec 8 18:56:55 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.6
GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0

TensorRT Version:
GPU Type: A100 40g
Nvidia Driver Version: 465
CUDA Version: 11.0.194
CUDNN Version: 8.0.1
Operating System + Version: ubuntu18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.07-py3

Also, inside Docker containers, running the same TensorRT-accelerated code, my inference time on an RTX 2080 is lower than on the A100.
In MIG mode, the TensorRT programs running on separate instances still interfere seriously with each other's execution time.


We recommend using the latest TensorRT container.
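For reference, a newer container can be pulled from NGC like this (the `22.02-py3` tag is an example of a recent release at the time; check NGC for the current one):

```shell
# Pull a recent TensorRT container from the NGC registry
docker pull nvcr.io/nvidia/tensorrt:22.02-py3

# Run it with GPU access (requires the NVIDIA Container Toolkit)
docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:22.02-py3
```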

Thank you.

I used the latest TensorRT container (22.02); it doesn't improve much.

Could you please share more details, such as a model that reproduces the issue, scripts, and output logs, so we can debug this better?

Thank you.


Please share the model, script, profiler output, and performance numbers if you haven't already, so that we can help you better.

Alternatively, you can try running your model with trtexec command.
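For example, assuming the model has been exported to ONNX (the file name below is a placeholder), a quick benchmark sketch with trtexec might look like:

```shell
# Benchmark an ONNX model with trtexec; model.onnx is a placeholder path.
# --fp16 enables half-precision (where the A100's Tensor Cores help),
# and --iterations controls how many timed inference runs are averaged.
trtexec --onnx=model.onnx --fp16 --iterations=100
```

trtexec reports latency and throughput on its own, which makes it a useful baseline to compare against your application's timing.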

While measuring the model performance, make sure you consider the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
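A minimal sketch of that measurement pattern, with a stand-in `infer_fn` callable in place of the real TensorRT execution call (the function and names here are illustrative, not a TensorRT API):

```python
import time

def measure_inference(infer_fn, batch, warmup=10, iterations=100):
    """Time only the inference call, excluding pre/post-processing."""
    # Warm-up runs let caches, clocks, and allocators reach steady state
    # before any timing begins.
    for _ in range(warmup):
        infer_fn(batch)

    start = time.perf_counter()
    for _ in range(iterations):
        infer_fn(batch)
    elapsed = time.perf_counter() - start

    latency_ms = elapsed / iterations * 1e3          # mean latency per batch
    throughput = iterations * len(batch) / elapsed   # samples per second
    return latency_ms, throughput

# Usage with a dummy "model" that just sums the batch of 8 samples:
latency, qps = measure_inference(lambda b: sum(b), [1.0] * 8)
```

The key point is that any image decoding, resizing, or result parsing happens outside the timed region, so the numbers reflect only what the GPU (and TensorRT engine) actually does.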
Please refer to the below links for more details: