Description
There is a large gap between the inference time reported by trtexec and the time I measure in Python for the same INT8 engine.
Environment
TensorRT Version: 10
GPU Type: 3060
CUDA Version: 11.3
I performed INT8 quantization on ResNet-18, and I tested the inference time using two methods:
- The command:
trtexec --onnx=resnet.onnx --int8 --saveEngine=resnet.engine --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:1x3x224x224
I got:
[I] GPU Compute Time: min = 0.191528 ms, max = 0.626587 ms, mean = 0.230767 ms, median = 0.223206 ms, percentile(90%) = 0.258057 ms, percentile(95%) = 0.285706 ms, percentile(99%) = 0.350342 ms
- The python code:
inputs = list(binding_addrs.values())
start = time.perf_counter()
context.execute_v2(inputs)
curr_time = time.perf_counter() - start
The input here is the ImageNet validation set, and I got:
1.933888674217141 ms per image
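For comparison, this is a sketch of how the timing could be restricted to just the execute_v2 calls, with a warm-up and averaging over many iterations (the time_engine helper and the loop counts are illustrative, not part of my original script; binding setup is omitted):

```python
import time

def time_engine(context, bindings, warmup=50, iters=200):
    # Warm-up: the first calls can include lazy initialization,
    # so they are excluded from the measurement.
    for _ in range(warmup):
        context.execute_v2(bindings)
    # Time only the repeated inference calls, no host<->device copies.
    start = time.perf_counter()
    for _ in range(iters):
        context.execute_v2(bindings)
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000  # average ms per inference
```

Measured this way, the per-image time should be closer to what trtexec reports, since trtexec's "GPU Compute Time" also excludes data transfers.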
There is a large gap between the two measurements. Why is this happening?