Description
There is a large gap between the inference time reported by trtexec and the time I measure in Python for the same INT8 engine.
Environment
TensorRT Version: 10
GPU Type: 3060
CUDA Version: 11.3
I performed INT8 quantization on ResNet-18, and I tested the inference time using two methods:
- The command:
trtexec --onnx=resnet.onnx --int8 --saveEngine=resnet.engine --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:1x3x224x224
I got:
[I] GPU Compute Time: min = 0.191528 ms, max = 0.626587 ms, mean = 0.230767 ms, median = 0.223206 ms, percentile(90%) = 0.258057 ms, percentile(95%) = 0.285706 ms, percentile(99%) = 0.350342 ms
- The python code:
inputs = list(binding_addrs.values())
start = time.perf_counter()
context.execute_v2(inputs)
curr_time = time.perf_counter() - start
The input here is the ImageNet validation set, and I got:
1.933888674217141 ms per image
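For comparison, this is a sketch of how the timing could be restricted to just the execute_v2 calls, with a warm-up and averaging over many iterations (the time_engine helper and the loop counts are illustrative, not part of my original script; binding setup is omitted):

```python
import time

def time_engine(context, bindings, warmup=50, iters=200):
    # Warm-up: the first calls can include lazy initialization,
    # so they are excluded from the measurement.
    for _ in range(warmup):
        context.execute_v2(bindings)
    # Time only the repeated inference calls, no host<->device copies.
    start = time.perf_counter()
    for _ in range(iters):
        context.execute_v2(bindings)
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000  # average ms per inference
```

Measured this way, the per-image time should be closer to what trtexec reports, since trtexec's "GPU Compute Time" also excludes data transfers.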
There is a large gap between the two measurements. Why is this happening?