Inference time mismatch between the same configuration on Windows and Ubuntu

Description

Hi,

I’ve run some tests to compare performance between a Windows 10 environment and an Ubuntu 22.04 one.

Software specs:

              Windows    Ubuntu
    Drivers   535.98     535.86.05
    CUDA      11.8       11.8
    cuDNN     8.7.0      8.7.0
    TensorRT  8.5.1      8.5.1

Test setup:

  • Windows: install the drivers, CUDA, cuDNN and TensorRT locally;
  • Ubuntu: build the TensorRT container with the versions shown in the table above (an example command is sketched below).
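
For reference, the Ubuntu side can be reproduced with the NGC TensorRT container. The exact image used was not stated, so the tag below is an assumption (the 22.12 release ships TensorRT 8.5.1 with CUDA 11.8):

    # Pull and start the NGC TensorRT container (tag is an assumption)
    docker pull nvcr.io/nvidia/tensorrt:22.12-py3
    docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:22.12-py3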

Then:

  1. Export a YOLO .pt model to TensorRT using the export script provided by Ultralytics (on Windows and inside the container); a sketch is shown after this list;
  2. Run inference on the same batch of images, with the same script and the same pre- and post-processing operations;
  3. Record the inference times.
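
For context, a minimal sketch of the export step, assuming a YOLOv8 checkpoint and the standard Ultralytics Python API (the actual model file and export arguments were not shared):

    # Step 1 (sketch): export a YOLO .pt checkpoint to a TensorRT engine.
    # "yolov8n.pt" is a placeholder; the author's checkpoint was not specified.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    model.export(format="engine", device=0)  # writes yolov8n.engine via TensorRT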

What we observe is stable behavior on Ubuntu, with times in the range of 8-11 ms for a batch of three images, while on Windows the time remains stable for the first runs and then degrades to roughly 40-60 ms for the remaining ones.

Using older driver versions (< 470), while keeping CUDA, cuDNN and TensorRT unchanged, the performance of the two systems is aligned, but we lose support for newer NVIDIA features.

Are you aware of any driver issues on Windows?

Thanks

Environment

TensorRT Version: 8.5.1
GPU Type: Quadro T1000
Nvidia Driver Version: 535.98 / 535.86.05
CUDA Version: 11.8
CUDNN Version: 8.7.0
Operating System + Version: Windows 10 / Ubuntu 22.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal (Windows) / Container (Ubuntu; TensorRT container, image tag not specified)

Relevant Files

Times on Windows 10

First runs (stable):

Inference time : 0.010017156600952148
Inference time : 0.011110782623291016
Inference time : 0.007866144180297852
Inference time : 0.006834983825683594
Inference time : 0.009020566940307617
Inference time : 0.01261758804321289
Inference time : 0.008991241455078125
Inference time : 0.008015632629394531

Subsequent runs (degraded):

Inference time : 0.052073001861572266
Inference time : 0.048119306564331055
Inference time : 0.04847455024719238
Inference time : 0.05039215087890625
Inference time : 0.04845285415649414
Inference time : 0.05358147621154785
Inference time : 0.04916524887084961
Inference time : 0.05600476264953613
Inference time : 0.059531450271606445
Inference time : 0.055614471435546875
Inference time : 0.05081486701965332
Inference time : 0.05752444267272949
Inference time : 0.052495479583740234
Inference time : 0.049124956130981445

Steps To Reproduce

  1. Export a YOLO .pt model to TensorRT using the export script provided by Ultralytics (on Windows and inside the container);
  2. Run inference on the same batch of images, with the same script and the same pre- and post-processing operations;
  3. Record the inference times (a sketch of a comparable timing loop is shown after this list).
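
For illustration, a hedged sketch of a timing loop that would produce output like the logs above, assuming the exported engine is loaded back through Ultralytics (the author's actual script was not shared; paths are placeholders):

    # Steps 2-3 (sketch): run inference on a fixed batch and print wall-clock times.
    import time
    from ultralytics import YOLO

    model = YOLO("yolov8n.engine")                # engine exported in step 1
    batch = ["img0.jpg", "img1.jpg", "img2.jpg"]  # batch of three images

    for _ in range(20):
        start = time.time()
        model(batch, verbose=False)               # includes pre/post-processing
        print("Inference time :", time.time() - start)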

Hi,

Are you facing the same issue with the latest TensorRT version, 8.6.1?

Thank you.

Hi,

Could you please share the model, script, profiler, and performance output, if not shared already, so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
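
For example, something along these lines measures the raw engine latency independently of the Python script (the engine path and iteration counts are placeholders):

    trtexec --loadEngine=yolov8n.engine --warmUp=500 --iterations=100 --avgRuns=10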

When measuring model performance, make sure you measure the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
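
In the Ultralytics stack specifically, a per-stage breakdown is available on the results object, which makes it easy to separate network latency from pre/post-processing; a small sketch with placeholder paths:

    # Each Ultralytics Results object exposes a .speed dict with per-image
    # preprocess / inference / postprocess times in milliseconds.
    from ultralytics import YOLO

    model = YOLO("yolov8n.engine")
    results = model(["img0.jpg", "img1.jpg", "img2.jpg"], verbose=False)
    print(results[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}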

Thanks!