Inference time mismatch between the same configuration on Windows and Ubuntu

Description

Hi,

I’ve run some tests to compare performance between a Windows 10 environment and an Ubuntu 22.04 one.

Software specs:

              Windows    Ubuntu
    Drivers   535.98     535.86.05
    CUDA      11.8       11.8
    cuDNN     8.7.0      8.7.0
    TensorRT  8.5.1      8.5.1

Test setup:

  • Windows: install the drivers, CUDA, cuDNN and TensorRT locally;
  • Ubuntu: build the TensorRT container with the versions shown in the table above (an example command is sketched below).
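
For reference, the Ubuntu side can be reproduced with the NGC TensorRT container. The exact image used was not stated, so the tag below is an assumption (the 22.12 release ships TensorRT 8.5.1 with CUDA 11.8):

    # Pull and start the NGC TensorRT container (tag is an assumption)
    docker pull nvcr.io/nvidia/tensorrt:22.12-py3
    docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:22.12-py3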

Then:

  1. Export a YOLO .pt model to TensorRT using the export script provided by Ultralytics (on Windows and inside the container); a sketch is shown after this list;
  2. Run inference on the same batch of images, with the same script and the same pre- and post-processing operations;
  3. Record the inference times.
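
For context, a minimal sketch of the export step, assuming a YOLOv8 checkpoint and the standard Ultralytics Python API (the actual model file and export arguments were not shared):

    # Step 1 (sketch): export a YOLO .pt checkpoint to a TensorRT engine.
    # "yolov8n.pt" is a placeholder; the author's checkpoint was not specified.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    model.export(format="engine", device=0)  # writes yolov8n.engine via TensorRT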

What we observe is stable behavior on Ubuntu, with times in the range of 8-11 ms for a batch of three images, while on Windows the time remains stable for the first runs and then degrades to roughly 40-60 ms for the remaining ones.

Using older driver versions (< 470), while keeping CUDA, cuDNN and TensorRT unchanged, the performance of the two systems is aligned, but we lose support for newer NVIDIA features.

Are you aware of any driver issues on Windows?

Thanks

Environment

TensorRT Version: 8.5.1
GPU Type: Quadro T1000
Nvidia Driver Version: 535.98 / 535.86.05
CUDA Version: 11.8
CUDNN Version: 8.7.0
Operating System + Version: Windows 10 / Ubuntu 22.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal (Windows) / Container (Ubuntu; TensorRT container, image tag not specified)

Relevant Files

Times on Windows 10

First runs (stable):

Inference time : 0.010017156600952148
Inference time : 0.011110782623291016
Inference time : 0.007866144180297852
Inference time : 0.006834983825683594
Inference time : 0.009020566940307617
Inference time : 0.01261758804321289
Inference time : 0.008991241455078125
Inference time : 0.008015632629394531

Subsequent runs (degraded):

Inference time : 0.052073001861572266
Inference time : 0.048119306564331055
Inference time : 0.04847455024719238
Inference time : 0.05039215087890625
Inference time : 0.04845285415649414
Inference time : 0.05358147621154785
Inference time : 0.04916524887084961
Inference time : 0.05600476264953613
Inference time : 0.059531450271606445
Inference time : 0.055614471435546875
Inference time : 0.05081486701965332
Inference time : 0.05752444267272949
Inference time : 0.052495479583740234
Inference time : 0.049124956130981445

Steps To Reproduce

  1. Export a YOLO .pt model to TensorRT using the export script provided by Ultralytics (on Windows and inside the container);
  2. Run inference on the same batch of images, with the same script and the same pre- and post-processing operations;
  3. Record the inference times (a sketch of a comparable timing loop is shown after this list).
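
For illustration, a hedged sketch of a timing loop that would produce output like the logs above, assuming the exported engine is loaded back through Ultralytics (the author's actual script was not shared; paths are placeholders):

    # Steps 2-3 (sketch): run inference on a fixed batch and print wall-clock times.
    import time
    from ultralytics import YOLO

    model = YOLO("yolov8n.engine")                # engine exported in step 1
    batch = ["img0.jpg", "img1.jpg", "img2.jpg"]  # batch of three images

    for _ in range(20):
        start = time.time()
        model(batch, verbose=False)               # includes pre/post-processing
        print("Inference time :", time.time() - start)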

Hi,

Are you facing the same issue with the latest TensorRT version, 8.6.1?

Thank you.

Hi,

Could you please share the model, script, profiler, and performance output, if not shared already, so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
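
For example, something along these lines measures the raw engine latency independently of the Python script (the engine path and iteration counts are placeholders):

    trtexec --loadEngine=yolov8n.engine --warmUp=500 --iterations=100 --avgRuns=10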

When measuring model performance, make sure you measure the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
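
In the Ultralytics stack specifically, a per-stage breakdown is available on the results object, which makes it easy to separate network latency from pre/post-processing; a small sketch with placeholder paths:

    # Each Ultralytics Results object exposes a .speed dict with per-image
    # preprocess / inference / postprocess times in milliseconds.
    from ultralytics import YOLO

    model = YOLO("yolov8n.engine")
    results = model(["img0.jpg", "img1.jpg", "img2.jpg"], verbose=False)
    print(results[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}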

Thanks!