Yes, I already posted my problem there last week (High Latency Variance During Inference - deployment - PyTorch Forums). Since I found out that the problem also exists with ONNX Runtime, I figured it might not be related to PyTorch at all and decided to post here. I also stumbled across this post (Inconsistent kernel execution times, and affected by Nsight Systems), which sounds similar.
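For context, a minimal sketch of how per-call latency variance can be measured with ONNX Runtime; the model path, input shape, and iteration counts below are placeholders, not my actual benchmark:

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model file and input shape, only for illustration.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm-up runs so lazy initialization does not skew the measurements.
for _ in range(50):
    sess.run(None, {input_name: x})

# Time each individual run to expose variance, not just the mean.
latencies_ms = []
for _ in range(1000):
    t0 = time.perf_counter()
    sess.run(None, {input_name: x})
    latencies_ms.append((time.perf_counter() - t0) * 1e3)

print(f"mean={np.mean(latencies_ms):.2f} ms  "
      f"p50={np.percentile(latencies_ms, 50):.2f} ms  "
      f"p99={np.percentile(latencies_ms, 99):.2f} ms  "
      f"max={np.max(latencies_ms):.2f} ms")
```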