Yes, I already posted my problem there last week (High Latency Variance During Inference - deployment - PyTorch Forums). Since I found out that the problem also exists with ONNX Runtime, I figured it might not be related to PyTorch at all and decided to post here. I also stumbled across this post (Inconsistent kernel execution times, and affected by Nsight Systems), which sounds similar.
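For context, a minimal sketch of how per-call latency variance can be measured with ONNX Runtime; the model path, input shape, and iteration counts below are placeholders, not my actual benchmark:

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model file and input shape, only for illustration.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm-up runs so lazy initialization does not skew the measurements.
for _ in range(50):
    sess.run(None, {input_name: x})

# Time each individual run to expose variance, not just the mean.
latencies_ms = []
for _ in range(1000):
    t0 = time.perf_counter()
    sess.run(None, {input_name: x})
    latencies_ms.append((time.perf_counter() - t0) * 1e3)

print(f"mean={np.mean(latencies_ms):.2f} ms  "
      f"p50={np.percentile(latencies_ms, 50):.2f} ms  "
      f"p99={np.percentile(latencies_ms, 99):.2f} ms  "
      f"max={np.max(latencies_ms):.2f} ms")
```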