- OS : Windows 10
- CUDA version : 11.5
- CUDNN version : 8.4.5.1
- TensorRT version: 8.4.3.1
- NVIDIA GPU : GeForce RTX 2080 Ti
- NVIDIA Driver : 545.84
In my test application I run the inference process 500 times. I expect the inference time to stay stable at around 1 or 2 milliseconds, and that is indeed the case for the first ~50 iterations. Even though my test image and all other variables stay unchanged during the run, the execution time strangely keeps growing, up to 12 milliseconds, which is too much for me: once my other operations are added on top, the final end-to-end time is disappointing.
This is the code snippet I use for the inference part :
// Start the host-side timer for this iteration
auto now = std::chrono::high_resolution_clock::now();
// Enqueue the inference work on the dedicated CUDA stream (TensorRT 8.x API)
bool status = mContext->enqueueV2(&mBindingDataHolder[0], *inferenceCudaStream, nullptr);
// Block until all enqueued work has finished before stopping the timer
CUDA_CHECK(cudaStreamSynchronize(*inferenceCudaStream));
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - now);
spdlog::info("inference time {} ms", duration.count());
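In case the host-side clock is suspect, I could also time the same call with CUDA events on the GPU side. This is only a minimal, untested sketch, reusing the same member names as above:

// Untested sketch: GPU-event timing of the same enqueueV2 call, to rule out
// host-side timer/scheduling noise. Error handling trimmed for brevity.
cudaEvent_t start, stop;
CUDA_CHECK(cudaEventCreate(&start));
CUDA_CHECK(cudaEventCreate(&stop));

CUDA_CHECK(cudaEventRecord(start, *inferenceCudaStream));
bool status = mContext->enqueueV2(&mBindingDataHolder[0], *inferenceCudaStream, nullptr);
CUDA_CHECK(cudaEventRecord(stop, *inferenceCudaStream));
CUDA_CHECK(cudaEventSynchronize(stop));

float gpuMs = 0.0f; // elapsed GPU time in milliseconds
CUDA_CHECK(cudaEventElapsedTime(&gpuMs, start, stop));
spdlog::info("inference time (GPU events) {:.3f} ms", gpuMs);

CUDA_CHECK(cudaEventDestroy(start));
CUDA_CHECK(cudaEventDestroy(stop));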
And this is the log I get on my console :
[2023-12-04 13:56:07.101] [info] inference time 1 ms
[2023-12-04 13:56:07.134] [info] inference time 1 ms
[2023-12-04 13:56:07.166] [info] inference time 1 ms
[2023-12-04 13:56:07.198] [info] inference time 1 ms
[2023-12-04 13:56:07.231] [info] inference time 1 ms
[2023-12-04 13:56:07.264] [info] inference time 1 ms
[2023-12-04 13:56:07.296] [info] inference time 1 ms
[2023-12-04 13:56:07.327] [info] inference time 1 ms
[2023-12-04 13:56:07.358] [info] inference time 5 ms
[2023-12-04 13:56:07.517] [info] inference time 6 ms
[2023-12-04 13:56:07.570] [info] inference time 5 ms
[2023-12-04 13:56:07.607] [info] inference time 6 ms
[2023-12-04 13:56:07.645] [info] inference time 6 ms
[2023-12-04 13:56:07.764] [info] inference time 6 ms
[2023-12-04 13:56:07.875] [info] inference time 6 ms
[2023-12-04 13:56:07.911] [info] inference time 6 ms
[2023-12-04 13:56:07.956] [info] inference time 6 ms
[2023-12-04 13:56:08.020] [info] inference time 6 ms
[2023-12-04 13:56:08.053] [info] inference time 6 ms
[2023-12-04 13:56:08.108] [info] inference time 6 ms
[2023-12-04 13:56:08.147] [info] inference time 6 ms
[2023-12-04 13:56:08.185] [info] inference time 6 ms
[2023-12-04 13:56:08.218] [info] inference time 6 ms
[2023-12-04 13:56:08.259] [info] inference time 6 ms
[2023-12-04 13:56:08.299] [info] inference time 6 ms
[2023-12-04 13:56:08.339] [info] inference time 6 ms
[2023-12-04 13:56:08.380] [info] inference time 6 ms
[2023-12-04 13:56:08.413] [info] inference time 6 ms
[2023-12-04 13:56:08.451] [info] inference time 6 ms
[2023-12-04 13:56:08.491] [info] inference time 6 ms
[2023-12-04 13:56:08.531] [info] inference time 6 ms
[2023-12-04 13:56:08.565] [info] inference time 6 ms
[2023-12-04 13:56:08.645] [info] inference time 7 ms
[2023-12-04 13:56:08.789] [info] inference time 7 ms
[2023-12-04 13:56:08.853] [info] inference time 7 ms
[2023-12-04 13:56:08.917] [info] inference time 7 ms
[2023-12-04 13:56:08.958] [info] inference time 7 ms
[2023-12-04 13:56:09.080] [info] inference time 10 ms
[2023-12-04 13:56:09.170] [info] inference time 11 ms
[2023-12-04 13:56:09.298] [info] inference time 11 ms
[2023-12-04 13:56:09.441] [info] inference time 11 ms
I emphasize again that the test image and all other variables stay unchanged during the whole run, and that mBindingDataHolder and inferenceCudaStream are allocated only once.
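For context, the one-time setup looks roughly like this (simplified sketch; mEngine and bindingSizeInBytes stand in for my actual engine handle and per-binding size computation):

// One-time setup, executed once before the 500 iterations
inferenceCudaStream = std::make_unique<cudaStream_t>();
CUDA_CHECK(cudaStreamCreate(inferenceCudaStream.get()));

// One device buffer per engine binding, allocated once and reused every iteration
mBindingDataHolder.resize(mEngine->getNbBindings());
for (int i = 0; i < mEngine->getNbBindings(); ++i) {
    // bindingSizeInBytes(i) is a placeholder for the real per-binding size
    CUDA_CHECK(cudaMalloc(&mBindingDataHolder[i], bindingSizeInBytes(i)));
}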
I searched the Internet for similar issues. The only suggestion I came across was to call cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync); unfortunately, that did not help either.
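For completeness, this is roughly how I applied that suggestion (early at startup, before any other CUDA work touches the device):

// Applied once at startup, before any other CUDA call initializes the device
CUDA_CHECK(cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync));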