Inference time becomes longer when doing non-continuous fp16 or int8 inference

No.
trtexec does "continuous" inference and additionally skips several initial frames, so I haven't seen any delay effect.

Could you please share the minimal repro script you're using?

Thank you.

It was shared on Aug 3. Please check the reply above.

Sorry, we couldn't find the inference script you're using in the 7z file you shared. Please provide us the inference script with a sample to reproduce the issue.

Thank you.

profiler.7z.001 (10 MB)

profiler.7z.002 (10 MB)

profiler.7z.003 (10 MB)

profiler.7z.004 (4.7 MB)

Hi @spolisetty

Please download profiler.7z.001 - profiler.7z.004 and uncompress them, then:
cd profiler/build
./profiler

Hi,

Sorry for the delayed response. Are you still facing this issue?

Hi, I have the same problem

import time

for i in range(N):
    start_time = time.time()
    pred_onnx = sess.run(None, feed)  # one inference call
    time_diff = (time.time() - start_time) * 1000  # latency in ms
    print("execution time (ms): ", time_diff)
    time.sleep(1)
If I don't have the time.sleep(1), the inference is consistently 3-5 ms, but if I have the time.sleep(1) here, the inference time becomes 10-20 ms!
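For anyone trying to reproduce the comparison, the loop above can be wrapped into a small timing harness. The sketch below is a self-contained version that uses a dummy NumPy workload in place of `sess.run(None, feed)` (the name `dummy_infer` is a stand-in, not part of any API); substitute the real ONNX Runtime session call to measure the actual effect on GPU.

```python
import time
import numpy as np

def time_runs(infer, n_iters=10, sleep_s=0.0):
    """Time n_iters calls of infer(), optionally sleeping between calls.

    Returns a list of per-call latencies in milliseconds.
    """
    latencies = []
    for _ in range(n_iters):
        start = time.perf_counter()  # higher resolution than time.time()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
        if sleep_s:
            time.sleep(sleep_s)
    return latencies

# Dummy stand-in for sess.run(None, feed); replace with the real session call.
a = np.random.rand(256, 256).astype(np.float32)
def dummy_infer():
    return a @ a

continuous = time_runs(dummy_infer, n_iters=10, sleep_s=0.0)
gappy = time_runs(dummy_infer, n_iters=10, sleep_s=0.1)
print("continuous median: %.3f ms" % sorted(continuous)[len(continuous) // 2])
print("with 100 ms gaps:  %.3f ms" % sorted(gappy)[len(gappy) // 2])
```

Comparing medians rather than single iterations avoids being misled by the occasional outlier from OS scheduling.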

This is ONNX Runtime with the CUDA execution provider. I believe it's not model dependent; you can pick any model and run it repeatedly with a sleep in between to see the effect of "non-continuity".

@spolisetty
Yes.
It would be very helpful if you could share more about the "warm-up" mechanism, so that we can check whether we can avoid the "cool-down" and "warm-up again" effects.
I expect this would benefit many other people as well.
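While waiting for an official answer, one common workaround is to keep the device "warm" by firing a small dummy inference whenever the real pipeline has been idle for too long. A minimal sketch of that idea, assuming a hypothetical `dummy_infer` callable standing in for a cheap `sess.run` call (this assumes the slowdown is caused by the device dropping into a low-power state while idle; other causes would need different fixes):

```python
import threading
import time

class KeepWarm:
    """Fire a dummy inference whenever the real pipeline has been idle too long.

    The goal is to stop the device from entering a low-power state between
    requests, one common cause of the slow "first inference after a pause".
    """

    def __init__(self, dummy_infer, idle_threshold_s=0.05):
        self._dummy_infer = dummy_infer
        self._idle_threshold_s = idle_threshold_s
        self._last_used = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def mark_used(self):
        # Call this right after every real inference.
        self._last_used = time.monotonic()

    def _loop(self):
        while not self._stop.is_set():
            if time.monotonic() - self._last_used > self._idle_threshold_s:
                self._dummy_infer()  # keep the device busy while idle
                self._last_used = time.monotonic()
            time.sleep(self._idle_threshold_s / 2)

    def stop(self):
        self._stop.set()
        self._thread.join()

# Usage with a trivial stand-in; substitute a cheap sess.run(None, feed).
calls = []
keeper = KeepWarm(lambda: calls.append(1), idle_threshold_s=0.02)
time.sleep(0.1)  # simulate an idle pipeline
keeper.stop()
print("dummy inferences fired while idle:", len(calls))
```

This trades idle power for latency. On machines you control, enabling persistence mode (`nvidia-smi -pm 1`) and locking GPU clocks (`nvidia-smi -lgc`) addresses the same clock-ramp cause at the driver level without the busy-work thread.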

Hi, do you have any insight or a solution for this? The inference time increases drastically from 4 ms to 30 ms when there's a 100 ms sleep in between. I believe this is a very common problem with CUDA.

Hi, are you still around?

Hi, do you have any advice on this? Thanks

Hi @spolisetty we still cannot solve this problem, would you mind sharing your advice on this?
Thank you

Hi, it's me again. May I know if you are still looking into this?

Thanks

Hi, are you guys still around? Thanks

Hi @hnamletran,

Could you please create a new post with clear issue details and a repro script/ONNX model for better debugging?

Thank you.

Hi, I have posted a new topic here, thanks:
https://forums.developer.nvidia.com/t/first-inference-after-a-pause-is-always-long/200950