No.
trtexec runs “continuous” inference and additionally skips several initial (warm-up) frames, so I haven’t seen any delay effect.
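For reference, that measurement style looks roughly like the sketch below (illustrative only: run_inference and the run counts are placeholders, not trtexec internals; trtexec itself controls warm-up time in milliseconds via its --warmUp option):

import time

def run_inference():
    """Hypothetical placeholder for a single inference call."""
    ...

WARMUP_RUNS = 10   # placeholder count; trtexec's warm-up is time-based
TIMED_RUNS = 100   # placeholder count

for _ in range(WARMUP_RUNS):   # initial frames skipped, timings discarded
    run_inference()

latencies_ms = []
for _ in range(TIMED_RUNS):    # continuous back-to-back runs, no idle gaps
    t0 = time.time()
    run_inference()
    latencies_ms.append((time.time() - t0) * 1000)

print("average latency (ms):", sum(latencies_ms) / len(latencies_ms))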
Could you please share the minimal issue repro script you’re using?
Thank you.
It was shared on Aug 3. Please check the reply above.
Sorry, we couldn’t find the inference script you’re using in the 7z file you shared. Please provide us with the inference script and a sample to reproduce the issue.
Thank you.
Hi @spolisetty
Please download profiler.7z.001 - profiler.7z.004 and uncompress them.
cd profiler/build
./profiler
Hi,
Sorry for the delayed response. Are you still facing this issue?
Hi, I have the same problem:
import time

# sess: an onnxruntime InferenceSession created with the CUDA execution provider
# feed: the model's input dict; N: the number of iterations
for i in range(N):
    start_time = time.time()
    pred_onnx = sess.run(None, feed)
    time_diff = (time.time() - start_time) * 1000
    print("execution time: ", time_diff)
    time.sleep(1)
If I don’t have the time.sleep(1), the inference is consistently at 3-5 ms, but if I have the time.sleep(1) here, the inference time becomes 10-20 ms!
This is ONNX Runtime with the CUDA execution provider. I believe it’s not model-dependent; you can pick any model and repeatedly run it with some sleep in between to see the effect of ‘non-continuity’.
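For completeness, a minimal self-contained sketch of such a repro (the model path, the float32 input dtype, and the run counts are placeholders; any ONNX model should show the effect):

import time
import numpy as np
import onnxruntime as ort

# "model.onnx", the float32 dtype, and the run counts are placeholders; any model should do.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # substitute 1 for dynamic dims
feed = {inp.name: np.random.rand(*shape).astype(np.float32)}

for sleep_s in (0.0, 1.0):  # continuous runs vs. runs with a pause in between
    latencies = []
    for _ in range(20):
        t0 = time.time()
        sess.run(None, feed)
        latencies.append((time.time() - t0) * 1000)
        if sleep_s:
            time.sleep(sleep_s)
    print(f"sleep={sleep_s}s, per-run latency (ms):", [round(t, 1) for t in latencies])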
@spolisetty
Yes.
It would be very helpful if you could share more about the “warm-up” mechanism, so that we can check whether we can avoid the “cool-down” and “warm-up again” effects.
I guess it would also benefit a lot of other people.
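To make the question concrete, one naive form of warm-up would look roughly like the sketch below (illustrative only, not an official mechanism; sess, feed, and N are the session, input dict, and loop count from the script above): issue an untimed dummy run after each idle gap so the timed run doesn’t pay the warm-up-again cost.

import time

for i in range(N):
    sess.run(None, feed)                  # untimed re-warm run after the sleep
    t0 = time.time()
    pred_onnx = sess.run(None, feed)      # timed run
    print("execution time:", (time.time() - t0) * 1000)
    time.sleep(1)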
Hi, do you have any insight or a solution for this? The inference time increases drastically from 4 ms to 30 ms when there is a 100 ms sleep in between; I believe this is a very common problem with CUDA.
Hi, are you still around?
Hi, do you have any advice on this? Thanks
Hi @spolisetty, we still cannot solve this problem. Would you mind sharing your advice on this?
Thank you
Hi, it’s me again. May I know if you are still checking on this?
Thanks
Hi, are you guys still around? Thanks
Hi @hnamletran,
Could you please create a new post with clear issue details and an issue repro script/ONNX model for better debugging?
Thank you.
Hi, I have posted a new topic here, thanks:
https://forums.developer.nvidia.com/t/first-inference-after-a-pause-is-always-long/200950