Sorry, we couldn’t find the inference script you’re using in the 7z file you’ve shared. Please provide the inference script along with a sample to reproduce the issue.
Thank you.
Hi @spolisetty
Please download profiler.7z.001 - profiler.7z.004 and uncompress it.
cd profiler/build
./profiler
Hi,
Sorry for the delayed response. Are you still facing this issue?
Hi, I have the same problem
import time

# N timed inference runs with a 1 s pause between them
for i in range(N):
    start_time = time.time()
    pred_onnx = sess.run(None, feed)               # onnxruntime inference call
    time_diff = (time.time() - start_time) * 1000  # latency in ms
    print("execution time: ", time_diff)
    time.sleep(1)
If I don’t have the time.sleep(1), the inference is consistently at 3-5 ms, but if I have the time.sleep(1) here, the inference time becomes 10-20 ms!
This is ONNX Runtime with the CUDA execution provider. I believe it’s not model-dependent: you can select any model and repeatedly run it with some sleep in between to see the effect of ‘non-continuity’.
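The comparison described above can be sketched as a small timing harness. This is a minimal sketch, not the poster’s actual script: `infer()` below is a hypothetical stand-in for the real `sess.run(None, feed)` call, and `time.perf_counter` is used instead of `time.time` because it is monotonic and higher-resolution, which matters when measuring millisecond-scale latencies.

```python
import time

# Hypothetical stand-in for sess.run(None, feed); on a real setup this
# would be the onnxruntime CUDA inference call from the snippet above.
def infer():
    time.sleep(0.001)  # pretend the model takes ~1 ms

def time_iterations(n, pause=0.0):
    """Time n inference calls, optionally sleeping between them."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()  # monotonic, high-resolution clock
        infer()
        timings.append((time.perf_counter() - start) * 1000)  # ms
        if pause:
            time.sleep(pause)
    return timings

back_to_back = time_iterations(5)
with_pause = time_iterations(5, pause=0.1)  # 100 ms gap between runs
print("back-to-back (ms):", [f"{t:.2f}" for t in back_to_back])
print("with 100 ms pause (ms):", [f"{t:.2f}" for t in with_pause])
```

With the stand-in workload both runs report similar times; on a real CUDA session the `with_pause` timings are where the slowdown shows up.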
@spolisetty
Yes.
It would be very helpful if you could share more about the “warm-up” mechanism, so that we can check whether we can avoid the “cool-down” and “warm-up again” effects.
I guess it would also benefit a lot of other people.
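One workaround sometimes suggested for this kind of cool-down effect (an assumption on my part, not something confirmed in this thread) is a keep-alive thread that issues a cheap dummy inference at a fixed interval so the GPU never idles long enough to drop its clocks. Here `dummy_infer` is a hypothetical placeholder for a real one-sample `sess.run()` call:

```python
import threading
import time

def keep_warm(dummy_infer, stop_event, interval=0.05):
    """Call dummy_infer() every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        dummy_infer()            # cheap placeholder inference
        stop_event.wait(interval)  # sleep, but wake immediately on stop

# Demo with a no-op standing in for the dummy inference:
calls = []
stop = threading.Event()
t = threading.Thread(target=keep_warm, args=(lambda: calls.append(1), stop))
t.start()
time.sleep(0.2)   # main workload would run here
stop.set()
t.join()
print(f"dummy inferences issued: {len(calls)}")
```

Whether this actually prevents the slowdown depends on the root cause (GPU clock gating vs. something in the runtime), so it is only worth trying as an experiment.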
Hi, do you have any insight or solution for this? The inference time increases drastically, from 4 ms to 30 ms, when there’s a 100 ms sleep in between. I believe this is a very common problem with CUDA.
Hi, are you still around?
Hi, do you have any advice on this? Thanks
Hi @spolisetty we still cannot solve this problem, would you mind sharing your advice on this?
Thank you
Hi, it’s me again. May I know if you are still checking on this?
Thanks
Hi, are you guys still around? Thanks
Hi @hnamletran,
Could you please create a post with clear issue details and a repro script/ONNX model for better debugging?
Thank you.
Hi, I have posted a new topic here, thanks
https://forums.developer.nvidia.com/t/first-inference-after-a-pause-is-always-long/200950
Hi @hnamletran, were you able to find any solution for this one? I am facing the same issue with a HiFi-GAN FP16 ONNX model. Inference after a pause takes 10x longer than continuous inference.
any updates?
I have the same problem with YOLOv5 (PyTorch), CUDA 11.6, cuDNN 8.3.2 and a .pt model.
With my RTX A2000 I get 8 ms inference when I deliver images in a loop.
When I insert a pause of 1 s between the inferences, the time goes up to ~90 ms.
This is independent of whether FP16 or FP32 is used.
The problem does not occur with .onnx models.
Is there any solution for that?