I have a question about inference time measurement. I have used two methods:
- clock() from the <time.h> library (CPU time).
- elapsed (wall-clock) time from the <chrono> library.
I see a difference between these two measures. I am timing only the "context->execute" step, running inference on a single GPU (Titan Xp). NVIDIA has published inference times on its website, along with speed-up figures such as 49X. Which measure does NVIDIA use when benchmarking models: CPU time or elapsed time?
I think this could be the answer. I personally use chrono, as time.h seems a bit inaccurate at times.
I believe the choice of CPU-side timing function matters relatively little.
I'm not sure about your case, but here is a reminder based on my own mistakes:
I used to forget to call cudaEventSynchronize() after Caffe's forward() to synchronize before stopping the timer, which dramatically underestimated Caffe's inference time. Of course, to measure TensorRT correctly you need to synchronize its context execution, too.
After fixing this, I obtained correct comparison results. Before the fix, I had measured 2 ms vs. 20+ ms, which would have meant TensorRT was 10+ times slower, which seemed impossible.
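The synchronization pattern above can be sketched with CUDA events, which time work on the GPU itself. This is a minimal sketch, not the original poster's code; it assumes `context` is an initialized `nvinfer1::IExecutionContext*` and `buffers`/`batchSize` are already set up:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

void timeInference(nvinfer1::IExecutionContext* context,
                   void** buffers, int batchSize) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    context->execute(batchSize, buffers);  // the step being benchmarked
    cudaEventRecord(stop, 0);

    // Without this synchronization, the timer is read before the GPU
    // finishes, dramatically underestimating inference time.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("inference: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```

The same event-record/synchronize/elapsed-time pattern applies around Caffe's forward() when comparing the two frameworks.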