I used both the Visual Profiler and a timer inside the program to time my OpenCL implementation. I am a little confused about the following issues:
1) Is CPU time = GPU time + overhead?
So for kernels, is the GPU time the actual core execution time? In my application, however, the CPU time reported by the profiler is closer to the result of my program timer.
2) In my application:
memcpyDtoHasync: GPU time: 237.024, CPU time: 4.863
memcpyHtoDasync: GPU time: 106925, CPU time: 5367.52
For transfers, is the actual time CPU time + GPU time, or only GPU time?
3) Although the average elapsed time is consistent with the program timer, the #calls value is not always right. Why?
It’s very convenient to use the Visual Profiler to time our programs, but is it reliable? Any ideas?
Thanks in advance!