Is the Visual Profiler reliable for timing our program?

Hi,
I used both the Visual Profiler and a timer inside the program to time my OpenCL implementation. I am a little confused about the following issues:

1. Is CPU time = GPU time + overhead?
   If so, is the GPU time for kernels the actual execution time on the device? In my application, however, the profiler's CPU time is closer to the result of my program's timer.

2. In my application:
memcpyDtoHasync: GPU time: 237.024, CPU time: 4.863
memcpyHtoDasync: GPU time: 106925, CPU time: 5367.52

For transfers, is the actual time CPU time + GPU time, or only GPU time?

3. Although the average elapsed time is consistent with my program's timer, the #calls value is not always right?!
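For comparison with the profiler's columns, OpenCL itself records four device timestamps per command (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END, in nanoseconds), which can be read with clGetEventProfilingInfo on a queue created with CL_QUEUE_PROFILING_ENABLE. Below is a minimal sketch of how I split them. The struct and function names are my own, and I'm assuming (not certain) that END - START corresponds to the profiler's GPU time; the sketch says nothing about how the Visual Profiler computes its CPU time column.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four per-event timestamps. In a real program
 * each field would be filled with
 *   clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_QUEUED (or _SUBMIT,
 *                           _START, _END), sizeof(cl_ulong), &value, NULL);
 * All values are device timestamps in nanoseconds. */
typedef struct {
    uint64_t queued; /* command enqueued by the host */
    uint64_t submit; /* command submitted to the device */
    uint64_t start;  /* device starts executing the command */
    uint64_t end;    /* device finishes executing the command */
} event_times_ns;

/* Time the device spent executing the command, in microseconds. */
double exec_us(const event_times_ns *t) {
    assert(t->start <= t->end);
    return (double)(t->end - t->start) / 1000.0;
}

/* Time between enqueue and start of execution, in microseconds. */
double wait_us(const event_times_ns *t) {
    assert(t->queued <= t->start);
    return (double)(t->start - t->queued) / 1000.0;
}

/* Enqueue-to-completion time, in microseconds (equals wait_us + exec_us). */
double total_us(const event_times_ns *t) {
    assert(t->queued <= t->end);
    return (double)(t->end - t->queued) / 1000.0;
}
```

With the timestamps {1000, 3000, 10000, 250000} ns, exec_us returns 240.0, wait_us returns 9.0 and total_us returns 249.0, so the total is the waiting time plus the execution time.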

It’s very convenient to use the Visual Profiler to time our programs, but is it reliable? Any ideas?
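One way to cross-check the profiler independently is a host-side wall-clock timer that only stops after the device work has finished. This is a minimal POSIX sketch, assuming clock_gettime(CLOCK_MONOTONIC) is available; wall_ms and busy_work are hypothetical names, and busy_work is only a stand-in for the real enqueue followed by clFinish(queue).

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: runs work(arg) and returns the elapsed wall-clock time
 * in milliseconds. For an asynchronous OpenCL transfer, work() should enqueue
 * the copy AND call clFinish(queue) before returning; otherwise the interval
 * covers only the enqueue call, not the copy itself. */
double wall_ms(void (*work)(void *), void *arg) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    /* A negative nanosecond difference is compensated by the seconds term. */
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Stand-in workload for demonstration only: sums the first *arg integers.
 * 'volatile' keeps the compiler from optimizing the loop away. */
void busy_work(void *arg) {
    volatile unsigned long acc = 0;
    unsigned long n = *(unsigned long *)arg;
    for (unsigned long i = 0; i < n; ++i) {
        acc += i;
    }
}
```

Without the clFinish (or a clWaitForEvents on the copy's event), a non-blocking enqueue returns before the copy completes, so the measured interval would cover only the enqueue call.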

Thanks in advance!