I am confused by the timing of CUDA global functions. I timed them with both clock() and the CUDA Visual Profiler. clock() reports 1078048, but the Visual Profiler reports 687693.
Clearly, the two numbers differ. If clock() counts processor cycles, then the time in seconds should be the count divided by the clock frequency. But that still does not match the Visual Profiler. So what is the unit of the Visual Profiler's GPU time, anyway? Milliseconds? Microseconds?
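To make the conversion I have in mind concrete, here is a minimal sketch in plain C++. The helper names `ticks_to_seconds` and `ticks_to_microseconds` are made up for illustration, and the tick rate is an input I have to supply myself: it would be CLOCKS_PER_SEC if the count comes from the host clock(), or the GPU clock rate in Hz if it comes from a device-side counter. I am not sure which rate applies to my number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a raw tick count to seconds.
// ticks_per_second is an assumption supplied by the caller, e.g.
// CLOCKS_PER_SEC for the host clock(), or the GPU clock rate in Hz
// for a counter read inside a kernel.
double ticks_to_seconds(std::uint64_t ticks, double ticks_per_second) {
    assert(ticks_per_second > 0.0);  // a zero or negative rate is meaningless
    return static_cast<double>(ticks) / ticks_per_second;
}

// Hypothetical helper: the same conversion, expressed in microseconds.
double ticks_to_microseconds(std::uint64_t ticks, double ticks_per_second) {
    return ticks_to_seconds(ticks, ticks_per_second) * 1e6;
}
```

For example, if the tick rate happened to be 1,000,000 ticks per second (a placeholder value, not something I measured), `ticks_to_seconds(1078048, 1e6)` would give about 1.078 seconds. Whether that is comparable to the profiler's 687693 depends on the unit I am asking about.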