On Linux, it uses gettimeofday(), which is good to about 10 µs. On Windows, I have no idea. You can read the source code for it yourself in the NVIDIA CUDA SDK. For thesis-quality work, you should be averaging many runs with tens of seconds to minutes of sampling, no?
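For illustration, here is a minimal sketch of that averaging approach, assuming a Linux box with gettimeofday() available; do_work() is a hypothetical placeholder for whatever you are measuring (e.g. a kernel launch followed by a device synchronize):

```c
/* Sketch: average many runs so the ~10 µs timer tolerance is
 * spread over the whole batch rather than a single measurement. */
#include <stdio.h>
#include <sys/time.h>

static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);   /* ~microsecond granularity on Linux */
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

static void do_work(void)
{
    /* placeholder workload; replace with the code under test */
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i)
        x += i * 0.5;
}

int main(void)
{
    const int runs = 1000;     /* pick so total sampling time is
                                  tens of seconds or more */
    double start = now_seconds();
    for (int i = 0; i < runs; ++i)
        do_work();
    double elapsed = now_seconds() - start;

    /* Per-run timer error is roughly (tolerance / runs), since only
       the batch endpoints are measured. */
    printf("average per run: %g s (total %g s over %d runs)\n",
           elapsed / runs, elapsed, runs);
    return 0;
}
```

The point is to time the whole batch with one start/stop pair, so the timer's tolerance divides by the run count instead of applying to every iteration.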
I know… it is exactly what I tried to do, but without knowing the tolerance of my analyser, I could not get any further. But now I have something to calculate with… :-)