Is there a way to roughly estimate how many operations per second a CUDA program executes from the profiler output? (I'm too lazy to count manually.) I understand that the counters only target one of the multiprocessors at a time, but I'm curious whether total operations per second can be roughly computed as a function of gridSize, blockSize, instructions, and the timestamp.
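To make the question concrete, here is the kind of back-of-envelope calculation I have in mind, as a rough Python sketch. It rests on assumptions I have not confirmed: that the `instructions` counter reports warp-level instructions for a single multiprocessor, that the profiled multiprocessor is representative of the others, and that `gputime` is in microseconds. The function name and the `num_sms` parameter are my own, and the multiprocessor count has to be supplied by hand.

```python
import math


def estimate_throughput(instructions, grid_size, block_size,
                        gpu_time_us, num_sms, warp_size=32):
    """Rough throughput estimate for one kernel launch.

    Assumptions (unverified): `instructions` is a warp-level count taken
    on ONE multiprocessor, and all multiprocessors did similar work.
    Returns (thread-level instructions per second, instructions per warp).
    """
    blocks = grid_size[0] * grid_size[1]
    threads_per_block = block_size[0] * block_size[1] * block_size[2]
    # A partially filled warp still occupies a full warp slot.
    warps_per_block = math.ceil(threads_per_block / warp_size)
    total_warps = blocks * warps_per_block

    # Scale the single-multiprocessor count up to the whole chip.
    total_warp_instr = instructions * num_sms
    # Upper bound: treats every lane of every warp as active.
    thread_instr = total_warp_instr * warp_size

    ops_per_second = thread_instr * 1e6 / gpu_time_us
    instr_per_warp = total_warp_instr / total_warps
    return ops_per_second, instr_per_warp


# Hypothetical profiler values for a 16-multiprocessor card such as the C870.
ops, per_warp = estimate_throughput(
    instructions=1_000_000, grid_size=(64, 1), block_size=(256, 1, 1),
    gpu_time_us=1000.0, num_sms=16)
print(f"{ops:.3e} thread instructions/s, {per_warp:.0f} instructions/warp")
```

The result would be an upper bound at best, since it ignores divergent branches and uneven block distribution, and it counts instructions rather than floating-point operations.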
Also, when comparing the 'cputime', 'gputime', and 'timestamp' counters, the difference between adjacent timestamps seems to be roughly 200 usec + cputime. Is this overhead caused by the profiling itself? Or is there some additional kernel launch overhead that is not captured in the cputime counter?
Finally, do the detailed profiling counters not work on the GTX 280? When I set the CUDA_PROFILE_CONFIG environment variable, I get no output with a GTX 280, but it works fine with a C870 or an 8800 GTX. Thanks,