Platform: VS2010 + Tesla C2050
I ran a test program on a Tesla C2050 and used the Visual Profiler to analyze it. The program performs 3 memory copies between the CPU and the GPU and launches 1 kernel function. The Visual Profiler analysis results are as follows:
Kernel time = 0.51 % of total GPU time
Memory copy time = 36.2 % of total GPU time
Kernel taking maximum time = VecAdd (0.5% of total GPU time)
Memory copy taking maximum time = memcpyHtoD (25.0% of total GPU time)
There is no time overlap between memory copies and kernels on GPU
My question is: why does kernel time + memory copy time not add up to 100% of the total GPU time? Here they sum to only 0.51% + 36.2% = 36.71%.
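
For reference, here is a minimal sketch of the kind of program described above. It is not the exact code that was profiled. The split of the three copies into two host-to-device transfers and one device-to-host transfer, the helper name runVecAdd, the vector length, and the launch configuration are all assumptions made for illustration; only the kernel name VecAdd comes from the profiler output.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Element-wise vector addition: c[i] = a[i] + b[i].
__global__ void VecAdd(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper (not from the original program): performs two
// host-to-device copies, one kernel launch, and one device-to-host copy,
// then returns true if every element of the result is correct.
bool runVecAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);

    float* hA = (float*)malloc(bytes);
    float* hB = (float*)malloc(bytes);
    float* hC = (float*)malloc(bytes);
    if (!hA || !hB || !hC) {
        free(hA); free(hB); free(hC);
        return false;
    }
    for (int i = 0; i < n; ++i) {
        hA[i] = (float)i;
        hB[i] = 2.0f * (float)i;
    }

    float *dA = NULL, *dB = NULL, *dC = NULL;
    bool ok = cudaMalloc((void**)&dA, bytes) == cudaSuccess
           && cudaMalloc((void**)&dB, bytes) == cudaSuccess
           && cudaMalloc((void**)&dC, bytes) == cudaSuccess;

    if (ok) {
        // Memory copies 1 and 2: host to device (memcpyHtoD in the profiler).
        ok = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        // The single kernel launch; 256 threads per block is an assumption.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        VecAdd<<<blocks, threads>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        // Memory copy 3: device to host. This blocking copy also waits
        // for the kernel to finish.
        ok = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        for (int i = 0; i < n; ++i) {
            if (hC[i] != 3.0f * (float)i) { ok = false; break; }
        }
    }

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return ok;
}

int main()
{
    int n = 1 << 20;  // assumed vector length
    printf("VecAdd %s\n", runVecAdd(n) ? "passed" : "failed");
    return 0;
}
```

In a program of this shape the copies and the kernel run strictly one after another, which matches the "no time overlap" line in the profiler output.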