Power profiling with nvprof and averages

Hello,

I wanted some clarification on nvprof and power readings.

Assume my CUDA binary is foocode, run with a command-line argument of 10000.

I run the following:

nvprof --csv --system-profiling on --devices 0 --log-file human-readable-output_A.log ./foocode 10000

In the file human-readable-output_A.log I get:

==1202776== System profiling result:
"","Device","Count","Avg","Min","Max"
"Power (mW)","Tesla K40m (0)",4,64493.250000,63287.000000,68015.000000

Is the Avg reading of 64493.250000 the average, over 4 samples, of the TOTAL POWER for the whole execution time of the code?

I just want to confirm this.

Another question: if it is indeed the average over 4 samples of the TOTAL POWER for the whole execution time of the code, how reasonable is it to run my code, say, 3 times and then take the average of the 3 Avg readings?

I ask because I plan to run my code 3 times and take the average of the execution time, so I am wondering whether it makes sense to do the same for the Avg power reading. If it is possible, would setting the sample count to 1 and then averaging the power readings be a better idea?

Please note I am calculating the kernel execution time with the CUDA timers.

Thanks and Regards,

  • vihan

No, it isn’t. According to the documentation:

http://docs.nvidia.com/cuda/profiler-users-guide/index.html#system-profiling

it represents low-frequency sampling of the stated parameter. These are 4 separate samples of power taken during one execution of your app, and Avg, Min and Max summarize those 4 samples. As stated in the documentation, to see the detail for each sample point, combine --system-profiling on with --print-gpu-trace.
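
As a quick sanity check on that reading, here is a small Python sketch using only the numbers from your log. It assumes that Avg is the plain arithmetic mean of the Count samples (I have not verified how nvprof computes it), and the function name is made up for illustration. If Min and Max are two of the four samples, the other two must average to a value between them:

```python
def mean_of_remaining_samples(count, avg_mw, min_mw, max_mw):
    """Mean of the samples other than Min and Max.

    Assumes avg_mw is the arithmetic mean of `count` samples and that
    Min and Max are two distinct samples (so count must be at least 3).
    """
    total_mw = count * avg_mw                  # sum of all samples
    return (total_mw - min_mw - max_mw) / (count - 2)


# Values copied from the "Power (mW)" row in the question.
rest = mean_of_remaining_samples(4, 64493.25, 63287.0, 68015.0)
print(rest)                                    # mean of the two unseen samples, in mW
print(63287.0 <= rest <= 68015.0)              # True if the row is self-consistent
```

The result falls between Min and Max, which is what you would expect from four individual power samples rather than four whole-run totals.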

The only way it could be what you state is if the profiler specifically ran your app 4 times just for the purpose of the stated collection. That is not what is happening.
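
On the follow-up about averaging over 3 runs: since each Avg is already a mean of the samples from one run, averaging the Avg values gives a mean of means. If the runs produce different sample counts, weighting each Avg by its Count gives the mean over all samples instead. A minimal Python sketch, assuming the log contains the CSV row exactly as in the question (with plain double quotes); the function names are hypothetical and the per-run numbers in the usage lines are placeholders, not measurements:

```python
import csv
import io
from statistics import mean

SAMPLE_LOG = '''==1202776== System profiling result:
"","Device","Count","Avg","Min","Max"
"Power (mW)","Tesla K40m (0)",4,64493.250000,63287.000000,68015.000000
'''


def parse_power_row(log_text):
    """Return (count, avg_mw, min_mw, max_mw) from the 'Power (mW)' row, or None."""
    for row in csv.reader(io.StringIO(log_text)):
        if row and row[0] == "Power (mW)":
            count = int(row[2])
            avg_mw, min_mw, max_mw = (float(x) for x in row[3:6])
            return count, avg_mw, min_mw, max_mw
    return None


def average_power(runs):
    """Combine per-run (count, avg, min, max) tuples.

    Returns (simple mean of the Avg values, Count-weighted mean), both in mW.
    """
    simple = mean(avg for _, avg, _, _ in runs)
    total_samples = sum(count for count, _, _, _ in runs)
    weighted = sum(count * avg for count, avg, _, _ in runs) / total_samples
    return simple, weighted


run_a = parse_power_row(SAMPLE_LOG)
# Placeholder tuples standing in for two further runs (not real data).
runs = [run_a, (5, 64000.0, 63000.0, 66000.0), (3, 65000.0, 64000.0, 67000.0)]
print(average_power(runs))
```

With only a handful of samples per run, either average is a coarse estimate, so it is worth reporting Min, Max and Count alongside it.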