Power profiling CUDA on the Jetson TK1


Let me first of all apologize if these questions have obvious answers: I started working with CUDA less than 3 weeks ago, and I have been unable to find the information I was looking for online.

I have been working on porting some processing algorithms to the Jetson TK1 to exploit CUDA on an embedded platform.

I have them running, and as I optimize the processing kernels I wrote I am seeing very good performance.

I now need to understand the actual board usage and power consumption attributable to my algorithms only. I was unable to find a compatible power profiling tool, and I don’t know how to relate measurements of the current drawn by the board to the consumption caused by my algorithms alone. Could anyone pitch in and recommend something there?

Another question I have is about how the performance would scale. I can tweak the parameters and data I feed to the algorithms to achieve different levels of performance, but:

  • I see the same power being drawn from my supply when running in different conditions (parameters and data)
  • Using nvprof and importing its output into the Visual Profiler, it looks like every time I launch a kernel it tries to use 100% of the GPU resources.
    Is that normal? If so, why? If not, how do I correct this?

Thanks for all the help you may be able to provide.



Has anyone faced similar questions to the ones I posted above?
Can anyone provide recommendations or ideas?


If you would like to know the relation between the measured current and the actual consumption of your algorithms, I suggest looking at overall system performance instead of only optimizing your algorithms. System power has four major parts: CPU, CORE, GPU, and EMC, and the performance state (clock/power) of each part changes dynamically based on its load.

You can run tegrastats under home/ubuntu to check the performance and load of the CPU, EMC, and GPU (gr3d), and check d/clock/dvfs for core info.
If you focus on GPU performance, there are some file nodes you can check and tune, such as /d/clock/gpu_dvfs_t
and d/clock/gk20a.0_scaling. Of course, you can also disable GPU scaling (/d/clock/override.gbus/state) to monitor the GPU load while investigating.
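
As a rough way to relate supply current to the algorithms themselves, one option is to subtract an idle baseline from the current measured while the kernels run. The sketch below is only an illustration of that arithmetic, not a tool that ships with the board: the function names, the 12 V supply voltage, and the current samples are all hypothetical, and the result covers everything that changes between idle and load (CPU, EMC, and GPU together), not the GPU alone. Since the clocks scale with load, the comparison is probably more meaningful with scaling held fixed during both measurements.

```python
from statistics import mean


def algorithm_power_w(supply_voltage_v, idle_current_a, load_current_a):
    """Average extra power in watts drawn under load, relative to idle.

    Both current arguments are lists of samples in amperes taken at the
    board's supply input (hypothetical measurement setup).
    """
    idle = mean(idle_current_a)
    load = mean(load_current_a)
    return supply_voltage_v * (load - idle)


def energy_per_run_j(extra_power_w, run_time_s):
    """Energy in joules attributed to one run of the algorithm."""
    return extra_power_w * run_time_s


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    idle_samples = [0.30, 0.31, 0.29]   # amperes, board idle
    load_samples = [0.80, 0.82, 0.78]   # amperes, kernels running
    extra_w = algorithm_power_w(12.0, idle_samples, load_samples)
    print("extra power (W):", extra_w)
    print("energy for a 2 s run (J):", energy_per_run_j(extra_w, 2.0))
```

Dividing the energy per run by the amount of data processed gives a figure that can be compared across different parameter settings, even when the instantaneous power looks the same.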