How to measure how long the GPU spends accelerating a TensorFlow model

I have built a CNN using TensorFlow with GPU support and run it on a Jetson Nano. I want to log metrics such as CPU load, GPU load, and RAM usage to a .csv file. Most importantly, I want to measure how long TensorFlow spends running my code on the GPU during the validation stage. How can I achieve all of this?


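The simplest way is to profile your application with the built-in profiler, nvprof, which ships with the CUDA toolkit. Its summary lists the time spent in each GPU kernel and memory copy. In the command below, [python3] stands for the command that launches your script.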


$ sudo /usr/local/cuda-10.2/bin/nvprof [python3]
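
For the .csv part of the question, here is a minimal sketch using only the Python standard library. It times one stage with a wall clock and appends a row of system metrics to a CSV file. Several things here are assumptions rather than anything confirmed above: the helper names (`timed_stage`, `sample_metrics`, and so on) are made up for illustration, the GPU load is read from `/sys/devices/gpu.0/load` (a sysfs file that Jetson boards usually expose, reporting load in tenths of a percent), and the CPU figure is the 1-minute load average rather than a utilization percentage. Note that wall-clock time covers the whole stage, including CPU work and data loading, so it is not the same as the GPU kernel time that nvprof reports.

```python
import csv
import os
import time

FIELDS = ["stage", "elapsed_s", "cpu_load_1min", "ram_used_mb", "gpu_load_pct"]


def read_gpu_load(path="/sys/devices/gpu.0/load"):
    """Return GPU load in percent, or None if the file is unavailable.

    Assumption: the path is the Jetson sysfs node, with values from 0 to 1000.
    """
    try:
        with open(path) as f:
            return int(f.read().strip()) / 10.0
    except (OSError, ValueError):
        return None


def read_ram_used_mb(path="/proc/meminfo"):
    """Return used RAM in MB (MemTotal - MemAvailable), or None on failure."""
    try:
        info = {}
        with open(path) as f:
            for line in f:
                key, value = line.split(":", 1)
                info[key] = int(value.split()[0])  # values are in kB
        return (info["MemTotal"] - info["MemAvailable"]) / 1024.0
    except (OSError, KeyError, ValueError, IndexError):
        return None


def sample_metrics():
    """Take one snapshot of CPU load average, RAM usage, and GPU load."""
    try:
        cpu_load = os.getloadavg()[0]
    except (OSError, AttributeError):
        cpu_load = None
    return {
        "cpu_load_1min": cpu_load,
        "ram_used_mb": read_ram_used_mb(),
        "gpu_load_pct": read_gpu_load(),
    }


def timed_stage(fn, csv_path, label):
    """Run fn(), measure its wall-clock duration, and append a CSV row."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start

    row = {"stage": label, "elapsed_s": elapsed}
    row.update(sample_metrics())  # snapshot taken right after the stage ends

    # Write the header only when the file is new or empty.
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
    return result, elapsed


# Hypothetical usage; `model` and `val_ds` are placeholders for your own objects:
# result, seconds = timed_stage(lambda: model.evaluate(val_ds), "metrics.csv", "validation")
```

A single snapshot taken after the stage is only a rough indicator. If you need load over time, one option is to call `sample_metrics()` periodically from a background thread while validation runs.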