I am trying to analyze the performance of PageRank on an NVIDIA GPU. To characterize the application, I would first like to measure the utilization of the shader cores (an average over the whole run would do). In other words, I want to measure the percentage of time the shader cores spend doing useful computation and the percentage of time they are stalled. I tried nvprof, but it only gives me a function-by-function (per-kernel) breakdown. Can you suggest a way to measure this metric?