How should I interpret the reported number (ranging from 0 to 10)? Is it a linear scale?
Let’s say each kernel function corresponds to a curve, with the x-axis denoting time and the y-axis denoting the % of busy single-precision units, and we have two kernel functions. One has utilization level 5 and the other has 10. Does that mean that the area under the curve of the former equals half of the area under the curve of the latter?
You cannot plot a graph with respect to time if you are collecting the metric in kernel mode. In kernel mode the metric value is reported for the whole kernel; it does not represent the value at a particular time interval. If you want to collect the metric value at regular intervals, you have to use the continuous mode supported by the CUPTI library (CUDA Profiling Tools Interface). Note that nvprof does not support continuous-mode profiling. You can find more detail about continuous mode and the CUPTI library here: http://docs.nvidia.com/cuda/cupti/index.html
The single_precision_fu_utilization metric gives you an idea of how well your app is utilizing the multiprocessor function unit compared to its peak value. Therefore I don’t see any benefit in plotting a graph of two kernels with respect to time and comparing their areas. Still, it would be nice if you could share the motivation behind this graph, i.e. what you are trying to achieve. Then we might be able to help you.
Thanks for your reply. The graph argument is just my guess at how nvprof generates this metric, not how I use it. All I want to know is how I should understand the number (ranging from 0 to 10).
Let’s go back to the curve I mentioned above. I want to understand the mathematical meaning of the number: is it derived as something like an integral over the curve? I mean, the points on the curve are not the metric, the area is the metric. Is that correct?
Is it meaningful to multiply a kernel’s duration by its metric value? Let’s say I have an app that uses two kernels, A and B. I take the duration of A multiplied by the utilization of A, and I do the same for kernel B. I then add the results and divide the sum by (duration A + duration B). Does that give me the average utilization of these two kernels?
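For reference, the arithmetic described in this question can be written out as a short Python sketch. The function name and the sample durations and utilization levels are made up for illustration. The sketch only shows the computation; whether the result is a meaningful average depends on how nvprof derives the 0 to 10 level, which the question itself leaves open.

```python
def duration_weighted_utilization(kernels):
    """Duration-weighted mean of per-kernel utilization levels.

    kernels: list of (duration, utilization) pairs. Durations can be in any
    unit as long as it is the same for every kernel.
    """
    total_duration = sum(duration for duration, _ in kernels)
    if total_duration <= 0:
        raise ValueError("total duration must be positive")
    weighted_sum = sum(duration * level for duration, level in kernels)
    return weighted_sum / total_duration


# Hypothetical numbers: kernel A runs 2.0 ms at level 5,
# kernel B runs 6.0 ms at level 10.
average = duration_weighted_utilization([(2.0, 5), (6.0, 10)])
print(average)
```

Because the longer kernel carries more weight, the result lands closer to the level of kernel B than a plain mean of 5 and 10 would.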
From your query, it looks like you are interested in how we calculate single_precision_fu_utilization.
The formula is: single_precision_fu_utilization = (number of instructions executed on the multiprocessor function unit) / (active_cycles * maximum number of instructions that can be executed in a single cycle on the multiprocessor function unit)
You can also find the active_cycles taken by any kernel by passing the “-e active_cycles” option to nvprof.
So the value of the single_precision_fu_utilization metric shows you how well your kernel is utilizing the multiprocessor function unit compared to its peak value.
If the value of the metric is low, it means you are not taking full advantage of the multiprocessor function unit.
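The formula in this reply can be sketched in Python as below. This reads the denominator as active_cycles multiplied by the peak number of instructions per cycle, which is an assumption about how the formula is grouped. The function name, parameter names, and sample counts are hypothetical and are not nvprof identifiers. The sketch stops at the raw ratio and does not model how that ratio is turned into the reported 0 to 10 level.

```python
def fu_utilization_ratio(inst_executed, active_cycles, max_inst_per_cycle):
    """Fraction of the function unit's peak throughput that was used.

    inst_executed:      instructions executed on the function unit
    active_cycles:      cycles during which the multiprocessor was active
    max_inst_per_cycle: peak instructions the unit can execute per cycle
    """
    if active_cycles <= 0 or max_inst_per_cycle <= 0:
        raise ValueError("active_cycles and max_inst_per_cycle must be positive")
    peak_instructions = active_cycles * max_inst_per_cycle
    return inst_executed / peak_instructions


# Made-up counts: 48,000 instructions over 1,000 active cycles on a unit
# that could issue 128 instructions per cycle at peak.
ratio = fu_utilization_ratio(48_000, 1_000, 128)
print(ratio)
```

A ratio near 1.0 would mean the kernel keeps the unit close to its peak, while a small ratio corresponds to the low-utilization case described above.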
I have a follow-up question on this. I understand the single_precision_fu_utilization formula. I just wanted to know how this number is mapped to an integer between 0 and 10. Is it simply multiplied by 10 and rounded to the nearest integer?