Meaning of "single_precision_fu_utilization" metric

serailhydra · November 21, 2017, 6:54pm

How should I interpret the reported number (which ranges from 0 to 10)? Is it on a linear scale?

Let’s say each kernel function corresponds to a curve, with the x-axis denoting time and the y-axis denoting the percentage of busy single-precision units, and we have two kernel functions: one has utilization level 5 and the other has 10. Does that mean the area under the first function's curve is half the area under the second function's curve?

veraj · November 22, 2017, 7:55am

Hi, serailhydra

All metrics of type utilization are assigned an integer from 0 to 10.
A description text is also assigned based on these ranges:
[0] Idle
[1,3] Low
[4,6] Mid
[7,9] High
[10] Max

So you may get a result such as Low (1) or High (8).
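As an illustration, here is a minimal sketch of the level-to-label table above as a lookup function. The function name is hypothetical; only the ranges come from the list in this post.

```python
def utilization_label(level: int) -> str:
    """Map an integer utilization level (0-10) to its description text,
    using the ranges listed above: 0 Idle, 1-3 Low, 4-6 Mid, 7-9 High, 10 Max."""
    if not 0 <= level <= 10:
        raise ValueError(f"utilization level must be in 0..10, got {level}")
    if level == 0:
        return "Idle"
    if level <= 3:
        return "Low"
    if level <= 6:
        return "Mid"
    if level <= 9:
        return "High"
    return "Max"


# The two example results mentioned above.
for level in (1, 8):
    print(f"{utilization_label(level)} ({level})")
```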

serailhydra · November 22, 2017, 1:20pm

I already know the utilization is an integer, but could you read my question again to see what I am asking? Just knowing the number and the Low/Mid/High label is not enough for me.

veraj · November 23, 2017, 2:51am

The description of this metric is:

The utilization level of the multiprocessor function units that execute single-precision floating-point instructions and integer instructions.

Anyway, I have forwarded your question to the development team, and I hope we can get more details.

Best Regards

VeraJ

SagarAgrawal · November 24, 2017, 6:46am

You cannot plot a graph against time if you collect the metric in kernel mode, because in kernel mode the metric value covers the whole kernel; it does not represent the value over a particular time interval. If you want to collect the metric value at regular intervals, you have to use the continuous mode supported by the CUPTI library (CUDA Profiling Tools Interface). Note that nvprof does not support continuous-mode profiling. You can find more details about continuous mode and the CUPTI library in the CUPTI :: CUDA Toolkit Documentation.

The single_precision_fu_utilization metric gives you an idea of how well your app is utilizing the multiprocessor function units compared to their peak. Therefore, I don’t see any benefit in plotting two kernels against time and comparing the areas under their curves. Still, it would be nice if you could share your motivation behind this graph and what you are trying to achieve; then we might be able to help you.

serailhydra · November 24, 2017, 12:50pm

Thanks for your reply. The graph was just my guess at how nvprof generates this metric, not how I use it. All I want to know is how I should interpret the number (which ranges from 0 to 10).

Let’s go back to the curve I mentioned above. I want to understand the mathematical meaning of the number: is it derived like an integral over the curve? In other words, the points on the curve are not the metric, but the area is. Is that correct?

Is it meaningful to multiply a kernel's duration by its metric value? Let’s say I have an app that uses two kernels, A and B. I multiply the duration of A by the utilization of A, and do the same for kernel B. If I then add the two results and divide the sum by (duration A + duration B), does that give me the average utilization of these two kernels?
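The calculation proposed here is a duration-weighted average. A minimal sketch, with a hypothetical function name and made-up example numbers (not measured data), looks like this. Two caveats, hedged: the reported value is already quantized to an integer level, so some precision is lost before averaging; and if the metric is the active-cycle ratio described later in this thread, weighting by each kernel's active_cycles rather than its wall-clock duration would be the more consistent choice. The sketch keeps duration weighting exactly as proposed.

```python
def weighted_average_utilization(kernels):
    """Duration-weighted average of per-kernel utilization values.

    `kernels` is a list of (duration, utilization) pairs. Computes
        sum(duration_i * utilization_i) / sum(duration_i),
    which is the calculation proposed above for kernels A and B.
    """
    total_time = sum(d for d, _ in kernels)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(d * u for d, u in kernels) / total_time


# Hypothetical example: kernel A runs 2.0 ms at level 5, kernel B runs 1.0 ms at level 10.
print(weighted_average_utilization([(2.0, 5), (1.0, 10)]))
```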

SagarAgrawal · November 27, 2017, 4:34am

From your query, it looks like you are interested in how we calculate single_precision_fu_utilization.

So the formula is: single_precision_fu_utilization = (number of instructions executed on the multiprocessor function units) / (active_cycles × maximum number of instructions that can be executed per cycle on the multiprocessor function units)
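To connect this formula to the curve discussed earlier, here is a minimal sketch that computes the ratio and compares it with the mean of a per-cycle utilization curve. The per-cycle instruction counts and the peak of 4 instructions per cycle are hypothetical values chosen for illustration; real peaks depend on the GPU architecture, and nvprof does not expose per-cycle counts like this. Under those assumptions, the ratio equals the area under the per-cycle utilization curve divided by the number of active cycles, i.e. an average rather than a raw area.

```python
def fu_utilization(instructions_executed, active_cycles, max_instructions_per_cycle):
    """Fraction of peak function-unit throughput:
    instructions / (active_cycles * max instructions per cycle)."""
    if active_cycles <= 0 or max_instructions_per_cycle <= 0:
        raise ValueError("active_cycles and max_instructions_per_cycle must be positive")
    return instructions_executed / (active_cycles * max_instructions_per_cycle)


# Hypothetical per-cycle instruction counts over 4 active cycles,
# assuming (for illustration only) a peak of 4 instructions per cycle.
per_cycle = [4, 2, 0, 2]
peak = 4

ratio = fu_utilization(sum(per_cycle), len(per_cycle), peak)
# Same value computed as the mean of the per-cycle utilization curve n_i / peak,
# i.e. the area under the curve divided by the number of active cycles.
mean_of_curve = sum(n / peak for n in per_cycle) / len(per_cycle)
print(ratio, mean_of_curve)
```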

You can also get the active_cycles count for any kernel by passing the “-e active_cycles” option to nvprof.

So the value of the single_precision_fu_utilization metric shows how well your kernel is utilizing the multiprocessor function units compared to their peak.

If the value of the metric is low, you are not taking full advantage of the multiprocessor function units.

serailhydra · November 27, 2017, 2:58pm

Thanks! This answers exactly my question.

anandj91 · June 7, 2018, 2:22am

I have a follow-up question on this. I understand the single_precision_fu_utilization formula; I just want to know how this number is mapped to an integer between 0 and 10. Is it simply multiplied by 10 and rounded to the nearest integer?
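To make the question concrete, here is a minimal sketch of the mapping it proposes. This rule is only the hypothesis stated in the question and is not confirmed as nvprof's actual behavior; the function name is hypothetical, and the tie-breaking rule (round half up) is an assumption, since the question does not specify one.

```python
import math


def to_level_hypothesis(ratio):
    """Hypothetical mapping from a utilization ratio in [0, 1] to an integer
    level 0-10: multiply by 10 and round to the nearest integer.
    Not confirmed as nvprof's rule; ties are rounded half up (an assumption)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio must be in [0, 1], got {ratio}")
    return math.floor(ratio * 10 + 0.5)


for r in (0.0, 0.03, 0.5, 0.83, 1.0):
    print(r, to_level_hypothesis(r))
```

Under this hypothesis, any ratio below 0.05 would be reported as level 0.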