Hi

I would like to know how the Average value reported by `nsys profile --trace=cuda` is calculated, because I am not able to reproduce it from the nv-nsight-cu-cli output.

For example, nv-nsight-cu-cli shows:

```
execute3DconvolutionCuda_split(float*, float*, float*, float*, int, int, int, int, int, int, int, int, int), Block Size 1024, Grid Size 64, Device 0, 6 invocations
  Section: Command line profiler metrics
    Metric Name            Metric Unit      Minimum      Maximum      Average
    ---------------------- ----------- ------------ ------------ ------------
    gpu__time_active.avg       msecond     0.242144     2.102880     1.409056
execute3DconvolutionCuda_split(float*, float*, float*, float*, int, int, int, int, int, int, int, int, int), Block Size 1024, Grid Size 256, Device 0, 4 invocations
  Section: Command line profiler metrics
    Metric Name            Metric Unit      Minimum      Maximum      Average
    ---------------------- ----------- ------------ ------------ ------------
    gpu__time_active.avg       usecond   927.776000   947.072000   933.528000
```

and nsys shows:

```
Time(%) Total Time (ns) Instances     Average   Minimum   Maximum Name
------- --------------- --------- ----------- --------- --------- ----------------------------------------------------------------------------------------------------
   31.4      12,513,663        10 1,251,366.3   252,750 2,158,617 execute3DconvolutionCuda_split(float*, float*, float*, float*, int, int, int, int, int, int, int, i…
```
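For reference, the Average column in this nsys row is consistent with a plain arithmetic mean over all instances, i.e. Total Time divided by Instances. A minimal check using only the numbers from the row above (the variable names are just for illustration):

```python
# Values copied from the nsys row above, in nanoseconds.
total_time_ns = 12_513_663
instances = 10
reported_average_ns = 1_251_366.3

# Plain arithmetic mean over all kernel instances.
computed_average_ns = total_time_ns / instances

print(f"computed: {computed_average_ns:.1f} ns, reported: {reported_average_ns:.1f} ns")
```

This only shows that the row is internally consistent; it does not by itself explain any per-kernel timing difference between the two tools.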

As you can see, the total number of invocations reported by Nsight Compute (6 + 4) matches the instance count from nsys (10). The minimum and maximum are also reasonably close. Computing the average manually, I use a weighted average, which gives

Weighted average:

(6*1409 + 4*933.5)/10 = 1218.8 useconds

Even with other methods (no matter if they are meaningful or not in this example), we see:

1- Arithmetic average:

(1409+933.5)/2 = 1171.25 useconds

2- Weighted harmonic mean:

(6+4)/((6/1409)+(4/933.5)) = 1170.5 useconds

3- Weighted geometric mean:

(1409^6 * 933.5^4) ^ (1/(6+4)) = 1195 useconds
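To double-check the arithmetic, the four averages can be recomputed from the Nsight Compute numbers. This is a small sketch that assumes the two per-configuration averages (1409.056 and 933.528 useconds) and their invocation counts are the only inputs; the names `groups`, `weighted_mean`, and so on are made up for this example.

```python
import math

# (invocations, average in microseconds) per launch configuration,
# copied from the Nsight Compute output above.
groups = [(6, 1409.056), (4, 933.528)]

total_n = sum(n for n, _ in groups)

# Average weighted by invocation count.
weighted_mean = sum(n * avg for n, avg in groups) / total_n

# Unweighted mean of the two per-configuration averages.
arithmetic_mean = sum(avg for _, avg in groups) / len(groups)

# Weighted harmonic mean.
weighted_harmonic = total_n / sum(n / avg for n, avg in groups)

# Weighted geometric mean, computed in log space to avoid huge powers.
weighted_geometric = math.exp(
    sum(n * math.log(avg) for n, avg in groups) / total_n
)

for name, value in [
    ("weighted mean", weighted_mean),
    ("arithmetic mean", arithmetic_mean),
    ("weighted harmonic mean", weighted_harmonic),
    ("weighted geometric mean", weighted_geometric),
]:
    print(f"{name}: {value:.1f} us")
```

With these inputs the weighted mean evaluates to about 1218.8 useconds, which is the figure to compare against the nsys average.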

However, nsys says 1251 useconds.

Although some difference is expected, I am not sure whether the gap between these values and the nsys average means that nsys calculates the average in a different way.

Any thoughts on that?