Hi,
I would like to know how “nsys stats” reports the GPU time percentage number for kernels. Please see the following line:
Time(%)  Total Time (ns)  Instances    Average  Minimum  Maximum  Name
-------  ---------------  ---------  ---------  -------  -------  ----------------------------------------------------------------------------------------------------
   51.8    1,854,648,457      4,554  407,257.0  332,256  436,716  nbnxn_kernel_ElecEw_VdwLJFsw_F_cuda(cu_atomdata, cu_nbparam, Nbnxm::gpu_plist, bool)
So, it is 51.8%.
Now, looking at the picture below
https://pasteboard.co/Jr3CvTo.jpg
which is the output of nsys-ui, I guess the total time calculation should be
0.999 * 0.575 * 0.745 * 0.882 = 0.377
or 37.7%.
Am I right? What is missing here?
This is saying that the executions of this kernel represent 51.8% of the execution time of all the things in this report.
The help explains it better:
Note that the “Time(%)” column is calculated using a summation of the “Total Time” column, and represents that API call’s, kernel’s, or memory operation’s percent of the execution time of the APIs, kernels and memory operations listed, and not a percentage of the application wall or CPU execution time.
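As a rough illustration of that note, here is a minimal Python sketch of how such a percentage comes out when the denominator is the sum of the listed "Total Time" values. The kernel total is the one from the report line above; the `all_other_listed_rows` entry and its value are hypothetical, chosen only so the example runs, and this is not the actual nsys implementation.

```python
# Sketch: Time(%) as a share of the summed "Total Time" column.
# Only the kernel's total comes from the report in the question;
# the second entry is a HYPOTHETICAL stand-in for every other listed row.
total_time_ns = {
    "nbnxn_kernel_ElecEw_VdwLJFsw_F_cuda": 1_854_648_457,  # from the report
    "all_other_listed_rows": 1_725_000_000,                # hypothetical value
}


def time_percent(totals_ns):
    """Return each row's percentage of the sum of all listed total times."""
    grand_total = sum(totals_ns.values())
    return {name: 100.0 * t / grand_total for name, t in totals_ns.items()}


percentages = time_percent(total_time_ns)
for name, pct in percentages.items():
    print(f"{pct:5.1f}  {name}")
```

The point of the sketch is that the denominator contains only the rows in the report, so any time not attributed to a listed row never enters the percentage.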
Here is the full help output:
$ nsys stats --help-reports apigpusum
apigpusum[:base] -- CUDA API & GPU Summary (CUDA API + kernels + memory ops)

    base - Optional argument, if given, will cause summary to be over the
        base name of the kernel, rather than the templated name.

    Output: All time values given in nanoseconds
        Time(%) : Percentage of "Total Time"
        Total Time : The total time used by all executions of this kernel
        Instances: The number of executions of this object
        Average : The average execution time of this kernel
        Minimum : The smallest execution time of this kernel
        Maximum : The largest execution time of this kernel
        Category : The category of the operation
        Operation : The name of the kernel

    This report provides a summary of CUDA API calls, kernels and memory
    operations, and their execution times. Note that the "Time(%)"
    column is calculated using a summation of the "Total Time" column,
    and represents that API call's, kernel's, or memory operation's
    percent of the execution time of the APIs, kernels and memory
    operations listed, and not a percentage of the application wall or
    CPU execution time.

    This report combines data from the "cudaapisum", "gpukernsum", and
    "gpumemsizesum" reports. It is very similar to profile section of
    "nvprof --dependency-analysis".
But the difference between that and what I see in nsys-ui is large. nsys-ui also says that a kernel takes X percent of a stream, that the stream takes Y percent, and …
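One way to see where the gap comes from is to back out the denominator each percentage implies. The Python sketch below assumes that the four nsys-ui fractions from the question are nested, so that each one is a share of its parent row and multiplying them is valid. The thread does not confirm that assumption, and the variable names are only illustrative.

```python
from math import prod

# Fractions quoted in the question from the nsys-ui screenshot.
# ASSUMPTION: each value is a share of its parent row, outermost first.
nested_fractions = [0.999, 0.575, 0.745, 0.882]

kernel_total_ns = 1_854_648_457  # "Total Time (ns)" from nsys stats
stats_percent = 51.8             # "Time(%)" from nsys stats

# Share of the top-level nsys-ui row attributed to this kernel.
ui_share = prod(nested_fractions)

# Denominator each percentage would imply for the same kernel time.
implied_stats_total_ns = kernel_total_ns / (stats_percent / 100.0)
implied_ui_total_ns = kernel_total_ns / ui_share

print(f"nsys-ui share:              {ui_share:.3f}")
print(f"implied nsys stats total:   {implied_stats_total_ns:,.0f} ns")
print(f"implied nsys-ui top total:  {implied_ui_total_ns:,.0f} ns")
```

Under that assumption the two tools divide the same kernel time by different totals, and the nsys-ui total is the larger one, which would be consistent with it including time that the listed rows of the nsys stats report do not cover.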