How to understand the output of nvprof?

I have run a benchmark to measure power. What I don't understand is whether the result covers BlackScholesGPU plus the CUDA memcpy DtoH and HtoD (that is, the power shown below), or whether it is the sum over all of the API calls made. In short, I want to know the total time the program ran for and what the power was.

==10883== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 87.94%  336.21ms       512  656.66us  651.41us  663.28us  BlackScholesGPU(float*, float*, float*, float*, float*, float, float, int)
  7.02%  26.842ms         2  13.421ms  13.390ms  13.452ms  [CUDA memcpy DtoH]
  5.04%  19.282ms         3  6.4272ms  6.2572ms  6.7348ms  [CUDA memcpy HtoD]

==10883== System profiling result:
Device "Tesla K40c (0)"
                        Count       Avg        Min         Max
      SM Clock (MHz)       71    651.55     324.00      666.00
  Memory Clock (MHz)       71   2890.76     324.00     3004.00
     Temperature (C)      141     38.79      37.00       40.00
          Power (mW)      141  71300.77   20469.00   161870.00
             Fan (%)       71     23.00      23.00       23.00
Device "GeForce 8800 GTS 512 (1)"
                        Count       Avg        Min         Max
     Temperature (C)      141     62.00      62.00       62.00
             Fan (%)       71     37.00      37.00       37.00

==10883== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 40.50%  328.12ms         2  164.06ms  321.18us  327.80ms  cudaDeviceSynchronize
 38.45%  311.48ms         5  62.297ms  211.16us  310.59ms  cudaMalloc
 13.01%  105.42ms         1  105.42ms  105.42ms  105.42ms  cudaDeviceReset
  6.11%  49.538ms         5  9.9077ms  6.4856ms  14.759ms  cudaMemcpy
  0.72%  5.8577ms       512  11.440us  10.967us  49.541us  cudaLaunch
  0.58%  4.7136ms         5  942.73us  905.14us  1.0302ms  cudaGetDeviceProperties
  0.25%  1.9876ms       168  11.831us     322ns  427.58us  cuDeviceGetAttribute
  0.17%  1.4023ms      4096     342ns     306ns  2.2600us  cudaSetupArgument
  0.09%  760.95us         5  152.19us  120.46us  269.24us  cudaFree
  0.03%  221.90us         2  110.95us  107.11us  114.79us  cuDeviceTotalMem
  0.03%  209.69us       512     409ns     381ns  2.8250us  cudaConfigureCall
  0.02%  200.25us       512     391ns     374ns     456ns  cudaGetLastError
  0.02%  191.88us         2  95.939us  78.852us  113.03us  cuDeviceGetName
  0.00%  8.6430us         1  8.6430us  8.6430us  8.6430us  cudaSetDevice
  0.00%  7.5760us         2  3.7880us  1.9640us  5.6120us  cuDeviceGetPCIBusId
  0.00%  6.6570us        11     605ns     314ns  2.0430us  cuDeviceGet
  0.00%  3.8210us         4     955ns     588ns  1.5870us  cuDeviceGetCount
  0.00%  3.3340us         2  1.6670us     363ns  2.9710us  cudaGetDeviceCount

The GPU power consumption will usually vary during program execution, and the power reported by nvprof is a sampled measurement:

[url]http://docs.nvidia.com/cuda/profiler-users-guide/index.html#system-profiling[/url]
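In your output, the Power (mW) row is a summary of 141 such samples taken over the whole profiled run (min about 20.5 W, avg about 71.3 W, max about 161.9 W), not a per-kernel number. If you want samples you can correlate with your own timeline, a minimal sketch along these lines polls the NVML power counter directly (assumptions: Linux, device index 0 is the Tesla K40c, a 10 ms polling interval, and linking with -lnvidia-ml; this is just an illustration, not what nvprof itself does):

// power_poll.c - poll GPU power via NVML (sketch, assumes device 0 is the K40c)
// build (path to nvml.h may differ): gcc power_poll.c -o power_poll -lnvidia-ml
#include <nvml.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    nvmlDevice_t dev;
    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    nvmlDeviceGetHandleByIndex(0, &dev);      // assumption: device 0 is the GPU of interest

    // take ~100 samples at ~10 ms spacing (~1 s); run your benchmark in parallel
    for (int i = 0; i < 100; ++i) {
        unsigned int mw = 0;                  // NVML reports power in milliwatts
        if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS)
            printf("sample %3d: %.2f W\n", i, mw / 1000.0);
        usleep(10000);
    }

    nvmlShutdown();
    return 0;
}

Averaging only the samples that fall inside the kernel/memcpy window gives a rough per-region power figure, but it is still a sampled approximation.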

If you add --print-gpu-trace to your profiler command line (and perhaps drop the --print-api-trace option), you can get an idea of the overall program execution flow and timing.
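
For example, something like this (assuming your binary is the CUDA BlackScholes sample and that you keep system profiling enabled so the power samples are still collected):

nvprof --system-profiling on --print-gpu-trace ./BlackScholes

The GPU trace lists every kernel launch and memcpy with its start time and duration, so you can see that the GPU activity itself spans roughly 382 ms (336.21 ms of BlackScholesGPU plus about 46 ms of memcpy), while the power samples cover the longer wall-clock time of the whole run, including the cudaMalloc and cudaDeviceReset time shown in the API-calls section.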