Is the time of API calls in nvprof's output consumed on the GPU or the CPU?

I am using nvprof to track the time of all my API calls. For a specific call, I am interested in whether the time is spent on the CPU or the GPU. For example, I use zero-copy to allocate memory on the host, so is the 156.52 ms reported for cudaHostAlloc time spent on the CPU? If so, is it system time, user time, or something else?


==6266== API calls:
Time(%)      Time  Calls       Avg       Min       Max  Name
 98.93%  156.52ms      3  52.173ms  127.83us  156.26ms  cudaHostAlloc
  0.49%  770.33us      1  770.33us  770.33us  770.33us  cudaConfigureCall
  0.20%  311.33us      1  311.33us  311.33us  311.33us  cudaDeviceSynchronize
  0.14%  228.34us     83  2.7510us     750ns  84.416us  cuDeviceGetAttribute
  0.11%  171.83us      1  171.83us  171.83us  171.83us  cudaLaunch
  0.10%  163.00us      1  163.00us  163.00us  163.00us  cudaGetDeviceProperties
  0.01%  13.667us      3  4.5550us  2.8330us  7.7500us  cudaFree
  0.01%  12.000us      1  12.000us  12.000us  12.000us  cudaSetDeviceFlags
  0.01%  9.9990us      3  3.3330us  2.5000us  4.8330us  cudaHostGetDevicePointer
  0.00%  6.3330us      4  1.5830us     917ns  2.5830us  cudaSetupArgument
  0.00%  6.0840us      2  3.0420us  1.3340us  4.7500us  cuDeviceGetCount
  0.00%  2.9170us      1  2.9170us  2.9170us  2.9170us  cuDeviceTotalMem
  0.00%  2.1670us      2  1.0830us  1.0830us  1.0840us  cuDeviceGet
  0.00%  1.8330us      1  1.8330us  1.8330us  1.8330us  cuDeviceGetName


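The times in the "API calls" section are the durations of each call as measured on the host, from entry to return, including any time the call spends waiting on the driver or the device. nvprof does not break them down into user time and system time.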
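The cudaHostAlloc row in the output above can be unpacked with a little arithmetic. The sketch below is only an illustration: the numbers are copied from that row, and the variable names are made up for this example. It shows how much of the total belongs to the single slowest call.

```python
# Values copied from the cudaHostAlloc row of the nvprof output above,
# all converted to milliseconds. Variable names are illustrative only.
total_ms = 156.52  # "Time" column: sum over all calls
calls = 3          # "Calls" column
max_ms = 156.26    # "Max" column: the single slowest call

# Time left for the remaining calls once the slowest one is removed.
rest_ms = total_ms - max_ms
avg_rest_ms = rest_ms / (calls - 1)

# Fraction of the cudaHostAlloc total taken by the slowest call.
slowest_share = max_ms / total_ms

print(f"slowest call: {slowest_share:.1%} of the cudaHostAlloc total")
print(f"other {calls - 1} calls: {rest_ms:.2f} ms in total, "
      f"{avg_rest_ms:.2f} ms each on average")
```

So one of the three calls accounts for nearly all of the time, and the other two average about 0.13 ms, in line with the 127.83us in the Min column. This pattern often appears when the first CUDA call in a program pays a one-time setup cost, but that is an assumption here; the summary table alone does not show which call was the slow one.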

I recommend opening the profiling data in NVVP (the NVIDIA Visual Profiler), which can show you CPU and GPU execution time in detail.