When I use pgprof to profile my program, I get the following error:
#:~/CFL3D$ pgprof ./cfl3d_seq <HSCM3_fine.inp
==32521== PGPROF is profiling process 32521, command: ./cfl3d_seq
==32521== Profiling application: ./cfl3d_seq
==32521== Profiling result:
No kernels were profiled.
No API activities were profiled.
==32521== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139
So I used nvprof instead, and the profile result is below. Is the total running time on the GPU the sum of “API calls” and “GPU activities”, or does “API calls” already include “GPU activities”? Also, how can I reduce the time spent in “cudaFree” and “cudaMalloc” under “API calls”?
==26402== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 54.09% 20.4365s 52224 391.32us 832ns 19.270ms [CUDA memcpy HtoD]
23.74% 8.96840s 4032 2.2243ms 50.080us 18.944ms [CUDA memcpy DtoH]
5.65% 2.13310s 192 11.110ms 8.5115ms 14.219ms twokernel_do2335_2_
5.58% 2.10966s 192 10.988ms 8.4016ms 14.030ms twokernel_do2333_2_
5.36% 2.02546s 192 10.549ms 8.0914ms 13.581ms twokernel_do2334_2_
0.62% 233.64ms 192 1.2169ms 929.29us 1.6044ms twokernel_do893_
... ...
0.01% 4.1183ms 192 21.449us 20.864us 22.113us diagjkernel_do7016_
API calls: 72.09% 35.0658s 56256 623.32us 8.6680us 20.016ms cudaMemcpy
22.30% 10.8480s 52224 207.72us 4.8030us 14.229ms cudaFree
5.07% 2.46781s 52224 47.254us 5.4810us 517.61ms cudaMalloc
0.53% 257.91ms 10560 24.423us 8.0070us 568.38us cudaLaunchKernel
0.00% 805.97us 1 805.97us 805.97us 805.97us cuDeviceTotalMem
0.00% 547.29us 96 5.7000us 256ns 214.70us cuDeviceGetAttribute
0.00% 67.739us 1 67.739us 67.739us 67.739us cuDeviceGetName
0.00% 7.1660us 1 7.1660us 7.1660us 7.1660us cuDeviceGetPCIBusId
0.00% 3.4950us 3 1.1650us 345ns 1.9350us cuDeviceGetCount
0.00% 2.4870us 1 2.4870us 2.4870us 2.4870us cuDriverGetVersion
0.00% 1.5180us 2 759ns 350ns 1.1680us cuDeviceGet
0.00% 529ns 1 529ns 529ns 529ns cuDeviceGetUuid
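
As a rough cross-check of how the two sections relate, the memcpy rows can be compared directly. The sketch below only does arithmetic on figures copied from the nvprof output above; the variable names are illustrative. The reading that each blocking cudaMemcpy call spans the transfer it issues is an assumption, not something the profiler output states.

```python
# Figures copied from the nvprof output above: (total seconds, call count).
gpu_memcpy = {
    "HtoD": (20.4365, 52224),  # [CUDA memcpy HtoD] under "GPU activities"
    "DtoH": (8.96840, 4032),   # [CUDA memcpy DtoH] under "GPU activities"
}
api_time, api_calls = 35.0658, 56256  # cudaMemcpy under "API calls"

# Total device-side copy time and copy count.
gpu_time = sum(t for t, _ in gpu_memcpy.values())
gpu_calls = sum(n for _, n in gpu_memcpy.values())

print(f"GPU memcpy operations: {gpu_calls}, cudaMemcpy API calls: {api_calls}")
print(f"GPU memcpy time: {gpu_time:.4f} s, cudaMemcpy API time: {api_time:.4f} s")
print(f"API time beyond the transfers themselves: {api_time - gpu_time:.4f} s")
```

The call counts match (52224 + 4032 = 56256), and the cudaMemcpy API time is larger than the combined device-side copy time. That would be consistent with the two sections overlapping rather than adding up, if the assumption above holds. The cudaMalloc and cudaFree counts (52224 each) also equal the number of HtoD copies.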