I have noticed that the program reports different results when it runs under the profiler. With 17 metrics collected, a run that normally takes about one and a half minutes takes about three and a half hours, which shows how strongly profiling affects the runtime.
Without nvprof, the output looks like this:
Performance: 11.493 ns/day, 2.088 hours/ns, 26.605 timesteps/s
74.1% CPU use with 2 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 57.888 | 58.743 | 59.597 | 11.2 | 78.14
Neigh | 0.49963 | 0.50408 | 0.50853 | 0.6 | 0.67
Comm | 4.4834 | 5.0893 | 5.6951 | 26.9 | 6.77
Output | 0.00098204 | 0.0014943 | 0.0020066 | 1.3 | 0.00
Modify | 7.6897 | 7.7813 | 7.8729 | 3.3 | 10.35
Other | | 3.056 | | | 4.07
Nlocal: 250000 ave 250176 max 249824 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost: 44831.5 ave 44833 max 44830 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs: 0 ave 0 max 0 min
Histogram: 2 0 0 0 0 0 0 0 0 0
Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 399
Dangerous builds = 397
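To express the first log in wall-clock terms, the "avg time" column can be summed. The %total values in this table add up to 100, which suggests that the "Other" row is the remainder of the loop time, so the sum of the averages should equal the total loop time. The sketch below assumes exactly that. `parse_breakdown` is a hypothetical helper written for the table as pasted here, not part of LAMMPS.

```python
# Sketch: sum the "avg time" column of the LAMMPS timing breakdown
# (run without nvprof) to recover the total loop time in seconds.
# Assumption: the "Other" row is the remainder, so the rows add up to the loop time.

BREAKDOWN_NO_NVPROF = """\
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 57.888 | 58.743 | 59.597 | 11.2 | 78.14
Neigh | 0.49963 | 0.50408 | 0.50853 | 0.6 | 0.67
Comm | 4.4834 | 5.0893 | 5.6951 | 26.9 | 6.77
Output | 0.00098204 | 0.0014943 | 0.0020066 | 1.3 | 0.00
Modify | 7.6897 | 7.7813 | 7.8729 | 3.3 | 10.35
Other | | 3.056 | | | 4.07
"""


def parse_breakdown(text):
    """Hypothetical helper: return {section: (avg_seconds, percent_of_total)}."""
    rows = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows have six cells: name, min, avg, max, %varavg, %total.
        # The header row and the dashed separator line are skipped.
        if len(cells) != 6 or cells[0] == "Section":
            continue
        rows[cells[0]] = (float(cells[2]), float(cells[5]))
    return rows


rows = parse_breakdown(BREAKDOWN_NO_NVPROF)
loop_time = sum(avg for avg, _ in rows.values())
pct_sum = sum(pct for _, pct in rows.values())
print(f"loop time = {loop_time:.1f} s, %total sums to {pct_sum:.2f}")
# prints "loop time = 75.2 s, %total sums to 100.00"
```

About 75 seconds of loop time, in line with the roughly one and a half minute run described above.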
With nvprof, however, I see:
Performance: 0.072 ns/day, 335.034 hours/ns, 0.166 timesteps/s
84.9% CPU use with 2 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 12024 | 12029 | 12034 | 4.6 | 99.73
Neigh | 0.4934 | 0.50414 | 0.51487 | 1.5 | 0.00
Comm | 15.88 | 20.859 | 25.838 | 109.0 | 0.17
Output | 0.00096084 | 0.0023741 | 0.0037874 | 2.9 | 0.00
Modify | 7.5295 | 7.5312 | 7.5329 | 0.1 | 0.06
Other | | 3.245 | | | 0.03
Nlocal: 250000 ave 250176 max 249824 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost: 44831.5 ave 44833 max 44830 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs: 0 ave 0 max 0 min
Histogram: 2 0 0 0 0 0 0 0 0 0
Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 399
Dangerous builds = 397
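Comparing the two logs section by section shows where the extra time goes. The sketch below hard-codes the "avg time" values from both runs and computes each section's slowdown, the overall slowdown, and the share of the added time that falls in Pair. Whether Pair is the part running on the GPU is an assumption here (the zero neighbor counts hint that neighbor lists are built on the device, but that is only an inference from the log).

```python
# Sketch: per-section slowdown between the two runs above.
# The avg-time values (seconds) are copied by hand from the two logs.

AVG_TIME_NO_NVPROF = {
    "Pair": 58.743, "Neigh": 0.50408, "Comm": 5.0893,
    "Output": 0.0014943, "Modify": 7.7813, "Other": 3.056,
}
AVG_TIME_NVPROF = {
    "Pair": 12029.0, "Neigh": 0.50414, "Comm": 20.859,
    "Output": 0.0023741, "Modify": 7.5312, "Other": 3.245,
}

# Ratio of the profiled time to the baseline time for each section.
slowdown = {s: AVG_TIME_NVPROF[s] / AVG_TIME_NO_NVPROF[s] for s in AVG_TIME_NO_NVPROF}

total_base = sum(AVG_TIME_NO_NVPROF.values())
total_prof = sum(AVG_TIME_NVPROF.values())

# Seconds added by profiling, and the fraction of them spent in Pair.
extra = total_prof - total_base
pair_share = (AVG_TIME_NVPROF["Pair"] - AVG_TIME_NO_NVPROF["Pair"]) / extra

for section, ratio in sorted(slowdown.items(), key=lambda kv: -kv[1]):
    print(f"{section:7s} x{ratio:8.2f}")
print(f"total   x{total_prof / total_base:8.2f}  ({total_prof / 3600:.2f} h vs {total_base:.1f} s)")
# The "Performance" lines give an independent estimate of the same factor.
print(f"timesteps/s ratio: {26.605 / 0.166:.1f}")
print(f"share of extra time in Pair: {pair_share:.1%}")
```

With these numbers Pair is roughly 200 times slower, Comm about 4 times slower, and Neigh, Modify and Other are essentially unchanged; nearly all of the added time lands in Pair, and the overall factor of about 160 agrees with the ratio of the timesteps/s figures.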
Is there any explanation that would help me judge these results?