Different program output when using nvprof

I have noticed that the program output differs when I run it under the profiler. With 17 metrics, a run that normally takes about a minute and a half takes about three and a half hours, which shows how much profiling affects the runtime.

Without nvprof, the output looks like this:

Performance: 11.493 ns/day, 2.088 hours/ns, 26.605 timesteps/s
74.1% CPU use with 2 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 57.888     | 58.743     | 59.597     |  11.2 | 78.14
Neigh   | 0.49963    | 0.50408    | 0.50853    |   0.6 |  0.67
Comm    | 4.4834     | 5.0893     | 5.6951     |  26.9 |  6.77
Output  | 0.00098204 | 0.0014943  | 0.0020066  |   1.3 |  0.00
Modify  | 7.6897     | 7.7813     | 7.8729     |   3.3 | 10.35
Other   |            | 3.056      |            |       |  4.07

Nlocal:    250000 ave 250176 max 249824 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:    44831.5 ave 44833 max 44830 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:    0 ave 0 max 0 min
Histogram: 2 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 399
Dangerous builds = 397

However, with nvprof, I see

Performance: 0.072 ns/day, 335.034 hours/ns, 0.166 timesteps/s
84.9% CPU use with 2 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 12024      | 12029      | 12034      |   4.6 | 99.73
Neigh   | 0.4934     | 0.50414    | 0.51487    |   1.5 |  0.00
Comm    | 15.88      | 20.859     | 25.838     | 109.0 |  0.17
Output  | 0.00096084 | 0.0023741  | 0.0037874  |   2.9 |  0.00
Modify  | 7.5295     | 7.5312     | 7.5329     |   0.1 |  0.06
Other   |            | 3.245      |            |       |  0.03

Nlocal:    250000 ave 250176 max 249824 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:    44831.5 ave 44833 max 44830 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:    0 ave 0 max 0 min
Histogram: 2 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 399
Dangerous builds = 397

Is there any explanation for this difference?

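Yes, the profiler affects kernel execution behavior. When you ask nvprof to collect metrics, it typically has to replay each kernel several times to gather all of the requested counters, and kernels are serialized while it does so, so a large slowdown with 17 metrics is expected. Any host-based timing, such as the section breakdown LAMMPS prints, includes that overhead, so the reported timings will be inflated. The solution is to ignore host-based timing when you are running under the profiler and to measure performance in a separate run without nvprof.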
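To make this concrete, the per-section slowdown can be computed directly from the two summaries posted above. This is only arithmetic on the reported average times; the dictionary and function names are illustrative, and summing the averages approximates the run-loop time only (setup is not included).

```python
# Average section times (seconds) copied from the two LAMMPS summaries above.
without_nvprof = {"Pair": 58.743, "Neigh": 0.50408, "Comm": 5.0893,
                  "Output": 0.0014943, "Modify": 7.7813, "Other": 3.056}
with_nvprof = {"Pair": 12029.0, "Neigh": 0.50414, "Comm": 20.859,
               "Output": 0.0023741, "Modify": 7.5312, "Other": 3.245}


def slowdown(before: dict, after: dict) -> dict:
    """Per-section ratio of profiled time to unprofiled time."""
    return {name: after[name] / before[name] for name in before}


ratios = slowdown(without_nvprof, with_nvprof)
for name, ratio in ratios.items():
    print(f"{name:7s} {ratio:8.1f}x")

# Summing the averages approximates the run-loop time (setup excluded).
total_without = sum(without_nvprof.values())
total_with = sum(with_nvprof.values())
print(f"loop time {total_without / 60:.1f} min -> {total_with / 3600:.2f} h")

# Overall slowdown from the reported throughput (timesteps/s).
print(f"throughput ratio {26.605 / 0.166:.0f}x")
```

Almost all of the extra time lands in Pair, which is presumably where the GPU kernels run in this setup: it slows down by roughly 200x, while Neigh and Modify are essentially unchanged and Comm grows about 4x.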

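I don’t see any difference in the computed “results” of your application: the atom counts, ghost-atom counts, neighbor statistics, and neighbor-list build counts are identical in both runs. The profiler should not affect the correctness of computed results, but it may greatly increase overall runtime.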
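One quick way to check this yourself is to strip the timing lines from both logs and compare what remains. The sketch below assumes LAMMPS-style summaries like the ones above; the filtering rules (dropping the Performance line, the CPU-use line, and the `|`-separated timing table) are a heuristic of my own, not a LAMMPS feature, and the function names are hypothetical.

```python
def non_timing_lines(log: str) -> list[str]:
    """Return the lines of a LAMMPS-style summary that are not timing data.

    Heuristic (an assumption, not a LAMMPS feature): drop blank lines,
    dash separators, the Performance and CPU-use lines, the timing-table
    heading, and any row containing '|' (the timing table).
    """
    kept = []
    for line in log.splitlines():
        s = line.strip()
        if not s or set(s) == {"-"}:
            continue
        if s.startswith(("Performance:", "MPI task timing")) or "CPU use" in s:
            continue
        if "|" in s:
            continue
        kept.append(s)
    return kept


def same_results(log_a: str, log_b: str) -> bool:
    """True if the two logs agree on everything except timing lines."""
    return non_timing_lines(log_a) == non_timing_lines(log_b)


if __name__ == "__main__":
    # Abbreviated excerpts of the two summaries above.
    without_nvprof = """Performance: 11.493 ns/day, 2.088 hours/ns, 26.605 timesteps/s
Pair    | 57.888     | 58.743     | 59.597     |  11.2 | 78.14
Nlocal:    250000 ave 250176 max 249824 min
Neighbor list builds = 399
Dangerous builds = 397"""
    with_nvprof = """Performance: 0.072 ns/day, 335.034 hours/ns, 0.166 timesteps/s
Pair    | 12024      | 12029      | 12034      |   4.6 | 99.73
Nlocal:    250000 ave 250176 max 249824 min
Neighbor list builds = 399
Dangerous builds = 397"""
    print(same_results(without_nvprof, with_nvprof))
```

For full log files, read each one with `open(path).read()` and pass the two strings to `same_results`.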