For a GROMACS job, I use gmx mdrun -v -nt 14 -nb gpu -deffnm nvt, and that launches 16 threads on the node. The top command shows one gmx process at 1400% CPU utilization, and nvidia-smi also shows one gmx process.
Now I want to run the same job under nvprof. If I use mpirun nvprof --profile-child-processes --metrics achieved_occupancy gmx mdrun -v -nt 14 -nb gpu -deffnm nvt, I see 8 processes in both top and nvidia-smi. The cumulative memory usage of these 8 processes is about the same as that of the single process in the previous run (without nvprof). Admittedly, the total is somewhat larger than before because of the profiling overhead, and that is fine. My point is that the memory usage of each individual process is not the same as that of the single process in the previous run.
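To compare per-process numbers more precisely than by reading top, a small Python sketch like the following can list every running gmx process with its CPU and resident (host) memory. It assumes Linux with the procps ps and a binary whose command name is gmx (adjust it if your build is named differently); parse_ps, summarize and snapshot are hypothetical helper names. Note that ps reports %CPU averaged over each process's lifetime, so it will not match top's instantaneous figure exactly, and RSS covers host memory only; per-process GPU memory is what nvidia-smi --query-compute-apps=pid,used_memory --format=csv reports.

```python
import subprocess

Row = tuple[int, float, int]  # (pid, %CPU, RSS in KiB)


def parse_ps(output: str) -> list[Row]:
    """Parse 'PID %CPU RSS' lines as printed by `ps -C gmx -o pid=,pcpu=,rss=`.

    The trailing '=' on each column name suppresses the header line,
    and procps reports RSS in KiB.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, pcpu, rss = fields
        rows.append((int(pid), float(pcpu), int(rss)))
    return rows


def summarize(rows: list[Row]) -> tuple[int, float, float]:
    """Return (process count, total %CPU, total RSS in MiB)."""
    count = len(rows)
    total_cpu = sum(pcpu for _, pcpu, _ in rows)
    total_rss_mib = sum(rss for _, _, rss in rows) / 1024
    return count, total_cpu, total_rss_mib


def snapshot(command_name: str = "gmx") -> list[Row]:
    """Run procps ps once; return an empty list if ps is unavailable."""
    try:
        result = subprocess.run(
            ["ps", "-C", command_name, "-o", "pid=,pcpu=,rss="],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return []
    return parse_ps(result.stdout)


if __name__ == "__main__":
    rows = snapshot()
    for pid, pcpu, rss in rows:
        print(f"pid {pid}: {pcpu:.0f}% CPU, {rss / 1024:.1f} MiB RSS")
    count, total_cpu, total_rss_mib = summarize(rows)
    print(f"{count} gmx process(es), {total_cpu:.0f}% CPU, "
          f"{total_rss_mib:.1f} MiB RSS in total")
```

Running it once during the plain run and once during the nvprof run puts the process count and the per-process memory side by side.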
Another thing to note is that in the first run I see one “step 200:” line, while in the second run I see 8 “step 200:” lines. That is not what I want.
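To count the repeated progress lines instead of eyeballing the console, the mdrun output can be captured to a file (for example by redirecting stdout and stderr) and scanned with a sketch like this. The function name and the log file path passed on the command line are hypothetical. The pattern matches the “step 200:” text as it appears in the output described above, and the raw text is searched as a whole because verbose progress lines may be rewritten in place with carriage returns rather than separated by newlines.

```python
import re
import sys


def count_step_markers(text: str, step: int = 200) -> int:
    """Count occurrences of 'step <step>:' in captured mdrun output."""
    # Leading word boundary: avoid matching inside a longer word.
    # Trailing colon: 'step 200:' does not match 'step 2000:'.
    pattern = re.compile(rf"\bstep {step}:")
    return len(pattern.findall(text))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # sys.argv[1]: path to a hypothetical file holding the captured output
        with open(sys.argv[1], errors="replace") as fh:
            print(count_step_markers(fh.read()))
```

A count of 1 would correspond to the first run and a count of 8 to the second.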
Any idea about that?