I’m trying to profile some code to see why it doesn’t scale well with OpenMP, and I can’t figure out what PGPROF is telling me.
Compiled without -mp but with -Mprof=lines, the three most time-consuming parts of my code are “_roupri”, “__linent2”, and “_rouret”. What do those mean, and given that they take up ~50% of my serial code’s runtime, is there anything I can do to shorten that? Also, if I double-click on those entries in PGPROF I’m taken to assembler code rather than to specific lines in my Fortran code; how can I see which lines of my code are the source of the time sinks? As a bonus question, is “_mp_get_tcpus” related to OpenMP, and why would it be called regularly from a serial code?
Compiled with -mp and -Mprof=lines, the three most demanding entries in PGPROF are “__linent2”, “_roupri”, and “mp_ecs”. I’ve already asked about the first two, but what’s that last one?
If I drop -Mprof=lines and just compile with -mp and -Minfo=ccff, the two most time-consuming entries (accounting for 74% of the CPU time!) are “mp_barrier” and “mp_barrierw”. I’d love to know which critical section or atomic addition is the root of this delay so I could program around it, but once again double-clicking on either of these takes me to assembler code. How can I determine which part of the OpenMP code is causing the delays?
Many thanks in advance.