About compiler parallelization strategy/info


Is there a way I could access details about the loops parallelized by the compiler? For instance, the instruction count in a given parallel loop/section?

It may be asking too much and is possibly not available, but it would also be useful to understand the expected performance of a section in general. Does the compiler create performance models for that during compilation in order to decide among parallelization strategies? Is any of this information available?


Hi George,

Are you asking about CPU code instructions or GPU?

For the CPU, you can use the PGI utility pgcollect to perform sample-based profiling and then use the PGI profiler pgprof to drill down into the assembly. You can also instrument your code using the flag “-Mprof=lines”. It’s slower to run and reports at the line level rather than the assembly level, but it is more accurate than sample-based profiling. Other useful third-party profilers are OProfile and TAU.
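As a sketch, a typical workflow with these tools might look like the following. The program and source names (myprog, myprog.c) are placeholders, and the exact options may vary by PGI release, so check the PGPROF documentation for your version:

```shell
# Instrumented profiling: line-level, slower, but more accurate
pgcc -Mprof=lines -o myprog myprog.c
./myprog            # writes profile data (e.g. pgprof.out) on exit

# Sample-based profiling: no recompile needed, can drill to assembly
pgcc -o myprog myprog.c
pgcollect ./myprog

# View the collected profile in pgprof
pgprof -exe ./myprog
```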

For the GPU, we provide basic profiling information (data movement and kernel time). For more in-depth profiling, you can use NVIDIA’s CUDA Profiler. It doesn’t give instruction counts, but it does give a lot of useful information.
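For illustration, both the PGI runtime’s basic timing output and the command-line CUDA profiler of that era were typically enabled through environment variables along these lines (variable names assumed from the PGI and CUDA documentation of the time; verify them against your installed versions):

```shell
# PGI runtime: print data-movement and kernel times at program exit
export PGI_ACC_TIME=1
./myprog

# Command-line CUDA profiler: log per-kernel timing information
export COMPUTE_PROFILE=1
export COMPUTE_PROFILE_LOG=cuda_profile.log
./myprog
```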

For complete details about PGI’s profiling tools, please see the PGPROF User’s Guide.

Hope this helps,