Understanding how my code is accelerated


I wrote a simple code that measures the similarity between each pixel and its 8 neighbors, and I use eight separate arrays to store the result for each neighbor.

My supervisor asked me to find out how the compiler exploits SIMD to execute this code on the GPU, and in particular how the memory accesses occur when storing the results into the 8 arrays.

Is there any way to do that?

Thanks in advance.


When compiling code for performance, we have a switch called
-Minfo, which gets you the most information: it will tell you
when the compiler has optimized or reorganized the code flow.
Try some programs with loops, and compile with

-fast -Minfo=all

to get some idea of what the compiler is doing.

But you are asking about the GPU, and PGI does not have many
switches to inform you of GPU optimizations.

The greatest performance gains on the GPUs are from calling well designed CUDA routines to perform the operations.

Since GPUs are becoming the standard for compute performance in compute centers, learning to program CUDA routines yourself would give you greater insight into how it all works.

A very good book on this is “CUDA by Example”, which walks through the logic and mathematics of CUDA and shows how to turn a compute-intensive loop into a series of CUDA calls that run tremendously faster (when you have a tremendous amount of computation to do, this is great; when not, it can be underwhelming).

OpenACC is an easier path, where you add directives to already
working code (on your CPU) and the compiler takes care of generating GPU code and moving data to and from the GPU.

There is also a quick tutorial about moving code from a CPU to a GPU.


Thank you, Dave.
You’ve been most helpful.