OpenACC show compiler optimizations


I implemented a sparse matrix-vector multiplication in ELLPACK-R format for a CG (conjugate gradient) solver, once with OpenACC and once with CUDA.
Although both versions use the same algorithm, the OpenACC implementation is much faster.
Is there a way to see what the OpenACC compiler does, for example whether it uses shared memory or applies other optimizations?
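For reference, an ELLPACK-R matrix-vector product in OpenACC typically looks something like the sketch below. It is not Fabian's code. The function name `spmv_ellr`, the argument names and the data layout are assumptions: `val` and `col` hold `n_rows * max_row_len` entries stored column-major (so neighbouring rows, i.e. neighbouring GPU threads, read neighbouring memory), `row_len[i]` is the number of nonzeros actually stored for row `i`, and the matrix is assumed to be square.

```c
/* Sketch of y = A*x for A stored in ELLPACK-R form (layout assumed, see text).
   Entry j of row i lives at index j * n_rows + i (column-major), padding
   slots beyond row_len[i] are never read. Without an OpenACC compiler the
   pragma is ignored and the loop runs serially on the host. */
static void spmv_ellr(int n_rows, int max_row_len,
                      const double *restrict val, const int *restrict col,
                      const int *restrict row_len,
                      const double *restrict x, double *restrict y)
{
#pragma acc parallel loop copyin(val[0:n_rows*max_row_len], \
        col[0:n_rows*max_row_len], row_len[0:n_rows], x[0:n_rows]) \
        copyout(y[0:n_rows])
    for (int i = 0; i < n_rows; ++i) {      /* one row per parallel iteration */
        double sum = 0.0;
        for (int j = 0; j < row_len[i]; ++j) {
            int k = j * n_rows + i;         /* column-major ELLPACK index */
            sum += val[k] * x[col[k]];
        }
        y[i] = sum;
    }
}
```

The `copyin`/`copyout` clauses keep the sketch self-contained. Inside a CG loop one would more likely keep the arrays on the device with an enclosing `#pragma acc data` region and use `present` here, so the matrix is not transferred on every call. Compiling such a file with something like `pgcc -acc -ta=tesla:keep -Minfo=accel` produces the compiler feedback and keeps the generated files discussed below.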


Hi Fabian,

The flag “-Minfo=accel” will give you quite a bit of information about how the compiler is accelerating the code. If you want to see the generated intermediate code, use the “keep” sub-option of the “-ta” flag, for example “-ta=tesla:keep”. The resulting “.gpu” file will contain the generated code.

Hope this helps,

Hi Mat,

The information given by “-Minfo=accel” is very general and doesn’t really help.
I can’t make much sense of the .gpu files, but I saw in the .ptx file that the kernel uses shift instructions, which I don’t understand.
Is there anything else I could try?

Thank you for your help!
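A likely (but unconfirmed) source of the shift instructions in the PTX is plain address arithmetic. To load `a[i]` the generated code computes the base address plus a byte offset `i * sizeof(double)`, and because that size is 8, compilers commonly emit the multiply as a left shift by 3. The hypothetical helper below mirrors that computation in C. It assumes `sizeof(double) == 8`, which holds on NVIDIA GPUs and common hosts but is not guaranteed by the C standard.

```c
#include <stddef.h>

/* Hypothetical helper: compute &a[i] the way generated code often does,
   base address plus byte offset, with the offset i * 8 written as i << 3.
   Assumes sizeof(double) == 8. */
static const double *elem_addr_shift(const double *a, size_t i)
{
    return (const double *)((const char *)a + (i << 3));
}
```

If the shifts appear next to loads of `val`, `col` or `x`, they are most likely this kind of index-to-address conversion rather than a deliberate optimization of the algorithm itself.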

Hi Fabian,

No, that’s pretty much it.

- Mat