My question is more in the vein of “help me continue to learn CUDA” than “get this to work.” Namely, I was wondering whether there is a pgfortran compiler option that makes the compiler output the CUDA code it generates for the accelerator.
Following Dr Wolfe’s videos on this site, I built the matrix multiplication driver/kernel pair that he demonstrates and tested it. In doing so, I used the -Minfo=all,accel -ta=nvidia options, which provide quite useful and interesting information about how the compiler mapped the loops onto the accelerator. I just wondered whether there is a further compiler option that writes out not only the !$acc do parallel, vector(16) directives but the generated CUDA code itself, so I could see the shared memory, register, and other transformations.