Outputting the CUDA C the accelerator constructs?

My question is more in the vein of “help me continue to learn CUDA” than “get this to work”. Namely, I was wondering if there is a pgfortran compiler option that gets the compiler to output the CUDA code generated by the accelerator.

Following Dr Wolfe’s videos on this site, I built the Matrix Multiplication Driver/Kernel pairing that he demonstrates and tested it. I compiled with the -Minfo=all,accel -ta=nvidia options, which provide quite useful and interesting information about what the accelerator is doing. I just wondered if there is a further compiler option that would write out not just the !$acc do parallel, vector(16) directives, but the CUDA calls themselves, so I could see the shared memory, register, etc. transformations.
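As an illustration of the kind of shared memory and register transformation being asked about, a hand-written 16x16 tiled matrix multiply in CUDA C might look like the sketch below. This is not compiler output and is not taken from the PGI accelerator. The names `matmul_tiled`, `max_abs_error`, and `TILE` are hypothetical, the matrices are assumed square, row-major (unlike Fortran's column-major layout), and a multiple of 16 in size, and error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

#define TILE 16  // assumption: chosen to mirror the vector(16) width in the directive

// C = A * B for square n x n row-major matrices; n must be a multiple of TILE.
__global__ void matmul_tiled(const float *a, const float *b, float *c, int n)
{
    // Each thread block stages one tile of A and one tile of B in shared memory.
    __shared__ float as[TILE][TILE];
    __shared__ float bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;  // per-thread accumulator, normally held in a register

    for (int t = 0; t < n / TILE; ++t) {
        as[threadIdx.y][threadIdx.x] = a[row * n + t * TILE + threadIdx.x];
        bs[threadIdx.y][threadIdx.x] = b[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // both tiles must be fully loaded before use

        for (int k = 0; k < TILE; ++k)
            sum += as[threadIdx.y][k] * bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next tiles overwrite
    }
    c[row * n + col] = sum;
}

// Hypothetical host helper: runs the kernel and returns the largest
// absolute difference from a plain CPU triple loop.
float max_abs_error(int n)
{
    std::vector<float> ha(n * n), hb(n * n), hc(n * n);
    for (int i = 0; i < n * n; ++i) {
        ha[i] = (i % 7) * 0.5f;
        hb[i] = (i % 5) * 0.25f;
    }

    size_t bytes = size_t(n) * n * sizeof(float);
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
    matmul_tiled<<<grid, block>>>(da, db, dc, n);
    cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);

    float err = 0.0f;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k)
                ref += ha[i * n + k] * hb[k * n + j];
            err = std::fmax(err, std::fabs(ref - hc[i * n + j]));
        }
    return err;
}
```

Calling `max_abs_error(64)` from a small `main` and compiling with `nvcc` gives a quick sanity check of the kernel. Code generated by a compiler from directives may be organized quite differently from this textbook form.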

Hi TheMatt,

At this time we have not made this option available, but we are considering it. The problem is not so much exposing the generated CUDA code as the follow-up question: “Can I then modify the generated CUDA code and have my application use the modified version?” That is technically challenging since CUDA does not have a linker.

Thanks,
Mat

Mat,

Heh, I hadn’t even thought of that. Rather, I was just thinking of learning what certain Accelerator options do, etc, in terms of CUDA. If I wanted to take the next step as you state, it’d be more toward thinking about converting my Fortran code into pure CUDA C to see if I can squeeze more performance out. That way, I’d have a starting point.

And, as I said, I’d be learning “better” CUDA through your Accelerator logic, which has more expert minds behind it.

Matt

Hi Matt,

We’ve decided that we’ll add a flag to the next release (9.0-3) that will allow the user to keep the intermediate CUDA code. It will just be the generated kernel, but will give you at least a starting point.

Thanks,
Mat

Mat, I can’t seem to find the option for this in 9.0-4, so could you post it? Now that I’m starting to look at/use CUDA Fortran and having to remap my brain, this could be useful to me.

Sure, it’s “-ta=nvidia,gpufile”, where a “.gpu” file will be created containing the generated CUDA code.

Also, while the CUDA executable code is normally embedded in your application’s binary, when “gpufile” is used the CUDA binary is placed in a separate “.bin” file. The “.bin” file must be located in the same directory as the application in order to run.

- Mat

Ooh, thanks. A quick try at this seems to indicate that Dr Wolfe, you, and the team are much more clever than I am at CUDA. What the compiler does is nothing like what I was thinking of doing in CUDA Fortran!