Are there any options to generate the CUDA kernel code? I think it would be really helpful for us in optimizing the code.
Can you please clarify your question?
We have two different ways that users can generate code targeting NVIDIA GPUs.
First is the PGI Accelerator model, where users insert directives around the code they wish to off-load to the GPU and the compiler generates the kernel. Full details can be found at: http://www.pgroup.com/resources/accel.htm.
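As a rough illustration of the directive-based approach described above, here is a minimal sketch in C. The function name `saxpy`, the arrays `x` and `y`, and the size `N` are made up for this example and do not come from the thread; the arrays are fixed-size globals so that no data clauses are needed. A compiler that does not recognize the pragma simply ignores it and runs the loop on the host.

```c
#define N 1024

/* Hypothetical example data: fixed-size arrays so the compiler can see their extents. */
static float x[N], y[N];

/* Scale-and-add loop wrapped in a PGI Accelerator compute region.
 * With accelerator support enabled, the compiler generates the GPU kernel
 * for the loop inside the region; otherwise the loop runs on the host. */
void saxpy(float alpha)
{
    int i;
#pragma acc region
    {
        for (i = 0; i < N; ++i)
            y[i] = alpha * x[i] + y[i];
    }
}
```

This is a fragment to call from your own `main`; the point is only that the loop body stays ordinary C and the directive marks what to off-load.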
Second is CUDA Fortran. This is an explicit extension to Fortran where users write their own kernels. Please see: http://www.pgroup.com/resources/cudafortran.htm for details.
Hi, thanks for your quick response. I’m using the PGI Accelerator model to parallelize some codes, e.g., a 3-D stencil. However, the performance is quite low compared to my hand-written CUDA code. I think it’s because I didn’t use the directives properly. So I wonder whether pgcc has an option that lets the user see the kernel code it generates. Thanks.
Yes. Use the “-ta=nvidia,keepgpu” flag to have the compiler keep the intermediate CUDA C GPU file. The caveat is that the file is not very human-readable. You can also keep the generated PTX file using the “keepptx” sub-option (“-ta=nvidia,keepptx”).
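To make the flags concrete, here is a small sketch of a 3-D 7-point stencil with the build commands shown as comments. The file name `stencil.c`, the function `stencil7`, the array names, and the grid sizes are all hypothetical. The `-Minfo=accel` flag is not mentioned in this thread; it is added here on the assumption that you also want the compiler's messages about what it generated for the accelerator.

```c
/* Possible build lines (stencil.c is a hypothetical file name):
 *   pgcc -ta=nvidia,keepgpu -Minfo=accel -c stencil.c   keeps the intermediate CUDA C file
 *   pgcc -ta=nvidia,keepptx -Minfo=accel -c stencil.c   keeps the generated PTX file
 */
#define NX 32
#define NY 32
#define NZ 32

/* Hypothetical input and output grids with fixed extents. */
static float in[NZ][NY][NX], out[NZ][NY][NX];

/* Average each interior point with its six face neighbours.
 * Boundary points of out are left untouched. */
void stencil7(void)
{
    int i, j, k;
#pragma acc region
    {
        for (k = 1; k < NZ - 1; ++k)
            for (j = 1; j < NY - 1; ++j)
                for (i = 1; i < NX - 1; ++i)
                    out[k][j][i] = (in[k][j][i]
                                  + in[k][j][i - 1] + in[k][j][i + 1]
                                  + in[k][j - 1][i] + in[k][j + 1][i]
                                  + in[k - 1][j][i] + in[k + 1][j][i]) / 7.0f;
    }
}
```

Comparing the kept CUDA C file for a loop nest like this against a hand-written kernel is one way to see how the compiler mapped the loops onto the GPU.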
Great, I can generate the CUDA code now. Thank you for your help.