CUDA Fortran vs. OpenACC

Hi again, Mat:

Which is faster: CUDA Fortran or OpenACC, please? I’m guessing that it probably depends on the code.


Hi Erin,

For most cases, OpenACC and CUDA Fortran will give equivalent performance. But depending on the code, how much effort is spent, and the skill level of the programmer, moving to CUDA Fortran can gain some additional performance. OpenACC is meant to be performance portable, so it makes some generic optimization choices, while in CUDA Fortran you can hand tune for a particular piece of hardware. CUDA Fortran also gives you the ability to use tensor cores and constant memory, which are not available with OpenACC.

Since they are interoperable, I typically tell folks to start with OpenACC and then try adding CUDA Fortran to critical sections of code where you need a bit more performance. This way you get the advantages of both: the ease of OpenACC for the bulk of the code, and hand tuning with CUDA Fortran for a few critical sections. Introducing CUDA Fortran will make your code less portable, though, so you do need to decide whether portability or being highly tuned is more important.
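
To make the mixing concrete, here is a rough sketch of one way the two models can share data. It is only an illustration, not a tuned code: the module name `saxpy_kernels`, the kernel `saxpy_cuda`, the array size, and the launch configuration are all made up for this example, and it assumes a compiler that accepts both models in one source file (for example nvfortran with both the `-acc` and `-cuda` flags). OpenACC manages the data and the simple loop, and `host_data use_device` hands the device copies of the arrays to a hand-written CUDA Fortran kernel.

```fortran
! Hypothetical example: names and sizes are illustrative only.
module saxpy_kernels
  use cudafor
  implicit none
contains
  ! Hand-written CUDA Fortran kernel standing in for a "critical section".
  attributes(global) subroutine saxpy_cuda(n, a, x, y)
    integer, value :: n
    real, value :: a
    real, device :: x(n), y(n)
    integer :: i
    ! One thread per element; CUDA Fortran thread indices are 1-based.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy_cuda
end module saxpy_kernels

program mix_acc_cuda
  use cudafor
  use saxpy_kernels
  implicit none
  integer, parameter :: n = 1024
  real :: x(n), y(n)
  integer :: i, istat

  ! OpenACC owns the device copies of x and y for the whole region.
  !$acc data create(x) copyout(y)

  ! The bulk of the code stays in OpenACC.
  !$acc parallel loop
  do i = 1, n
    x(i) = 1.0
    y(i) = 2.0
  end do

  ! Inside host_data, x and y refer to their device addresses,
  ! so they can be passed to the CUDA Fortran kernel.
  !$acc host_data use_device(x, y)
  call saxpy_cuda<<<(n + 255) / 256, 256>>>(n, 3.0, x, y)
  !$acc end host_data

  ! Wait for the kernel before OpenACC copies y back to the host.
  istat = cudaDeviceSynchronize()

  !$acc end data

  ! Each element should now hold 3.0*1.0 + 2.0; print the largest deviation.
  print *, 'max error: ', maxval(abs(y - 5.0))
end program mix_acc_cuda
```

The point of this layout is that only the kernel and its launch are CUDA-specific. If portability later matters more than the extra tuning, that one call could be swapped back for an OpenACC loop without touching the data region.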