OpenACC vs. CUDA

Hi JMa,

These are kind of high-level questions, so let me try to give you some high-level answers as well.

This is an interesting question, and there doesn’t seem to be just one answer to it. IMHO, OpenACC will persist for at least the next few years (whether as OpenACC or as part of OpenMP 4.0). However, low-level programming models will likely still exist in the near future because they give the programmer the ability to highly tune their application (and that’s what HPC is about, right?).
On the other hand, OpenACC eases the way we program coprocessors. So I think that both approaches can benefit from each other. E.g.: use OpenACC for the easy parts and manually fine-tune some compute-intensive kernels with CUDA.

The only situation I can think of that would make CUDA redundant is if compilers became so powerful that they generate the same high-performance code you could achieve with CUDA (or at least come within a few percent of it). I’m not a compiler engineer, but I doubt that this is what we’ll see in the next few years.

So there are some limitations as of right now:

  • Function calls are not yet supported within parallel regions (unless they can be inlined)**
  • No nested parallelism is allowed**
  • Another limitation is that you as a programmer cannot use CUDA intrinsic functions (e.g. warp functions) within your accelerator region. But this is the way it is supposed to be for a directive-based approach - it should be easy to use and portable across different architectures (i.e. no intrinsics).

I hope that this answers some of your questions.

Best,
Paul

** These features will be implemented in OpenACC 2.0 (so there’s hope :))
