Performance Portability from GPUs to CPUs with OpenACC

Originally published at: https://developer.nvidia.com/blog/performance-portability-gpus-cpus-openacc/

OpenACC gives scientists and researchers a simple and powerful way to accelerate scientific computing applications incrementally. The OpenACC API describes a collection of compiler directives to specify loops and regions of code in standard C, C++, and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating…

What is the cost for this compiler?

Academic users can get a free PGI compiler license with the NVIDIA OpenACC Toolkit download. Non-academic users can get a free trial, and full PGI pricing information is available here: http://www.pgroup.com/prici...

Does this compiler support vectorization or only multi-threading?

At this point, vector clauses do not affect code generation when targeting multicore CPUs. The compiler still vectorizes and generates SIMD loops automatically where it can; in other words, the existing PGI auto-vectorizer is not disabled when OpenACC for multicore is used. There are also some vectorization improvements in the works for future releases.
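
To make that concrete, here is a minimal sketch of the kind of loop in question. The function name `saxpy` and its signature are made up for illustration and are not taken from the article. The `vector` clause is standard OpenACC; based on the answer above, a multicore build would be expected to ignore it for code generation and leave any SIMD code to the auto-vectorizer, which the `restrict` qualifiers are there to help.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * On a GPU target, "gang vector" maps iterations across gangs and vector lanes.
 * On a multicore target, the assumption (per the reply above) is that the loop
 * is spread across CPU cores and the vector clause does not change code
 * generation; SIMD instructions come from the compiler's auto-vectorizer. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* Data clauses matter for an attached accelerator; on a multicore host
     * the arrays are already in host memory. */
    #pragma acc parallel loop gang vector copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenACC support ignores the pragma and runs the loop serially, so the same source builds everywhere. With PGI, the multicore build is assumed to use the `-ta=multicore` target option described in the article; the exact command line depends on your compiler version.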