Difference of using "acc parallel loop" and "

evanchong · July 29, 2015, 7:31pm

I am confused about some semantics of OpenACC.

What is the difference between using “#pragma acc parallel loop”
and using “#pragma acc loop”?

Also, by default, the compiler treats a scalar variable accessed within a loop as private.
But what about an array element like A[i]?
Does the compiler make array A private by default?

Thanks.

MatColgrove · July 29, 2015, 8:13pm

Hi EvanzzzZ,

“#pragma acc parallel loop” is just a shortcut which allows you to combine the two directives “#pragma acc parallel” and “#pragma acc loop” into a single line.
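As a minimal sketch of that equivalence, the two functions below express the same loop, once with separate directives and once with the combined form. The function names and the `copy` data clauses are illustrative assumptions, not part of the original question. Without an OpenACC compiler the pragmas are ignored and the code simply runs serially.

```c
#include <assert.h>

/* Form 1: a "parallel" region that contains a separate "loop" directive. */
void scale_separate(float *a, int n, float s)
{
    assert(n >= 0);
    #pragma acc parallel copy(a[0:n])
    {
        #pragma acc loop
        for (int i = 0; i < n; ++i) {
            a[i] *= s;
        }
    }
}

/* Form 2: the combined "parallel loop" directive on a single line. */
void scale_combined(float *a, int n, float s)
{
    assert(n >= 0);
    #pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; ++i) {
        a[i] *= s;
    }
}
```

Both functions should produce identical results; the combined form only saves a level of braces when the region contains nothing but the loop.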

In OpenACC, scalar variables are private by default while arrays are shared. The result of “A[i]” is a scalar, but “A” itself isn’t (it’s a reference to an aggregate type). Since “A” is an array, it would be shared by default.
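To make the scalar/array distinction concrete, here is a small sketch. The function names, the `tmp` and `scratch` variables, and the data clauses are hypothetical choices for illustration. The first loop relies on the default handling described above; the second shows the explicit `private` clause one would use if a per-iteration copy of a small array is actually wanted.

```c
#include <assert.h>

/* tmp is a scalar, so (per the defaults described above) it is privatized;
 * a and b are arrays and are shared among all iterations. */
void square_plus_one(const float *a, float *b, int n)
{
    assert(n >= 0);
    float tmp;
    #pragma acc parallel loop copyin(a[0:n]) copyout(b[0:n])
    for (int i = 0; i < n; ++i) {
        tmp = a[i] * a[i];  /* a[i] yields a scalar value read from the shared array */
        b[i] = tmp + 1.0f;  /* each iteration writes a distinct element of b */
    }
}

/* scratch is an array, so it would be shared by default;
 * the private clause explicitly requests a separate copy instead. */
void sum_pairs(const float *a, float *b, int n)
{
    assert(n >= 0);
    float scratch[2];
    #pragma acc parallel loop private(scratch) copyin(a[0:2*n]) copyout(b[0:n])
    for (int i = 0; i < n; ++i) {
        scratch[0] = a[2 * i];
        scratch[1] = a[2 * i + 1];
        b[i] = scratch[0] + scratch[1];
    }
}
```

Leaving out `private(scratch)` in the second function would let all iterations write to the same two elements, which is a race when the loop runs in parallel.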

Hope this helps,
Mat

evanchong · July 29, 2015, 9:03pm

Thanks.
Two more questions:

I guess the “kernels” construct might generate one or more accelerator kernels for the code inside the block, and the “parallel” construct would launch one kernel for the code inside the block. Is that correct?

I guess OpenACC would translate the code into CUDA if the targeted
device is an NVIDIA card, and into OpenCL if it is an AMD card. Is that correct?

Thanks.

MatColgrove · July 29, 2015, 10:03pm

I guess the “kernels” construct might generate one or more accelerator kernels for the code inside the block, and the “parallel” construct would launch one kernel for the code inside the block. Is that correct?

Correct, but there’s more involved. Think of “kernels” as the compiler doing the analysis and determining the optimal way to perform the parallelism, while “parallel” is you telling the compiler how to perform the parallelism.
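A rough sketch of the two styles, with hypothetical function names and data clauses chosen only for illustration: in the first function the whole region is handed to the compiler, which may turn each loop into its own kernel (or decline to parallelize a loop it cannot prove safe, for example because of possible pointer aliasing). In the second, each loop is explicitly declared parallel by the programmer.

```c
#include <assert.h>

/* "kernels": the compiler analyzes the region and decides how to parallelize it.
 * The two loops may become two separate accelerator kernels. */
void init_then_add_kernels(float *a, float *b, int n)
{
    assert(n >= 0);
    #pragma acc kernels copyout(a[0:n]) copy(b[0:n])
    {
        for (int i = 0; i < n; ++i) {
            a[i] = (float)i;
        }
        for (int i = 0; i < n; ++i) {
            b[i] += a[i];
        }
    }
}

/* "parallel": the programmer asserts that each loop is safe to parallelize.
 * Each "parallel loop" here is its own compute region inside a shared data region. */
void init_then_add_parallel(float *a, float *b, int n)
{
    assert(n >= 0);
    #pragma acc data copyout(a[0:n]) copy(b[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            a[i] = (float)i;
        }
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            b[i] += a[i];
        }
    }
}
```

With the PGI compilers, building with compiler feedback enabled is a common way to see which loops were actually turned into kernels in each case.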

This PGI article gives a good explanation of the difference between the “kernels” and “parallel” constructs.

I guess OpenACC would translate the code into CUDA if the targeted
device is an NVIDIA card, and into OpenCL if it is an AMD card. Is that correct?

This is what we used to do, but we now target a low-level LLVM/SPIR IR. You can switch back to the old behavior of generating CUDA/OpenCL via the “-ta=tesla:nollvm” or “-ta=radeon:nollvm” flags.
