Difference of using "acc parallel loop" and "acc loop"

I am confused about some semantics of OpenACC.

What is the difference between using "#pragma acc parallel loop"
and using "#pragma acc loop"?

Also, by default the compiler treats scalar variables accessed within a loop as private.
But what about something like A[i]?
Does the compiler make the array A private by default?

Thanks.

Hi EvanzzzZ,

"#pragma acc parallel loop" is just a shortcut that allows you to combine the two directives "#pragma acc parallel" and "#pragma acc loop" into a single line.

In OpenACC, scalar variables are private by default while arrays are shared. An element such as "A[i]" is a scalar, but "A" itself isn't (it's a reference to an aggregate type). Since "A" is an array, it is shared by default.

Hope this helps,
Mat

Thanks.
Two more questions:

I guess the "kernels" construct might generate one or more accelerator kernels for the code inside the block, and the "parallel" construct would launch one kernel for the code inside the block? Is that correct?

I guess OpenACC would translate the code into CUDA if the targeted
device is an Nvidia card, and translate it into OpenCL if it is an AMD card. Is that correct?

Thanks.

I guess the "kernels" construct might generate one or more accelerator kernels for the code inside the block, and the "parallel" construct would launch one kernel for the code inside the block? Is that correct?

Correct, but there’s more involved. Think of “kernels” as the compiler doing the analysis and determining the optimal way to perform the parallelism, while “parallel” is you telling the compiler how to perform the parallelism.

This article gives a good explanation of the difference between the "kernels" and "parallel" constructs: http://www.pgroup.com/lit/articles/insider/v4n2a1.htm

I guess OpenACC would translate the code into CUDA if the targeted
device is an Nvidia card, and translate it into OpenCL if it is an AMD card. Is that correct?

This is what we used to do, but we now target a low-level LLVM/SPIR IR. You can toggle back to the old behavior of using CUDA/OpenCL via the "-ta=tesla:nollvm" or "-ta=radeon:nollvm" flags.