I guess the “kernels” construct might generate one or more accelerator kernels for the code inside the block, while the “parallel” construct would launch a single kernel for the code inside the block. Is that correct?
Correct, but there’s more involved. Think of “kernels” as the compiler doing the analysis and determining the optimal way to perform the parallelism, while “parallel” is you telling the compiler how to perform the parallelism.
This article gives a good explanation of the difference between the “kernels” and “parallel” constructs: http://www.pgroup.com/lit/articles/insider/v4n2a1.htm
I guess OpenACC would translate the code into CUDA if the target device is an NVIDIA card, and into OpenCL if it is an AMD card. Is that correct?
This is what we used to do, but we now target a low-level LLVM/SPIR IR. You can toggle back to the old behavior of generating CUDA/OpenCL via the “-ta=tesla:nollvm” or “-ta=radeon:nollvm” flags.
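For example, with the PGI compilers the invocations would look roughly like this (pgcc and the source filename here are placeholders; the flag spellings come from the reply above):

```shell
# Default: generate device code through the LLVM/SPIR back end
pgcc -acc -ta=tesla example.c

# Fall back to the older CUDA code-generation path on NVIDIA targets
pgcc -acc -ta=tesla:nollvm example.c

# Likewise, fall back to OpenCL generation on AMD targets
pgcc -acc -ta=radeon:nollvm example.c
```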