Compiling OpenMP & OpenACC for simultaneous execution

Suppose that I have two regions of code (see below).
I would like to create one program that executes the OpenMP region on the CPU and the OpenACC region on the GPU simultaneously.
Can I do that, and if so, how?

Thanks in advance.
Ami

#pragma omp parallel for
for (int i = 1; i < m; i++)
{
    some work…
}

#pragma acc kernels
for (int j = 1; j < n; j++)
{
    some work…
}

Hi Ami,

There are several ways this could be done. In my opinion, the easiest is to launch your OpenACC compute region asynchronously, enter the OpenMP region, and then use an OpenACC “wait” pragma to sync. Something like:

#pragma acc data pcopy(myarr[0:size])
{

    // Use "async" so the host code does not wait for the
    // kernel to finish before continuing.
    // Make sure no data is copied back, including a reduction result;
    // otherwise the host will block on the data movement.
    #pragma acc kernels present(myarr) async
    for (int j = 1; j < n; j++)
    {
        some work...
    }

    // The CPU continues and then enters the OpenMP region
    #pragma omp parallel for
    for (int i = 1; i < m; i++)
    {
        some work...
    }

    // Sync the host and device execution before the data region ends,
    // so the kernel has finished by the time myarr is copied back
    #pragma acc wait

} // end the data region, copy back myarr
- Mat

Mat,

How does such a program have to be compiled?
With the compiler option -acc or -mp?

Ami

compiler option -acc or -mp?

Both. Pass -acc and -mp together on the same compile and link line.
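
For reference, here is a minimal sketch that puts the pieces of this thread together. It is an illustration, not the only way to do it: the function name overlap_cpu_gpu, the sizes N and M, the file name overlap.c, and the loop bodies are all made up for the example. The compile line in the comment assumes the PGI (pgcc) or NVIDIA HPC (nvc) C compiler, where -acc and -mp are the flags discussed above. The sketch uses copyout and an explicit async queue number, both standard OpenACC, in place of pcopy and the default queue.

```c
/* Assumed compile line (file name is hypothetical):
 *     pgcc -acc -mp overlap.c      (or: nvc -acc -mp overlap.c)
 * Without those flags the pragmas are ignored and the code runs serially,
 * producing the same results. */

#define N 1000  /* hypothetical length of the array filled on the GPU */
#define M 1000  /* hypothetical length of the array filled on the CPU */

/* Fill gpu_arr in an asynchronous OpenACC kernel while the host fills
 * cpu_arr in an OpenMP loop. Returns the sum of all elements of both
 * arrays so the caller can check the result. */
double overlap_cpu_gpu(float *restrict gpu_arr, float *restrict cpu_arr)
{
    #pragma acc data copyout(gpu_arr[0:N])
    {
        /* Launch on async queue 1; the host does not wait here. */
        #pragma acc kernels present(gpu_arr[0:N]) async(1)
        for (int j = 0; j < N; j++) {
            gpu_arr[j] = 2.0f * (float)j;
        }

        /* Meanwhile the host threads work on independent data. */
        #pragma omp parallel for
        for (int i = 0; i < M; i++) {
            cpu_arr[i] = 3.0f * (float)i;
        }

        /* Wait for queue 1 before the data region ends, so the kernel
         * is finished when gpu_arr is copied back to the host. */
        #pragma acc wait(1)
    }

    /* Both arrays are now valid on the host. */
    double sum = 0.0;
    for (int j = 0; j < N; j++) sum += gpu_arr[j];
    for (int i = 0; i < M; i++) sum += cpu_arr[i];
    return sum;
}
```

The two loops touch different arrays, so neither side needs the other's results until after the wait. If the OpenMP loop needed data produced by the kernel, the wait would have to move in front of it and the overlap would be lost.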