Suppose that I have two regions of code (see below).
I would like to create one program that executes the OpenMP region on the CPU and the OpenACC region on the GPU at the same time.
Can I do that, and if so, how?
Thanks in advance.
Ami
#pragma omp parallel for
for (int i = 1; i < m; i++)
{
    // some work...
}

#pragma acc kernels
for (int j = 1; j < n; j++)
{
    // some work...
}
There are several ways this could be done. In my opinion, the easiest is to launch your OpenACC compute region asynchronously, enter the OpenMP region, and then use an OpenACC “wait” directive to synchronize the host and the device. Something like:
#pragma acc data pcopy(myarr[0:size])
{
    // Use "async" so the host code does not wait for the
    // kernel to finish before continuing.
    // Make sure no data is copied back, including a reduction result,
    // otherwise the host will block on the data movement.
    #pragma acc kernels present(myarr[0:size]) async
    for (int j = 1; j < n; j++)
    {
        // some work...
    }

    // The CPU continues and enters the OpenMP region
    #pragma omp parallel for
    for (int i = 1; i < m; i++)
    {
        // some work...
    }

    // Sync host and device execution *before* leaving the data region,
    // so myarr is not copied back while the kernel is still running
    #pragma acc wait
} // end of the data region: myarr is copied back here
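For reference, here is a minimal self-contained sketch of the same pattern. The function name `overlap` and the arrays `gpu_arr` and `cpu_arr` are hypothetical, chosen only for illustration, and it uses an explicit queue (`async(1)` / `wait(1)`) instead of the bare `async` above. The two loops work on disjoint arrays; if the CPU and GPU loops touched the same data, they would race.

```c
#include <assert.h>

/* Sketch: overlap an asynchronous OpenACC kernel with an OpenMP loop.
   gpu_arr and cpu_arr are hypothetical, disjoint arrays. */
void overlap(float *restrict gpu_arr, float *restrict cpu_arr, int n, int m)
{
    assert(n >= 0 && m >= 0);

    /* Copy gpu_arr to the device; it is copied back at the closing brace. */
    #pragma acc data copy(gpu_arr[0:n])
    {
        /* async(1): enqueue the kernel on queue 1 and return to the host
           immediately instead of waiting for it to finish. */
        #pragma acc kernels present(gpu_arr[0:n]) async(1)
        {
            #pragma acc loop independent
            for (int j = 0; j < n; j++)
                gpu_arr[j] = 2.0f * gpu_arr[j];
        }

        /* While the device works, the host threads run the OpenMP loop. */
        #pragma omp parallel for
        for (int i = 0; i < m; i++)
            cpu_arr[i] = (float)i;

        /* Wait for queue 1 inside the data region, so the copy-out at the
           closing brace only happens after the kernel has finished. */
        #pragma acc wait(1)
    }
}
```

Calling `overlap(g, c, 4, 3)` with `g = {1, 2, 3, 4}` leaves `g = {2, 4, 6, 8}` and `c = {0, 1, 2}`. Both models have to be enabled at compile time (for example `nvc -acc -mp` with the NVIDIA HPC SDK, or `-fopenacc -fopenmp` with GCC). Without those flags the pragmas are ignored and the code simply runs serially on the host, which gives the same results.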