PGI compiler with OpenMP option

Hi All,

I am trying to accelerate my computation. Here is what I have: a GPU card (Quadro FX 4800) installed in a workstation with an 8-core CPU and 16 GB of memory.

Here is my question. If I have a loop such as:

do iy=1, NY
   do ix=1, NX
      do iz=1, NZ
           pxx = p(iz,ix-1,iy) + p(iz, ix+1, iy)
           pyy = p(iz,ix,iy-1) + p(iz, ix, iy+1)
           pzz = p(iz-1,ix,iy) + p(iz+1, ix, iy)
           der  = pxx + pyy + pzz
      enddo
   enddo
enddo

how can I use !$acc region / !$acc end region

and !$OMP PARALLEL DO PRIVATE(iy,ix,iz,pxx,pyy,pzz,der) SCHEDULE(dynamic)

together? I am trying to combine these two techniques to get a better computation time.

Is it possible to use the PGI compiler to do that?


Is my question clear?

I just want to see how to use the PGI compiler to combine the GPU and OpenMP, so that no resources (neither the GPU nor the CPU cores) sit idle.


Hi fishwater00,

Is my question clear?

Not quite, so if my answer is unclear please let me know.

Assuming that you're planning to save the results of "der" (for example into another 3-D array, not "p"; writing back into "p" would make each iteration read neighbor values that other iterations may already have overwritten), you should be able to simply put the accelerator directives before and after the "iy" loop, and the compiler will accelerate it. For good performance on a GPU you want a lot of threads, tens of thousands of them. So for this loop, I would use only the accelerator model.
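A minimal sketch of that suggestion, assuming the result goes into a separate array d and that only interior points 2..N-1 are updated so the neighbor reads stay in bounds. The names d, neighbor_sum and stencil_acc are hypothetical. The !$acc region / !$acc end region lines are the PGI Accelerator syntax discussed in this thread; a compiler without that support normally treats them as comments, and the loop then simply runs serially on the CPU.

```fortran
! Sketch only: PGI Accelerator-style region around the stencil from the question.
! Assumptions (not from the original post): output array d, interior-only
! bounds 2..N-1, and the module/subroutine names.
module stencil_acc
  implicit none
contains
  subroutine neighbor_sum(p, d, nz, nx, ny)
    integer, intent(in)    :: nz, nx, ny
    real,    intent(in)    :: p(nz, nx, ny)
    real,    intent(inout) :: d(nz, nx, ny)  ! results go to d, never back into p
    integer :: ix, iy, iz
    real    :: pxx, pyy, pzz

    ! Directives wrap the outermost (iy) loop, as suggested in the reply
!$acc region
    do iy = 2, ny-1
      do ix = 2, nx-1
        do iz = 2, nz-1                      ! iz innermost: contiguous in Fortran
          pxx = p(iz,ix-1,iy) + p(iz,ix+1,iy)
          pyy = p(iz,ix,iy-1) + p(iz,ix,iy+1)
          pzz = p(iz-1,ix,iy) + p(iz+1,ix,iy)
          d(iz,ix,iy) = pxx + pyy + pzz
        end do
      end do
    end do
!$acc end region
  end subroutine neighbor_sum
end module stencil_acc
```

For example, set d = 0.0 and then call neighbor_sum(p, d, NZ, NX, NY); boundary entries of d keep whatever values they had before the call.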

OpenMP can be combined with the PGI Accelerator model. However, at this time it's not as easy as adding both directives. Instead, you first need to assign each OpenMP thread to its own GPU before entering an accelerator region, and then manually divide the work among the threads.

The basic outline would be something like:

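! needs: use omp_lib, use accel_lib (PGI)
!$omp parallel private(ilo,ihi,i) num_threads(2)
  ! bind each OpenMP thread to its own GPU (device 0 or 1)
  call acc_set_device_num(omp_get_thread_num(), acc_device_nvidia)
  ! split 1..N into two non-overlapping halves
  ilo = omp_get_thread_num()*((N+1)/2) + 1
  ihi = min(N, ilo + (N+1)/2 - 1)
  !$acc region
  do i = ilo,ihi
    a(i) = b(i) + c(i)
  enddo
  !$acc end region
!$omp end parallel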
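Below is a fuller sketch of the same pattern that compiles as ordinary OpenMP Fortran (for example with gfortran -fopenmp). The module and routine names and the general n-way split are assumptions, not part of the PGI API. The GPU-selection call is only shown as a comment because it needs the PGI or OpenACC runtime library; acc_set_device_num is the routine used for this, but check your compiler's documentation for the exact form.

```fortran
! Sketch of the "one OpenMP thread per GPU" pattern from the reply.
! Hypothetical names: omp_gpu_split, split_range, add_split.
module omp_gpu_split
  use omp_lib
  implicit none
contains
  ! Contiguous, non-overlapping block [ilo, ihi] of 1..n for thread tid (0-based)
  subroutine split_range(n, nthreads, tid, ilo, ihi)
    integer, intent(in)  :: n, nthreads, tid
    integer, intent(out) :: ilo, ihi
    integer :: chunk
    chunk = (n + nthreads - 1) / nthreads    ! ceiling(n / nthreads)
    ilo = tid*chunk + 1
    ihi = min(n, ilo + chunk - 1)            ! ihi < ilo means an empty block
  end subroutine split_range

  subroutine add_split(a, b, c, n, ngpus)
    integer, intent(in)  :: n, ngpus
    real,    intent(in)  :: b(n), c(n)
    real,    intent(out) :: a(n)
    integer :: i, ilo, ihi, tid

!$omp parallel private(i, ilo, ihi, tid) num_threads(ngpus)
    tid = omp_get_thread_num()
    ! PGI/OpenACC: bind this thread to its own GPU here, e.g.
    !   call acc_set_device_num(tid, acc_device_nvidia)
    ! Split by the actual team size, in case fewer threads were granted
    call split_range(n, omp_get_num_threads(), tid, ilo, ihi)
!$acc region
    do i = ilo, ihi
      a(i) = b(i) + c(i)
    end do
!$acc end region
!$omp end parallel
  end subroutine add_split
end module omp_gpu_split
```

Splitting with ceiling(n/nthreads) keeps every block contiguous and non-overlapping; with n = 10 and three threads the blocks are 1..4, 5..8 and 9..10. A thread whose ilo exceeds ihi simply gets no work.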

Hope this helps,

Thank you. That is what I wanted; it is very useful.