Hello Mat,
Hope you are doing well…
I have a problem where I wish to run two GPUs using OpenACC in Fortran while using OpenMP. Here is my code:
Program TEST
use openacc
use omp_lib
IMPLICIT NONE
INTEGER :: ngpus
integer, value :: devicenum
integer(acc_device_kind), value :: devicetype
integer(acc_device_property):: property
character*(1000) :: string
REAL*8, Allocatable :: A(:,:),B(:,:),C(:,:)
INTEGER :: i,j,n
devicetype = acc_get_device_type()
ngpus = acc_get_num_devices(devicetype)
Write(*,*) 'Number of ACC devices are: ', ngpus
do devicenum=1,ngpus
property=acc_property_driver
call acc_get_property_string(devicenum,devicetype,property,string)
Write(*,*) trim(string)
property=acc_property_name
call acc_get_property_string(devicenum,devicetype,property,string)
Write(*,*) trim(string)
enddo
N=1000
Allocate(A(n,n),B(n,n),C(n,n))
A=0
B=0
C=0
call omp_set_num_threads(2)
!$omp parallel private(i,j)
!$omp sections
!$omp section
call acc_set_device_num(1,acc_device_nvidia)
!$acc data copyout(A)
!$acc parallel loop async
Do i = 1 , n
Do j = 1 , n
A(i,j)=(2*i+3*j)/6
end do
end do
!$acc wait
!$acc end data
!$omp section
call acc_set_device_num(2,acc_device_nvidia)
!$acc data copyout(B)
!$acc parallel loop async
Do i = 1 , n
Do j = 1 , n
B(i,j)=(2*i+3*j)/12
end do
end do
!$acc wait
!$acc end data
!$omp end sections
!$omp end parallel
call acc_set_device_num(1,acc_device_nvidia)
!$acc data copyin(A,B) copyout(C)
!$acc parallel loop
Do i = 1 , n
Do j = 1 , n
C(i,j)= A(i,j) - B(i,j)
end do
end do
!$acc wait
!$acc end data
WRITE(*,*) C(50,50)
End Program
And I am compiling it with:
! set ACC_DEVICE_TYPE=nvidia
! % pgf90 -mp -ta=tesla:cc35,cc50 test2.f90 -Minfo=all
There problem is, it appears that I have no idea how to use OpenMP in Fortran. I am using the omp parallel and sections, however, what I get is that the parts where I wish for array [A] to be solved on GPU1 and array to be solved on GPU2 …end up all being solved on GPU2…
I have checked that is no parallelism through PGPROF:
I wish to calculate both [A] and at the same time, each on a different GPU. Kindly give me an idea where I went wrong…
Thank you.
Ahmed