I am trying to add OpenMP directives to offload my code to GPUs, but so far without success.
Here is a small example of what I tried:
!$omp target teams distribute parallel do map(from:sxzi) map(from:d1)
do j=2,this%npart
  do i=1,j-1
    ij=this%pair(i,j)
    do is=1,3
      sxzold=this%sxz(:,:,:,idet)
      opi=sx15(:,3+is,:,i)
      di=opi(1,:)*this%sp(1,i)+opi(2,:)*this%sp(2,i)+opi(3,:)*this%sp(3,i)+opi(4,:)*this%sp(4,i)
      d1=di(i)
      detinv=cone/d1
      di=di*detinv
      do m=1,this%npart
        sxzi(:,:,m)=sxzold(:,:,m)-di(m)*sxzold(:,:,i)
        sxzi(:,i,m)=opi(:,m)-di(m)*opi(:,i)
      enddo
      sxzi(:,:,i)=sxzold(:,:,i)*detinv
      sxzi(:,i,i)=opi(:,i)*detinv
    enddo
  enddo
enddo
Here this%npart is a small number (around 10). With the OpenMP directive added, the execution time becomes much longer than without it. Is that because the loops are too small? What can I try to improve this? And is this the right way to specify the clauses on the OpenMP directive?