# function inline with pointers

Hi all,

`````` do i=1,n_particle
call obtain_DEM_kinematic(i,Pos,Dis,Vel,Acc, UnitE, UnitE1,UnitE2,UnitE3,RVel, RAcc )
ELE(1,i)=Pos(1)
ELE(2,i)=Pos(2)
ELE(3,i)=Pos(3)
XYZ(1,i)=ELE(1,i)-ELE(4,i)
XYZ(2,i)=ELE(2,i)-ELE(4,i)
XYZ(3,i)=ELE(3,i)-ELE(4,i)
XYZ(4,i)=ELE(1,i)+ELE(4,i)
XYZ(5,i)=ELE(2,i)+ELE(4,i)
XYZ(6,i)=ELE(3,i)+ELE(4,i)
end do
``````

This is part of my code,I want to use openacc to accerlator it,but the subroutine here contains pointers,and I have tried many methods to inline it,but failed.Doyou have any advice?Thank you!
The following is the subroutine:

`````` subroutine obtain_DEM_kinematic(EDID,Pos,Dis,Vel,Acc, UnitE, UnitE1,UnitE2,UnitE3,RVel, RAcc)
implicit none
integer, intent(in) :: EDID
real( kind = RK ), optional :: Pos(3)
real( kind = RK ), optional :: Dis(3)
real( kind = RK ), optional :: Vel(3)
real( kind = RK ), optional :: Acc(3)
real( kind = RK ), optional :: UnitE(3),UnitE1(3),UnitE2(3),UnitE3(3)
real( kind = RK ), optional :: RVel(3)
real( kind = RK ), optional :: RAcc(3)
integer :: total_dem
character(len = MNL) :: msg, msg1, cdid

total_dem = get_total_number_dem()

if( EDID > total_dem .or. EDID <= 0 ) goto 555

if( present(Pos) ) Pos = dem(edid) % kinematic % POS
if( present(Dis) ) Dis = dem(edid) % kinematic % Dis
if( present(Vel) ) Vel = dem(edid) % kinematic % Vel
if( present(Acc) ) Acc = dem(edid) % kinematic % Acc
if( present(UnitE)) UnitE = dem(edid) % kinematic % UnitE
if( present(UnitE1)) UnitE1 = dem(edid) % kinematic % UnitE1
if( present(UnitE2)) UnitE2 = dem(edid) % kinematic % UnitE2
if( present(UnitE3)) UnitE3 = dem(edid) % kinematic % UnitE3
if( present(RVel) ) RVel = dem(edid) % kinematic % RVel
if( present(RAcc) ) RAcc = dem(edid) % kinematic % RAcc
! write(*,*)acc
!pause

return
555 continue
write(cdid, *) edid;   cdid = adjustl( cdid )
msg = 'Subprogram Exception. Discrete element ID is greater than the total number of DEM '
msg1 = 'Discrete Element ID: '//trim(cdid)
call ErrorMSG( msg, msg1 )
return
end subroutine obtain_DEM_kinematic

implicit none
integer, intent(in) :: eid
real( kind = RK ) :: radius
integer :: total_dem
character(len = MNL) :: msg, msg1, msg2, cdid

total_dem = get_total_number_dem()
if( eid > total_dem .or. eid <= 0 ) goto 555

return
555 continue
write(cdid, *) eid;   cdid = adjustl( cdid )
msg = 'Subprogram Exception. In GET_DEM_RADIUS'
msg1 = 'Discrete element ID is greater than the total number of DEM '
msg2 = 'Discrete Element ID: '//trim(cdid)
call ErrorMSG( msg, msg1 )
return
``````
• bigwbxu

Hi bigwbxu,

What errors are you getting? Can you write-up a reproducing example?

The Write statements and call to “ErrorMSG” in obtain_DEM_kinematic are problematic. Try conditionally compiling them:

``````  555 continue
#if !defined(_OPENACC)
write(cdid, *) edid;   cdid = adjustl( cdid )
msg = 'Subprogram Exception. Discrete element ID is greater than the total number of DEM '
msg1 = 'Discrete Element ID: '//trim(cdid)
call ErrorMSG( msg, msg1 )
#endif
return
``````

Of course, you’ll not be catching your error condition if you do add this.

Also the use of the derived type may or may not be a problem (ex. “dem(edid)%kinematic%POS”) If the types are fixed size, then it’s ok. If any other members in these types are dynamically allocated, then you’ll need to wait until the OpenACC standards committee have defined how perform deep copies.

• Mat

Hi Mat,

Then,the call to “ErrorMSG” is not important,it can be ingored.

Thirdly, I have checked my code and I find the derived type parameter is dynamically allocated.So I decided to change my code.

Lastly,I have write a part of codes and added openacc directive,but it cannot compile successfully,Could you suggest a workaround?

``````300 !\$acc data present_or_copy(A,H,ELE),copyout(AA,HH)
301 !\$acc region
302 !\$acc do private(k,i,j)
303  do k=1,n_box
!\$acc do private(ni), independent
305     do i=1,A(k)-1
ni=H(i,k)
!\$acc do private(nj),independent
308 	   do j=i+1,A(k)
nj=H(j,k)
vector=0.0
distance=0.0
distant=0.0
if(ni>0.and.ni<=n_particle)then
if(nj>0.and.nj<=n_particle)then
!\$acc do private(g,distance)
317 		     do g=1,3
vector=ELE(g,ni)-ELE(g,nj)
distance=distance+vector*vector
end do
distant=sqrt(distance)
if(distant<=(ELE(4,ni)+ELE(4,nj))*1.001)then
if(ni/=nj)then
if( ni < nj ) then
AA(ni)=AA(ni)+1
HH(AA(ni),ni)=nj
!\$acc do private(v),independent
331 						        do v=1,AA(ni)-1
if(HH(v,ni)==nj)then
HH(AA(ni),ni)=0
AA(ni)=AA(ni)-1
end if
end do
else
AA(nj)=AA(nj)+1
HH(AA(nj),nj)=ni
!\$acc do private(v),independent
341 						        do v=1,AA(nj)-1
if(HH(v,nj)==ni)then
HH(AA(nj),nj)=0
AA(nj)=AA(nj)-1
end if
end do
end if
end if
end if
end if
end if
end do
end do
end do
!\$acc end region
!\$acc end data
``````

The acclerator infprmation is
300,Generating copyout(hh(:,:))
Generating copyout(aa(:))
Generating present_or_copy(ele(:,:))
Generating present_or_copy(h(:,:))
Generating present_or_copy(a(:))
301, Generating present_or_copy(a(:))
Generating present_or_copyout(aa(:))
Generating present_or_copyout(hh(:,:))
Generating present_or_copy(ele(:,:))
Generating present_or_copy(h(:,:))
Generating NVIDIA code
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
303, Loop carried dependence due to exposed use of ‘aa(:)’ prevents parallelization
Loop carried dependence due to exposed use of ‘hh(:,:)’ prevents parallelization
Accelerator kernel generated
317, !\$acc loop vector(128) ! threadidx%x
331, !\$acc loop vector(128) ! threadidx%x
341, !\$acc loop vector(128) ! threadidx%x
305, Loop is parallelizable
308, Loop is parallelizable
317, Loop is parallelizable
331, Loop is parallelizable
341, Loop is parallelizable

-bigwbxu

You just need to add “independent” to the “do” directive at line 302. You have it in the other spots, just not the outer most loop.

You need use independent because the indices used in AA and HH and taken from a look-up array, so it has no way of knowing if all indices are discreet. Adding “independent” asserts to the compiler that they are.

• Mat

Hi Mat,

I compilied the program again with “independent” added to the outer most loop .It can be compilied successfully.

But I want to get the values of array AA and HH.Once I use openacc directives ,the values are disorder and incorrect.

And, do you konw when will it show “call to cuStreamSynchronize returned error 702:Launch timeout”,I just use openacc,not cuda,and I don’t set anything about the stream.

-bigwbxu