function inline with pointers

Hi all,

 do i=1,n_particle
	      call obtain_DEM_kinematic(i,Pos,Dis,Vel,Acc, UnitE, UnitE1,UnitE2,UnitE3,RVel, RAcc ) 
		  ELE(1,i)=Pos(1)
		  ELE(2,i)=Pos(2)
		  ELE(3,i)=Pos(3)
		  ELE(4,i)=get_dem_radius(i)
		  XYZ(1,i)=ELE(1,i)-ELE(4,i)      
		  XYZ(2,i)=ELE(2,i)-ELE(4,i)        
		  XYZ(3,i)=ELE(3,i)-ELE(4,i)        
		  XYZ(4,i)=ELE(1,i)+ELE(4,i)
		  XYZ(5,i)=ELE(2,i)+ELE(4,i)
		  XYZ(6,i)=ELE(3,i)+ELE(4,i)
      end do

This is part of my code,I want to use openacc to accerlator it,but the subroutine here contains pointers,and I have tried many methods to inline it,but failed.Doyou have any advice?Thank you!
The following is the subroutine:


 subroutine obtain_DEM_kinematic(EDID,Pos,Dis,Vel,Acc, UnitE, UnitE1,UnitE2,UnitE3,RVel, RAcc)  
   implicit none
   integer, intent(in) :: EDID
   real( kind = RK ), optional :: Pos(3)
   real( kind = RK ), optional :: Dis(3)
   real( kind = RK ), optional :: Vel(3)
   real( kind = RK ), optional :: Acc(3)
   real( kind = RK ), optional :: UnitE(3),UnitE1(3),UnitE2(3),UnitE3(3)
   real( kind = RK ), optional :: RVel(3)
   real( kind = RK ), optional :: RAcc(3)
   integer :: total_dem
   character(len = MNL) :: msg, msg1, cdid

   total_dem = get_total_number_dem()

   if( EDID > total_dem .or. EDID <= 0 ) goto 555

   if( present(Pos) ) Pos = dem(edid) % kinematic % POS
   if( present(Dis) ) Dis = dem(edid) % kinematic % Dis
   if( present(Vel) ) Vel = dem(edid) % kinematic % Vel
   if( present(Acc) ) Acc = dem(edid) % kinematic % Acc
   if( present(UnitE)) UnitE = dem(edid) % kinematic % UnitE
   if( present(UnitE1)) UnitE1 = dem(edid) % kinematic % UnitE1
   if( present(UnitE2)) UnitE2 = dem(edid) % kinematic % UnitE2
   if( present(UnitE3)) UnitE3 = dem(edid) % kinematic % UnitE3
   if( present(RVel) ) RVel = dem(edid) % kinematic % RVel
   if( present(RAcc) ) RAcc = dem(edid) % kinematic % RAcc
  ! write(*,*)acc
   !pause

   return
  555 continue
   write(cdid, *) edid;   cdid = adjustl( cdid )
   msg = 'Subprogram Exception. Discrete element ID is greater than the total number of DEM '
   msg1 = 'Discrete Element ID: '//trim(cdid)
   call ErrorMSG( msg, msg1 )
   return
  end subroutine obtain_DEM_kinematic





function get_dem_radius( eid ) result( radius )
   implicit none
   integer, intent(in) :: eid
   real( kind = RK ) :: radius
   integer :: total_dem
   character(len = MNL) :: msg, msg1, msg2, cdid


   radius = -1.0

   total_dem = get_total_number_dem()
   if( eid > total_dem .or. eid <= 0 ) goto 555

   radius = DEM(eid) % radius

   return
  555 continue
   write(cdid, *) eid;   cdid = adjustl( cdid )
   msg = 'Subprogram Exception. In GET_DEM_RADIUS'
   msg1 = 'Discrete element ID is greater than the total number of DEM '
   msg2 = 'Discrete Element ID: '//trim(cdid)
   call ErrorMSG( msg, msg1 )
   return
  end function get_dem_radius
  • bigwbxu

Hi bigwbxu,

What errors are you getting? Can you write-up a reproducing example?

The Write statements and call to “ErrorMSG” in obtain_DEM_kinematic are problematic. Try conditionally compiling them:

  555 continue 
#if !defined(_OPENACC)
   write(cdid, *) edid;   cdid = adjustl( cdid ) 
   msg = 'Subprogram Exception. Discrete element ID is greater than the total number of DEM ' 
   msg1 = 'Discrete Element ID: '//trim(cdid) 
   call ErrorMSG( msg, msg1 )
#endif    
   return

Of course, you’ll not be catching your error condition if you do add this.

Also the use of the derived type may or may not be a problem (ex. “dem(edid)%kinematic%POS”) If the types are fixed size, then it’s ok. If any other members in these types are dynamically allocated, then you’ll need to wait until the OpenACC standards committee have defined how perform deep copies.

  • Mat

Hi Mat,

Firstly,thanks for your reply!

Then,the call to “ErrorMSG” is not important,it can be ingored.

Thirdly, I have checked my code and I find the derived type parameter is dynamically allocated.So I decided to change my code.

Lastly,I have write a part of codes and added openacc directive,but it cannot compile successfully,Could you suggest a workaround?

300 !$acc data present_or_copy(A,H,ELE),copyout(AA,HH)
301 !$acc region
302 !$acc do private(k,i,j)
303  do k=1,n_box
!$acc do private(ni), independent 
305     do i=1,A(k)-1
	    ni=H(i,k)
!$acc do private(nj),independent	    
308 	   do j=i+1,A(k)
	     nj=H(j,k)             
		   vector=0.0
       distance=0.0  
       distant=0.0	  
		 if(ni>0.and.ni<=n_particle)then 	  
		 if(nj>0.and.nj<=n_particle)then
!$acc do private(g,distance)		 
317 		     do g=1,3            
		       vector=ELE(g,ni)-ELE(g,nj)
		       distance=distance+vector*vector
		     end do
			   distant=sqrt(distance)
		     if(distant<=(ELE(4,ni)+ELE(4,nj))*1.001)then
		         if(ni/=nj)then        
			           if( ni < nj ) then	          
						        AA(ni)=AA(ni)+1 
		                HH(AA(ni),ni)=nj 
!$acc do private(v),independent	                 
331 						        do v=1,AA(ni)-1
					             if(HH(v,ni)==nj)then 
							           HH(AA(ni),ni)=0
							           AA(ni)=AA(ni)-1							   	  
					             end if
					          end do  	                
				         else
					          AA(nj)=AA(nj)+1 
		                HH(AA(nj),nj)=ni 
!$acc do private(v),independent
341 						        do v=1,AA(nj)-1
					             if(HH(v,nj)==ni)then 
							           HH(AA(nj),nj)=0
							           AA(nj)=AA(nj)-1							   	  
					             end if
					          end do                                  
                 end if                           
             end if
         end if
     end if               
     end if
     end do
    end do 
 end do  
!$acc end region
!$acc end data

The acclerator infprmation is
300,Generating copyout(hh(:,:))
Generating copyout(aa(:))
Generating present_or_copy(ele(:,:))
Generating present_or_copy(h(:,:))
Generating present_or_copy(a(:))
301, Generating present_or_copy(a(:))
Generating present_or_copyout(aa(:))
Generating present_or_copyout(hh(:,:))
Generating present_or_copy(ele(:,:))
Generating present_or_copy(h(:,:))
Generating NVIDIA code
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
303, Loop carried dependence due to exposed use of ‘aa(:)’ prevents parallelization
Loop carried dependence due to exposed use of ‘hh(:,:)’ prevents parallelization
Accelerator kernel generated
317, !$acc loop vector(128) ! threadidx%x
331, !$acc loop vector(128) ! threadidx%x
341, !$acc loop vector(128) ! threadidx%x
305, Loop is parallelizable
308, Loop is parallelizable
317, Loop is parallelizable
331, Loop is parallelizable
341, Loop is parallelizable


-bigwbxu

You just need to add “independent” to the “do” directive at line 302. You have it in the other spots, just not the outer most loop.

You need use independent because the indices used in AA and HH and taken from a look-up array, so it has no way of knowing if all indices are discreet. Adding “independent” asserts to the compiler that they are.

  • Mat

Hi Mat,

I compilied the program again with “independent” added to the outer most loop .It can be compilied successfully.


But I want to get the values of array AA and HH.Once I use openacc directives ,the values are disorder and incorrect.

And, do you konw when will it show “call to cuStreamSynchronize returned error 702:Launch timeout”,I just use openacc,not cuda,and I don’t set anything about the stream.

-bigwbxu