Fortran compilation problem.

I’m trying to accelerate our Fortran program.
When I compile it, I get the following messages and no executable file is produced.
I tried searching for this, but couldn’t find any clue.
Sorry for such a vague question, but I don’t know how to fix this.

pgfortran memb.iot.para.f -ta=nvidia,time -Minfo
accelar:
1168, Generating copyin(zj(:))
Generating copy(fzvm(:,1:iatom-1))
Generating copy(fyvm(:,1:iatom-1))
Generating copy(fxvm(:,1:iatom-1))
Generating copy(vvdwj(:,1:iatom-1))
Generating copyin(rvdwj(:))
Generating copyin(evdwj(:))
Generating copyin(nclosej(1:iatom-1))
Generating copyin(linkj(1:iatom-1,:))
Generating copyin(n13j(1:iatom-1))
Generating copyin(xj(:))
Generating copyin(yj(:))
Generating copyin(index3j(1:iatom-1,:))
1169, Accelerator kernel generated
1169, !$acc do parallel
Non-stride-1 accesses for array ‘evdwj’
Non-stride-1 accesses for array ‘rvdwj’
Non-stride-1 accesses for array ‘zj’
Non-stride-1 accesses for array ‘yj’
Non-stride-1 accesses for array ‘xj’
Non-stride-1 accesses for array ‘n13j’
Non-stride-1 accesses for array ‘nclosej’
1170, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1173, Accelerator restriction: induction variable live-out from loop: i
Inner sequential loop scheduled on accelerator
1174, Accelerator restriction: induction variable live-out from loop: i
1175, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: scalar variable live-out from loop: joke
Loop carried scalar dependence for ‘joke’ at line 1175
Scalar last value needed after loop for ‘joke’ at line 1180
Scalar last value needed after loop for ‘joke’ at line 1183
1178, Accelerator restriction: induction variable live-out from loop: i
Inner sequential loop scheduled on accelerator
1179, Accelerator restriction: induction variable live-out from loop: i
1180, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: scalar variable live-out from loop: joke
Loop carried scalar dependence for ‘joke’ at line 1180
Scalar last value needed after loop for ‘joke’ at line 1183
1185, Accelerator restriction: scalar variable live-out from loop: cute
1186, Accelerator restriction: scalar variable live-out from loop: cut1
1189, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: induction variable live-out from loop: i
1191, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: induction variable live-out from loop: i
1193, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: induction variable live-out from loop: i
1196, Accelerator restriction: scalar variable live-out from loop: rr
1200, Accelerator restriction: induction variable live-out from loop: i
1201, Accelerator restriction: induction variable live-out from loop: j
1203, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1205, Loop carried scalar dependence for ‘sw0’ at line 1228
Loop carried scalar dependence for ‘sw0’ at line 1230
Loop carried scalar dependence for ‘sw0’ at line 1231
Loop carried scalar dependence for ‘sw0’ at line 1232
1206, Loop carried scalar dependence for ‘sw0’ at line 1228
Loop carried scalar dependence for ‘sw0’ at line 1230
Loop carried scalar dependence for ‘sw0’ at line 1231
Loop carried scalar dependence for ‘sw0’ at line 1232
1208, Loop carried scalar dependence for ‘sw0’ at line 1228
Loop carried scalar dependence for ‘sw0’ at line 1230
Loop carried scalar dependence for ‘sw0’ at line 1231
Loop carried scalar dependence for ‘sw0’ at line 1232
1212, Loop carried scalar dependence for ‘dsw’ at line 1230
Loop carried scalar dependence for ‘dsw’ at line 1231
Loop carried scalar dependence for ‘dsw’ at line 1232
1213, Loop carried scalar dependence for ‘dsw’ at line 1230
Loop carried scalar dependence for ‘dsw’ at line 1231
Loop carried scalar dependence for ‘dsw’ at line 1232
1215, Loop carried scalar dependence for ‘dsw’ at line 1230
Loop carried scalar dependence for ‘dsw’ at line 1231
Loop carried scalar dependence for ‘dsw’ at line 1232
1228, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1230, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1231, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1232, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1281, Accelerator restriction: induction variable live-out from loop: j
1282, Accelerator restriction: induction variable live-out from loop: i
1286, Invariant assignments hoisted out of loop
1356, Accelerator restriction: induction variable live-out from loop: i
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_myprocnum':
initpar.c:(.text+0x2): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_ncpus':
initpar.c:(.text+0x12): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_tcpus' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_getioproc':
initpar.c:(.text+0x22): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_ioproc' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_is_ioproc':
initpar.c:(.text+0x32): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_ioproc' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
initpar.c:(.text+0x38): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_abort':
initpar.c:(.text+0x5f): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_abortp':
initpar.c:(.text+0xeb): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_initarg':
initpar.c:(.text+0x127): relocation truncated to fit: R_X86_64_PC32 against `.bss'
initpar.c:(.text+0x151): relocation truncated to fit: R_X86_64_PC32 against `.bss'
initpar.c:(.text+0x17b): relocation truncated to fit: R_X86_64_PC32 against `.bss'
initpar.c:(.text+0x18b): additional relocation overflows omitted from the output

The following is the code in the accelerator region:

      call acc_init( acc_device_nvidia ) 
!$acc region do parallel, vector(256)
	do i = 1, iatom - 1
	do j = i + 1, iatom

		joke = 0
		do k = 1, nclosej(i)
		k1 = linkj(i,k)
		if( j.eq.k1 ) joke = 1
		enddo

		do k = 1, n13j(i)
		k1 = index3j(i,k)
		if( j.eq.k1 ) joke = 1
		enddo

		if( joke.eq.0 ) then

		cute  = 20.0/sunit
	cut1  = 0.98*cute
  	econs = 332.0/(eunit*sunit)

	xx = (xj(i) - xj(j)) - 
     s	float(int( (xj(i)-xj(j))*2.0/boxa) ) *boxa
	yy = (yj(i) - yj(j)) - 
     s	float(int( (yj(i)-yj(j))*2.0/boxb) ) *boxb
	zz = (zj(i) - zj(j)) - 
     s  float(int( (zj(i)-zj(j))*2.0/boxc) ) *boxc
	
	rr = sqrt( xx**2 + yy**2 + zz**2 )
 	if( rr.le.cutoff*1.01 ) then


	rii = rvdwj(i)
		rjj = rvdwj(j)
	rij = (rii + rjj)/2.0
	eij = sqrt( evdwj(i)*evdwj(j) )

	if(rr.lt.cutoff) sw0=1.0
	if(rr.gt.cutoff*1.01) sw0=0.0
	if(rr.ge.cutoff.and.rr.le.cutoff*1.01) then
	sw0=(cutoff*1.01+2.0*rr-3.0*cutoff)
     s      	*(cutoff*1.01-rr)**2/((0.01*cutoff)**3)
	endif
		
	if(rr.lt.cutoff) dsw=0.0
	if(rr.gt.cutoff*1.01) dsw=0.0
	if(rr.ge.cutoff.and.rr.le.cutoff*1.01) then
	dsw=6.0*(cutoff-rr)*(cutoff*1.01-rr)/((0.01*cutoff)**3)
	endif

	aij = rij**6
	bij = rij**3
      rr6 = rr**6
      rr12 = rr6 **2
      rr13 = rr12 * rr
      rr7 = rr6 * rr
	vpart = 4.0*eij*(aij/(rr12) - bij/(rr6))
	dpart = 4.0*eij*( -12.0*aij/(rr13) + 6.0*bij/(rr7) )

	vdwij = vpart*sw0
	vvdwj(j,i)  = vpart *sw0
C
	fxvm(j,i) = -( dpart*sw0 + vpart*dsw )*(xx/rr)
	fyvm(j,i) = -( dpart*sw0 + vpart*dsw )*(yy/rr)
	fzvm(j,i) = -( dpart*sw0 + vpart*dsw )*(zz/rr)

	! ax(i1) = ax(i1) + fxv
	! ay(i1) = ay(i1) + fyv
	! az(i1) = az(i1) + fzv
	!	ax(j1) = ax(j1) - fxv
	!	ay(j1) = ay(j1) - fyv
	!	az(j1) = az(j1) - fzv

      endif
      endif

	enddo	
	enddo
!$acc end region

Hi tiomiya,

I see a few issues.

First, I would recommend using version 10.1 or higher. Version 10.0 unfortunately has a bug where if statements are ignored. It’s not the cause of these errors, but it will affect your code once it compiles.

Second, you’re using a triangular loop (the bounds of the “j” loop depend on “i”), and GPUs only support rectangular loops. Hence you will need to either make the “j” loop sequential, or make the “j” loop rectangular and use an if statement to skip the lower part of the triangle.

For example:

!$acc region
!$acc do kernel
   do i = 1, iatom - 1
   do j = i + 1, iatom      ! j loop runs sequentially inside the kernel
...
   enddo
   enddo
!$acc end region

or

!$acc region
!$acc do parallel, vector(256)
   do i = 1, iatom - 1
!$acc do kernel
   do j = 2, iatom           ! bounds no longer depend on i
     if (j.gt.i) then        ! skip the lower part of the triangle
      ... body of loop
     endif
   enddo
   enddo
!$acc end region
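To check that the rectangular form visits exactly the same (i, j) pairs as the triangular one, here is a small host-only sketch. It is written in free-form Fortran (your file is fixed-form, so continuation lines would need adjusting), the accelerator directives are left out so it builds with any Fortran compiler, and the names pair_loops, pairs_triangular and pairs_rectangular, as well as the stand-in loop body abs(x(i) - x(j)), are hypothetical.

```fortran
! Host-only sketch: triangular vs. rectangular-with-mask pair loops.
! Module/routine names and the loop body are hypothetical stand-ins.
module pair_loops
  implicit none
contains
  ! Triangular form, as in the original region: j's lower bound depends on i.
  subroutine pairs_triangular(n, x, d)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real, intent(out) :: d(n, n)
    integer :: i, j
    d = 0.0
    do i = 1, n - 1
      do j = i + 1, n
        d(j, i) = abs(x(i) - x(j))
      end do
    end do
  end subroutine pairs_triangular

  ! Rectangular form: both trip counts are fixed, and the IF skips the
  ! lower part of the triangle (j <= i).
  subroutine pairs_rectangular(n, x, d)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real, intent(out) :: d(n, n)
    integer :: i, j
    d = 0.0
    do i = 1, n - 1
      do j = 2, n
        if (j > i) then
          d(j, i) = abs(x(i) - x(j))
        end if
      end do
    end do
  end subroutine pairs_rectangular
end module pair_loops
```

Both routines write the same entries of d (the j > i part of each column i). The masked version spends its extra iterations on the lower triangle doing nothing, which for large n is roughly half of its (n - 1)**2 iterations; that is the price of giving the compiler fixed loop bounds.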

As for the “Loop carried scalar dependence” and “induction variable live-out” messages, I believe these should go away once the “j” loop is parallelizable or made sequential. Granted, I can’t be sure, since there isn’t enough of the code for me to compile it.
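If the sw0/dsw dependence messages survive the loop change, one thing that may help (a guess, since it hasn't been tried on the full code) is to make it obvious that both scalars are assigned on every path through an iteration. The three separate if statements do cover every value of rr, but a compiler may not be able to prove that for floating-point comparisons, so it can treat the value from the previous iteration as possibly still live. The sketch below writes the same formulas as a single IF / ELSE IF / ELSE. It is wrapped in a hypothetical routine, switch_fn, in free-form Fortran only so it can be tested on the host; inside the accelerator region the block would be written inline, as in the original.

```fortran
! Hypothetical host-side wrapper around the switching block from the region.
module switching
  implicit none
contains
  ! Smooth cutoff: sw0 = 1 below cutoff, 0 above 1.01*cutoff, cubic in
  ! between; dsw is d(sw0)/d(rr). Every branch assigns both outputs.
  pure subroutine switch_fn(rr, cutoff, sw0, dsw)
    real, intent(in) :: rr, cutoff
    real, intent(out) :: sw0, dsw
    real :: rout, w3
    rout = cutoff*1.01          ! outer edge of the switching shell
    w3 = (0.01*cutoff)**3       ! (shell width)**3
    if (rr < cutoff) then
      sw0 = 1.0
      dsw = 0.0
    else if (rr > rout) then
      sw0 = 0.0
      dsw = 0.0
    else
      sw0 = (rout + 2.0*rr - 3.0*cutoff)*(rout - rr)**2/w3
      dsw = 6.0*(cutoff - rr)*(rout - rr)/w3
    end if
  end subroutine switch_fn
end module switching
```

With t = (rr - cutoff)/(0.01*cutoff), the middle branch is sw0 = (1 + 2t)(1 - t)**2, which goes from 1 at rr = cutoff to 0 at rr = 1.01*cutoff, and dsw is its derivative with respect to rr, so the forces stay consistent with the energy across the shell.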

For the “relocation truncated” errors: does this code compile as is without the “-ta=nvidia” flag? These errors typically occur when the code uses more than 2 GB of static data (common blocks or static arrays) and therefore needs the medium memory model (-mcmodel=medium). If this is the case, you will need to reduce the size of your static variables, since the GPU can’t yet be used with the medium memory model.
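One common way to reduce static data is to replace large fixed-size arrays (in common blocks or declared with constant bounds) by allocatable arrays sized at run time; allocated memory lives on the heap and does not count toward the 2 GB static-data limit. The declarations in memb.iot.para.f aren't shown, so this free-form sketch simply assumes that fxvm, fyvm, fzvm and vvdwj were large fixed-size arrays; the module pair_arrays and its two helper routines are hypothetical, and whether this compiler release handles allocatable arrays inside an accelerator region is worth checking in its documentation.

```fortran
! Sketch: run-time allocation instead of large static arrays.
! Assumed (hypothetical) original: real fxvm(NMAX,NMAX), ... in a COMMON block.
! For scale, four 12000 x 12000 default-real arrays take
! 4 * 12000**2 * 4 bytes, about 2.3e9 bytes, already past 2 GB.
module pair_arrays
  implicit none
  real, allocatable :: fxvm(:, :), fyvm(:, :), fzvm(:, :), vvdwj(:, :)
contains
  ! Allocate the pair arrays for the actual number of atoms and zero them.
  subroutine alloc_pair_arrays(iatom)
    integer, intent(in) :: iatom
    allocate(fxvm(iatom, iatom), fyvm(iatom, iatom), &
             fzvm(iatom, iatom), vvdwj(iatom, iatom))
    fxvm = 0.0
    fyvm = 0.0
    fzvm = 0.0
    vvdwj = 0.0
  end subroutine alloc_pair_arrays

  ! Release the memory when the arrays are no longer needed.
  subroutine free_pair_arrays()
    if (allocated(fxvm)) deallocate(fxvm, fyvm, fzvm, vvdwj)
  end subroutine free_pair_arrays
end module pair_arrays
```

The idea is to call alloc_pair_arrays(iatom) once iatom is known (for example after reading the input), have the routines that used the common block use the module instead, and call free_pair_arrays() at the end.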

If the code compiles without “-mcmodel=medium”, please send the full source to PGI Customer Service (trs@pgroup.com), since we’ll need to investigate this further.

Hope this helps,
Mat