Same code compiles with the Intel compiler but not with Portland. Why?

I am using the Portland Group compiler for a 3-D problem, and it doesn't compile at all if I go above a certain resolution, even though that resolution doesn't use the whole memory of the computer. The same code at the same resolution compiles fine with the Intel compiler. Also, when I use both processors of that computer, the Portland-built executable gives a "segmentation fault". Can anybody please help me out with this?

I think I need a bit more information before I can give you an answer. If I understand you correctly, there are two issues: first, the program won't compile at higher resolutions, and second, when it does compile, the program seg faults?

While seg faults happen for a variety of reasons, one major cause is that the stack size is too small. You might try increasing your stack size using 'unlimit' (assuming you're on a Linux system). If this doesn't work, you'd need to use pgdbg to determine where the seg fault occurs and try to diagnose it from there.
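For example (assuming a csh/tcsh or bash shell): 'unlimit stacksize' or 'limit stacksize unlimited' under csh/tcsh, and 'ulimit -s unlimited' under bash.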

As for the compilation issue, I would need to know the type of system, the compiler version, the optimization flags you're using, and the error messages generated to have any hope of determining what's wrong. If the code is small, or if a small sample of the code exhibits the same problem, please post that as well.

The more information you can provide, the more likely others can help.

  • Mat

Yes, I'm using a Linux system, and I want to run this program on one computer that has 2 processors in it. My code is for 3-D mantle convection. When I try to increase the resolution of the box, it doesn't compile. For example, when I use a 73x73x73 box resolution, it compiles and runs properly, but if I use 161x161x161 grid points, it doesn't compile. The messages from the failed build are given below:

solve_free.o(.text+0x650): In function `por_solve_x_':
: relocation truncated to fit: R_X86_64_PC32 itrtot_
solve_free.o(.text+0xc2e): In function `por_solve_y_':
: relocation truncated to fit: R_X86_64_PC32 itrtot_
read_write_a.o(.text+0x19a): In function `reada_':
: relocation truncated to fit: R_X86_64_PC32 rdaerr_
read_write_a.o(.text+0x3e2): In function `reada_':
: relocation truncated to fit: R_X86_64_PC32 rdaerr_
read_write_a.o(.text+0x3f5): In function `reada_':
: relocation truncated to fit: R_X86_64_PC32 rdaerr_
read_write_a.o(.text+0x7f2): In function `iread_':
: relocation truncated to fit: R_X86_64_PC32 rdaerr_
read_write_a.o(.text+0xb62): In function `isrd_':
: relocation truncated to fit: R_X86_64_PC32 rdaerr_
/usr/local/pgi/linux86-64/5.1/lib/libpgmp.a(barrier.o)(.text+0x69): In function `_mp_get_parpar':
: relocation truncated to fit: R_X86_64_32S _mp_parpar
/usr/local/pgi/linux86-64/5.1/lib/libpgmp.a(barrier.o)(.text+0x347): In function `_mp_lcpu2':
: relocation truncated to fit: R_X86_64_32S _mp_parpar
/usr/local/pgi/linux86-64/5.1/lib/libpgmp.a(barrier.o)(.text+0x382): In function `_mp_ncpus2':
: relocation truncated to fit: R_X86_64_32S _mp_parpar
/usr/local/pgi/linux86-64/5.1/lib/libpgmp.a(barrier.o)(.text+0x3e5): In function `_mp_barrier2':
: additional relocation overflows omitted from the output

Here read_write_a.f and solve_free.f are two source files that don't have any errors in them.

Another thing: when I run the same code on both processors of the same computer, it only runs at, say, 73x73x73 grid points, whereas when I choose only one processor to run it, I can go up to 161x161x161 grid points. In the two-processor case, anything above 73x73x73 gives me a segmentation fault when the code runs.

I am confused, because I don't have this kind of problem when I use the Intel compiler. I am quite sure that this is not a memory problem.

Hi,

What is the data type of the array(s) that you are trying to allocate at 161x161x161? It looks like you may be exceeding the 2GB limit on the data segment of the small memory model. Are you using the 64-bit compilers? If so, you may want to try compiling with -mcmodel=medium. See section 3.5 in the latest release notes for PGI Workstation 5.2:

http://www.pgroup.com/doc/pgiwsrn.pdf
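As a rough, hedged size check (assuming 8-byte reals): a single 161x161x161 array holds 161*161*161 = 4,173,281 elements, or about 33 MB, versus roughly 3 MB at 73x73x73. With a few dozen statically declared 3-D arrays plus the multigrid work arrays, the total static data can easily grow past 2GB at the higher resolution even though it fits comfortably at the lower one.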

Assuming this allows you to compile and link your program, you may also need to increase your stack size, as mentioned in the reply above. See section 3.5.3 in the release notes for an example.

-Mark

FYI, the relocation messages are just warnings and shouldn’t cause the compilation to fail. Are there any other messages being emitted? Also, what optimization flags are you using?

-Mat

Just to clarify my response above:

These messages are just warnings, but they do indicate that you may be exceeding the small memory model (which includes your total data and program text space). Try compiling and linking with the switch -mcmodel=medium. Assuming that you compile and link successfully (i.e., these warnings go away), run your program. If you then get a seg fault, you may need to increase your system stack size (see the posts above).
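For example, a hypothetical command line (assuming the pgf90 driver; adjust the driver and source file names for your build) would be 'pgf90 -mp -mcmodel=medium -o myprog solve_free.f read_write_a.f ...', with -mcmodel=medium passed to both the compile and link steps.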

-Mark

Thank you for the help with the compiler problem. With -mcmodel=medium the compiler now works properly at the higher resolution. But I still have a problem when I use the -mp option to run on both processors of the same computer: with it I can't run the code at a very high resolution and I get a segmentation fault, even though the memory is only half full, while a lower resolution runs fine. When I use only one processor, I can run at a much higher resolution and use almost all of the 4GB of memory. I am working on parallelizing the code with MPI, but I still wonder why -mp or OpenMP behaves like that. Thanks once again to all of you for helping me out.

I am posting below a raw copy of the part of the code in which the segmentation fault occurs. At low resolution it doesn't give any problem, but at higher resolution, and only when I run on 2 processors, it gives a segmentation fault right before "call clcvel", after the end of the iteration loop (line 106 in my file). It doesn't make any sense to me at all.

[code]
       subroutine por_solve_x(ntime,fparm1)
c***********************************************************************
c   Subroutine to solve the Poisson equation for the stream function.
c***********************************************************************
       include 'por_common'

       common/itrtot/ittotal
       common/blkmud/wrkmud(length),fparm(8),iparm(23),mgopt(4)
     &   ,cpumud,cpuin1,cpuin2,nvcycle,ncallmud,cpumud24,
     &    iwk(length3i),wrkmudomx(length),iwkomx(length3i)
	dimension fparm1(8)


c	write(6,*)'in solver'
c       Set the percent tolerance of the convergence test for
c       the streamfunction vorticity pair and the maximum
c       number of iterations allowed.
        tol=300.
        itmax=500
        fnxnzbig=nx*ny*nz

c	write(6,*)'entering iteration'

c Start the loop


        do 1000 itbig=1, itmax

	do i=1, nx
	 do j=1, ny
	  do k=1, nz
	   omxtemp(i,j,k)=omx(i,j,k)
	   psixtemp(i,j,k)=psix(i,j,k)
	  enddo
	 enddo
	enddo

c Store old array
           do k=1, nz
	    do j=1, ny
             do i=1, nx
              wk3dx(i,j,k)=psix(i,j,k)
             enddo
           enddo
	  enddo

c	write(6,*)'entering solver'


c  **     Solve the stream function equation. **
c

c	write(6,*)'test1'

	call slveqomx(ntime,fparm1)
	call slveqpsix(ntime)

c	write(6,*)'test6'


c       Set iparm(1) to indicate noninitial calls to muh1

        iparm(1)=1

        avg=0.0
        dpmax=0.0
        imax=1
        jmax=1
	kmax=1

        do 30 i=1,nx
	  do 30 j=1,ny
            do 30 k=1,nz
                avg=avg+abs(psix(i,j,k))
                dp=abs(psix(i,j,k)-wk3dx(i,j,k))
                if(dp.gt.dpmax) then
                 dpmax=dp
                 imax=i
                 jmax=j
	         kmax=k
                 ireg=1
                endif
30      continue

        avg=avg/fnxnzbig
c        write(6,*)'avg',avg
        percnt=100.0*dpmax/avg
        write(6,*)itbig,'percnt',percnt

c       Decide whether or not stream function is satisfactory.
        if(percnt.lt.tol)goto 1010

	a=1.0

	do i=1, nx
	 do j=1, ny
	  do k=1, nz
	   omx(i,j,k)=a*omx(i,j,k)+(1.0-a)*omxtemp(i,j,k)
	   psix(i,j,k)=a*psix(i,j,k)+(1.0-a)*psixtemp(i,j,k)
	  enddo
	 enddo
	enddo

 1000        continue

 1010        continue


c
c       Calculate the velocity from the stream function.

	write(6,*)'clcvel'
	call clcvel

c	write(6,*)'dump'

         return
         end


       subroutine slveqomx(ntime,fparm1)
c***********************************************************************
c   Subroutine to solve the Poisson equation for the stream function
c***********************************************************************
       external coef,bndyc
       include 'por_common'
       common/blkmud/wrkmud(length),fparm(8),iparm(23),mgopt(4)
     &  ,cpumud,cpuin1,cpuin2,nvcycle,ncallmud,cpumud24,
     &   iwk(length3i),wrkmudomx(length),iwkomx(length3i)
c         COMMON/CONSTS/Ra,a0tc,delt
         COMMON/BLKDL/DLX,DLY,DLZ
	 dimension fparm1(8)
	 data ifirst/0/
	 save ifirst

c	test=temp(73,73)
        pi=acos(-1.)
        ierr=0
c   Assign the forcing term for the stream function equation.
        do k=1, nz
	 do i=1, nx
          do j=1, ny
          wrk1(j)=temp(i,j,k)
          enddo
          call der1(wrk1,wrk2,ny,dly)
	  do j=1, ny
             frhs(i,j,k)=-wrk2(j)
          enddo
         enddo
	enddo

c	test=om(73,73)
c
c   Impose boundary conditions on the stream function.
c
c   Om along the horizontal faces is zero (no mass flux)

	do i=1, nx
	 do j=1, ny
           omx(i,j,1)=0.0
           omx(i,j,nz)=0.0
	 enddo
	enddo

c   domdx is zero along the X faces and om is zero along Y faces

	do j=1, ny
	 do k=1, nz
	   omx(1,j,k)=(48.0*omx(2,j,k)-36.0*omx(3,j,k)+16.0*
     &     omx(4,j,k)-3.0*omx(5,j,k))/25.0
	   omx(nx,j,k)=(48.0*omx(nx-1,j,k)-36.0*omx(nx-2,j,k)+16.0*
     &     omx(nx-3,j,k)-3.0*omx(nx-4,j,k))/25.0
	 enddo
	enddo

	do i=1, nx
	 do k=1, nz
	  omx(i,1,k)=0.0
	  omx(i,ny,k)=0.0
	 enddo
	enddo

c   On noninitial calls to mud3 the work array wrkmud must be
c   updated from the file attached to unit iueq2.
      ncallmud=ncallmud+1
c      call cputme(cpuin)
c            call cputme(cpuin)


c	write(6,*)'test2'

	if(ifirst .eq. 0)then
	iparm(1)=0
	call muh3(iparm,fparm,wrkmudomx,iwkomx,
     &                    coef,bndyc
     &                   ,frhs,omx,mgopt,ierr)

	ifirst=1
	endif

	iparm(1)=1

	call muh3(iparm,fparm,wrkmudomx,iwkomx,
     &                    coef,bndyc
     &                   ,frhs,omx,mgopt,ierr)
	call muh34(wrkmudomx,iwkomx,omx,ierr)

c	write(6,*)'test3'




       nvcycle=nvcycle+iparm(23)
c


      return
      end

             subroutine slveqpsix(ntime)
c***********************************************************************
c   Subroutine to solve the Poisson equation for the stream function
c***********************************************************************
       external coef,bndyc
       include 'por_common'
       common/blkmud/wrkmud(length),fparm(8),iparm(23),mgopt(4)
     &  ,cpumud,cpuin1,cpuin2,nvcycle,ncallmud,cpumud24,
     &   iwk(length3i),wrkmudpsix(length),iwkpsix(length3i)
c         COMMON/CONSTS/Ra,a0tc,delt
         COMMON/BLKDL/DLX,DLY,DLZ
	 data ifirst/0/
	 save ifirst

        pi=acos(-1.)
        ierr=0
c   Assign the forcing term for the stream function equation.
	do k=1, nz
	 do j=1, ny
	   do i=1, nx
             frhs(i,j,k)=-omx(i,j,k)
	    enddo
	 enddo
	enddo
c
c   Impose boundary conditions on the stream function.
c
c   Psi along the horizontal faces is zero (no mass flux)
	do i=1, nx
	 do j=1, ny
          psix(i,j,1)=0.0
          psix(i,j,nz)=0.0
	 enddo
	enddo

c   dpsixdx is zero along the X faces and psi is zero along Y faces.

	do j=1, ny
	 do k=1, nz
	  psix(1,j,k)=(48.0*psix(2,j,k)-36.0*psix(3,j,k)+16.0*
     &     psix(4,j,k)-3.0*psix(5,j,k))/25.0
	  psix(nx,j,k)=(48.0*psix(nx-1,j,k)-36.0*psix(nx-2,j,k)+16.0*
     &     psix(nx-3,j,k)-3.0*psix(nx-4,j,k))/25.0
	 enddo
	enddo

	do i=1, nx
         do k=1, nz
          psix(i,1,k)=0.0
          psix(i,ny,k)=0.0
         enddo
        enddo


c   On noninitial calls to mud3 the work array wrkmud must be
c   updated from the file attached to unit iueq2.
      ncallmud=ncallmud+1

c	write(6,*)'test4'

	if(ifirst .eq. 0)then
	iparm(1)=0
	call muh3(iparm,fparm,wrkmudpsix,iwkpsix,
     &                    coef,bndyc
     &                   ,frhs,psix,mgopt,ierr)
	ifirst=1
	endif

	iparm(1)=1

	call muh3(iparm,fparm,wrkmudpsix,iwkpsix,
     &                    coef,bndyc
     &                   ,frhs,psix,mgopt,ierr)
	call muh34(wrkmudpsix,iwkpsix,psix,ierr)

c	write(6,*)'test5'

        nvcycle=nvcycle+iparm(23)
c
c   After the initial call to mud3 the work array wrkmud must be
c   saved in the file attached to unit iueq2.

      return
      end



       subroutine coef(x,y,z,cxx,cyy,czz,cx,cy,cz,ce)
c***********************************************************************
c   Subroutine to provide the coefficients for the elliptic pde
c   that must be inverted to determine the streamfunction,
c   at any grid point (R,T). This subroutine is used by mud3.
c***********************************************************************
c Pix and Piy are the permeabilities in the x and y direction if the medium
c is anisotropic Piz is assumed to be 1.
       parameter (pix=1.0,piy=1.0)

       cxx=1.0
       cyy=1.0
       czz=1.0
       cx=0.0
       cy=0.0
       cz=0.0
       ce=0.0

      return
      
      end
      
c      subroutine coefomx(x,y,z,cxx,cyy,czz,cx,cy,cz,ce)
c***********************************************************************
c   Subroutine to provide the coefficients for the elliptic pde
c   that must be inverted to determine the streamfunction,
c   at any grid point (R,T). This subroutine is used by mud3.
c***********************************************************************
c Pix and Piy are the permeabilities in the x and y direction if the medium
c is anisotropic Piz is assumed to be 1.

c       parameter (pix=1.0,piy=1.0)

c       cxx=1.0
c      czz=1.0
c       cyy=1.0
c       cx=0.0
c       cy=0.0
c       cz=0.0
c       ce=0.0
c       
c	return
c      
c	end


c	subroutine coefomy(x,y,z,cxx,cyy,czz,cx,cy,cz,ce)
c***********************************************************************
c   Subroutine to provide the coefficients for the elliptic pde
c   that must be inverted to determine the streamfunction,
c   at any grid point (R,T). This subroutine is used by mud3.
c***********************************************************************
c Pix and Piy are the permeabilities in the x and y direction if the medium
c is anisotropic Piz is assumed to be 1.

c       parameter (pix=1.0,piy=1.0)

c       cxx=1.0
c       czz=1.0
c       cyy=1.0
c       cx=0.0
c       cy=0.0
c       cz=0.0
c       ce=0.0

c     return

c      end

      
      subroutine bndyc(kbdy,xory,yorz,alpha,gbdy)
c***********************************************************************
c   bndyc is used by mud3 when non-Dirichlet boundary conditions
c   are called for. (not used here)
c***********************************************************************
      return
      end

	subroutine clcvel
c***********************************************************************
c   Subroutine to determine a fourth order accurate approximation
c   to the velocity field from the current stream function.
c***********************************************************************
	include 'por_common'

	COMMON/BLKDL/DLX,DLY,DLZ
	dimension uza(nx,ny,nz),uzb(nx,ny,nz)
C
C   Assign the interior horizontal velocity.

	write(6,*)'test1'

	do j=1, ny
	 do i=2, nx-1
	  do k=1, nz
            wrk4(k)=psiy(i,j,k)
	  enddo

        CALL DER1(WRK4,WRK5,NZ,DLZ)
        
	  do  k=1, nz
            ux(i,j,k)=wrk5(k)
	  enddo

	 enddo

	enddo

	write(6,*)'test2'

	do i=1, nx
	 do j=2, ny-1
	  do k=1, nz
	    wrk4(k)=psix(i,j,k)
	  enddo

	  CALL DER1(WRK4,WRK5,NZ,DLZ)

	  do k=1, nz
	   uy(i,j,k)=wrk4(k)
	  enddo

	 enddo

	enddo

	write(6,*)'test3'

C
C
C   Calculate the vertical velocity field.
	do j=1, ny
	 do k=2, nz-1
	  do i=1, nx
         wrk4(i)=psiy(i,j,k)
        enddo
        
        CALL DER1(WRK4,WRK5,NX,DLX)

	  do i=1, nx
            uza(i,j,k)=-wrk5(i)
	  enddo

	 enddo

	enddo

	write(6,*)'test4'

	do i=1, nx
         do k=2, nz-1
          do j=1, ny
         wrk6(j)=psix(i,j,k)
         enddo

        CALL DER1(WRK6,WRK7,NY,DLY)

          do j=1, ny
            uzb(i,j,k)=-wrk7(j)
          enddo

         enddo

	enddo

	write(6,*)'test5'

	do i=1, nx
	 do j=1, ny
	  do k=2, nz-1
	   uz(i,j,k)=uza(i,j,k)+uzb(i,j,k)
	  enddo
	 enddo
	enddo

	write(6,*)'test6'

c  ux=0,duydx=0 and duzdx=0 on x faces

	do j=1, ny
	 do k=1, nz
	  ux(1,j,k)=0.0
	  ux(nx,j,k)=0.0
	 enddo
	enddo

	write(6,*)'test7'

c  uy=0, duxdy=0 and duzdy=0 on y faces

	do i=1, nx
	 do k=1, nz
          uy(i,1,k)=0.0
          uy(i,ny,k)=0.0
         enddo
        enddo

	write(6,*)'test8'


c   uz=0,duxdz=0 and duydz=0 along the horizontal boundaries.

	do i=1, nx
	 do j=1, ny
	  uz(i,j,1)=0.0
          uz(i,j,nz)=0.0
	 enddo
	enddo

	write(6,*)'test9'
c
      return
      end

[/code]

I would like to add that the point in the code where it gives the seg fault, as described above, is within the subroutine por_solve_x(ntime,fparm1), right after the 1010 continue statement and right before the call to clcvel, almost at the end of that subroutine.

Looking through the code I don't see anything obvious. Typically, however, seg faults that occur at a call are the result of a stack overflow. Also, when using "-mp" all your local variables are placed on the stack, which increases the amount of stack space the program needs.
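To give a rough sense of scale (a hedged estimate, not a diagnosis): local arrays such as uza and uzb in clcvel are dimensioned nx by ny by nz, which at 161x161x161 is about 17 MB each for 4-byte reals (double that for 8-byte reals). Under "-mp" that space is taken from a thread's stack rather than from static storage, so a handful of such locals can use up a default stack limit quickly.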

When you ran this program, did you explicitly set the stack size to a very large value, i.e. with 'unlimit'? If you did, then you might be up against your system's hard limit (an unlimited stack size really does have a limit) and you may be stuck.

I noticed that the code you sent does not include any OpenMP directives, so I'm a bit confused as to why you're using "-mp". You must insert OpenMP directives in your code in order for the compiler to know what you want to parallelize. Our compilers do have an option called "-Mconcur" which will auto-parallelize portions of your code. However, you might still have stack issues, since "-Mconcur" also puts local variables on the stack. Be sure to set the "NCPUS" environment variable to 2; "NCPUS" lets your program know how many processors are available for it to execute on.
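Just as an illustration (a minimal sketch, not a drop-in change to your program): an OpenMP PARALLEL DO directive on the "store old array" copy loop from your por_solve_x would look something like this, assuming nx, ny, nz, psix, and wk3dx come from 'por_common' exactly as in the code you posted:

[code]
c$omp parallel do private(i,j,k)
        do k=1, nz
          do j=1, ny
            do i=1, nx
c             each thread copies its own range of k values
              wk3dx(i,j,k)=psix(i,j,k)
            enddo
          enddo
        enddo
c$omp end parallel do
[/code]

Without directives like this, "-mp" does not parallelize anything in your source; it mainly changes how the program is built (for example, local variables go on the stack).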

For more information, please refer to the OpenMP documentation. Chapter 5 of our user's guide also gives more information about OpenMP directives.