17.10 seg faulting on a code that worked in 17.9

Hi,

I recently installed PGI 17.10 and I can compile my code fine.

The first thing I notice is that I can no longer run my executable directly; I am forced to use mpiexec even when running on 1 core. If I try to run without mpiexec I get:

[PREDSCI-GPU2:02165] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 458
[PREDSCI-GPU2:02165] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 166

Next, when I try to run the code I get a seg fault:

[PREDSCI-GPU2:02194] *** Process received signal ***
[PREDSCI-GPU2:02194] Signal: Segmentation fault (11)
[PREDSCI-GPU2:02194] Signal code: Invalid permissions (2)
[PREDSCI-GPU2:02194] Failing at address: 0x7f0dbf8ec460

I tested the compiler on another OpenACC code I have and it worked fine with mpiexec.

The main difference between the codes is that this one uses manual deep copy for derived types. Prior to 17.10, I could only get the code to work by adding a “present(v)” in addition to “default(present)”, as the default was not applying “present” to the derived type itself correctly (although it was detecting the type’s members, i.e. v%r(i,j,k)).
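For reference, the pattern in question is essentially the following (just a sketch; the loop bounds and member names are illustrative):

!$acc enter data create(v,v%r)

c     In 17.9, default(present) alone picked up the members (v%r)
c     but not the derived type itself, hence the extra present(v).
!$acc parallel loop collapse(3) default(present) present(v)
      do k=1,np
        do j=1,nt
          do i=1,nr
            v%r(i,j,k)=0.
          enddo
        enddo
      enddo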

In 17.10, the compiler output seems to show that “present” is no longer being applied to the type’s members.

I thought that perhaps the extra present clauses I needed in 17.9 were no longer necessary and were themselves the problem, but when I removed them all the code still seg faults, even though the compile output DOES show the type and its members being implicitly “present”.

Could this have to do with the OpenMPI library being updated? I do not think so since my other MPI+OpenACC code works.

The code is too large to reproduce here, and I unfortunately do not have the time to produce a small sample code that reproduces the problem (especially since I do not know which part of the code is seg faulting; I tried using pgdbg, but the line it reports makes no sense).

I was wondering if anyone else has come across this and what I can do to work around it.

Hi Ron,

In 17.10 we did start shipping OpenMPI 2.1.2, since the previously shipped 1.10.2 is no longer supported upstream by the OpenMPI developers.

As for the seg fault, nothing has changed w.r.t. the use of “present” or not, so something else is going on. Also, I’d expect an illegal instruction error on the device if it was a problem with present. A segv indicates that the problem is on the host.

Can you run your code through a debugger to see where the segv is occurring?

-Mat

Hi,

If I run the code with PGI_ACC_DEBUG turned on, the seg fault is happening after a routine that is called right before a routine that uses MPI sends and receives.

I am leaning towards an issue with the “host_data use_device()” in the MPI routines when using derived types, since my other MPI+OpenACC code works fine and it has equivalent MPI routines, just without derived types.

Maybe there is something going wrong in CUDA-aware OpenMPI when using derived-type pointers?

!$acc host_data use_device(a%r,a%t)
      call MPI_Irecv (a%r(:,:,  1),lbuf3r,ntype_real,iproc_pm,tagr,     
     &           comm_all,req(1),ierr)

Hi again,

I relinked against the OpenMPI 1.10.2 version and the code is still seg faulting in the same location…

I have confirmed it is in the MPI call that uses host_data.

If I manually copy the arrays, the routine works fine.
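(By “manually copy” I mean staging the transfers through the host instead of using host_data. Roughly, and only as a sketch using the names from the snippet above with the send side added for illustration, it looks like this:)

!$acc update self(a%r)

      call MPI_Irecv (a%r(:,:,  1),lbuf3r,ntype_real,iproc_pm,tagr,
     &                comm_all,req(1),ierr)
      call MPI_Isend (a%r(:,:,np-1),lbuf3r,ntype_real,iproc_pp,tagr,
     &                comm_all,req(2),ierr)
      call MPI_Waitall (2,req,MPI_STATUSES_IGNORE,ierr)

!$acc update device(a%r)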

Also, sometimes if I set PGI_ACC_DEBUG=1, the code actually completes! (but not always)

With 17.4->17.9 I did not have this problem.

One more update:

I reinstalled 17.9 but linked against the new OpenMPI 2.1.2 that came with 17.10, and the code works fine. This further confirms that the updated OpenMPI is not the problem.

  • Ron

Hi Ron,

I’ve been trying to replicate the issue here with various test programs I have that use manual deep copy on derived types, but I’ve not had luck.

Can you send me the full source? I think you have my email, but if not I’ll send it to you on Slack.

Note that I’ll be at Supercomputing next week so it may not be something I can look at until I get back. I’ll try, but no guarantee.

Thanks,
Mat

Hi,

Unfortunately I cannot give out the code.

I will be at SC17 as well. Perhaps I can show you the code on my GPU-enabled laptop at the PGI booth?

  • Ron

Any updates on this?
(or a workaround?)

Hi,

Any updates on this problem?

  • Ron

Any updates on this problem?

I still cannot run my code using 17.10, only with 17.9 and earlier.

Has the problem been identified and if so, will it be patched in 18.1?

Thanks,

  • Ron

Hi Ron,

Since you weren’t able to get us a reproducer, and we weren’t able to determine what was wrong when we met at SC17, I have not reported anything to engineering.

Hopefully a similar issue was found elsewhere. If so, your issue may be fixed in 18.1. Otherwise, we’ll need something that we can use to determine the problem.

-Mat

Hello,

I really would like this fixed, so here is a small reproducer code that shows the problem (see below).

When I run the code with either one or two MPI ranks using PGI 17.9, it works fine (I used -ta=tesla:cc60,cuda9.0 on a TitanXP).

If I run the code using PGI 17.10, it seg faults with either 1 or 2 ranks (I used -ta=tesla:cc50,cuda9.0 on a GeForce 970).
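For reference, I build and run it with something like the following (the file name is just an example):

mpif90 -acc -ta=tesla:cc60,cuda9.0 -Minfo=accel test.f -o test
mpiexec -np 2 ./test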

Here is the code:

c-------------------------------------------------
      module number_types

        use iso_fortran_env

        implicit none

        integer, parameter :: r8=REAL64

      end module
c-------------------------------------------------
      module types

        use number_types

        type :: vvec
          real(r8), dimension(:,:,:), allocatable :: r
          real(r8), dimension(:,:,:), allocatable :: t
          real(r8), dimension(:,:,:), allocatable :: p
        end type

      end module
c-------------------------------------------------
      module mpi_stuff

        implicit none

        include "mpif.h"

        integer :: nproc
        integer :: iproc
        integer :: iproc_pm
        integer :: iproc_pp

      end module
c-------------------------------------------------
      program test

      use number_types
      use types
      use mpi_stuff

      implicit none

      integer :: nr,nt,np
      integer :: i,j,k,ierr,tcheck

      type(vvec), target :: v


      call MPI_Init_thread (MPI_THREAD_FUNNELED,tcheck,ierr)
      call MPI_Comm_size (MPI_COMM_WORLD,nproc,ierr)
      call MPI_Comm_rank (MPI_COMM_WORLD,iproc,ierr)

      if (iproc.eq.0.and.nproc.gt.1) then
        iproc_pm=1
        iproc_pp=1
      else
        iproc_pm=0
        iproc_pp=0
      endif

      nr=6
      nt=6
      np=6

      allocate(v%r(nr,nt,np))
      allocate(v%t(nr,nt,np))
      allocate(v%p(nr,nt,np))

c       Manual deep copy: create the derived type and its allocatable
c       members on the device.
!$acc enter data create(v,v%r,v%t,v%p)

!$acc parallel loop collapse(3) default(present) present(v)
      do k=1,np/2
        do j=1,nt
          do i=1,nr
            v%r(i,j,k)=iproc
            v%t(i,j,k)=iproc
            v%p(i,j,k)=iproc
          enddo
        enddo
      enddo

!$acc parallel loop collapse(3) default(present) present(v)
      do k=np/2,np
        do j=1,nt
          do i=1,nr
            v%r(i,j,k)=iproc+2
            v%t(i,j,k)=iproc+2
            v%p(i,j,k)=iproc+2
          enddo
        enddo
      enddo


      call seam_vvec (v)


!$acc update self(v%r,v%t,v%p)


      if (iproc.eq.0) then
        print*, "vr(:,:,1):", v%r(:,:,1)
        print*, "vr(:,:,2):", v%r(:,:,2)
      end if

!$acc exit data delete(v%r,v%t,v%p,v)

      deallocate(v%r,v%t,v%p)

      call MPI_Finalize (ierr)

      end program
c-------------------------------------------------
      subroutine seam_vvec (v)

      use number_types
      use types
      use mpi_stuff

      implicit none

      type(vvec) :: v

      integer :: ierr
      integer :: tagr=0
      integer :: tagt=1
      integer :: tagp=2
      integer :: lbuf
      integer :: nr,nt,np
      integer :: req(12)

      nr=size(v%r,1)
      nt=size(v%r,2)
      np=size(v%r,3)

      lbuf=nr*nt

c       Pass the device addresses of the member arrays directly to
c       the (CUDA-aware) MPI calls.
!$acc host_data use_device(v%r,v%t,v%p)

      call MPI_Irecv (v%r(:,:,  1),lbuf,MPI_REAL8,iproc_pm,tagr,
     &                MPI_COMM_WORLD,req(1),ierr)
      call MPI_Irecv (v%r(:,:,np),lbuf,MPI_REAL8,iproc_pp,tagr,
     &                MPI_COMM_WORLD,req(2),ierr)
      call MPI_Irecv (v%t(:,:,  1),lbuf,MPI_REAL8,iproc_pm,tagt,
     &                MPI_COMM_WORLD,req(3),ierr)
      call MPI_Irecv (v%t(:,:,np),lbuf,MPI_REAL8,iproc_pp,tagt,
     &                MPI_COMM_WORLD,req(4),ierr)
      call MPI_Irecv (v%p(:,:,  1),lbuf,MPI_REAL8,iproc_pm,tagp,
     &                MPI_COMM_WORLD,req(5),ierr)
      call MPI_Irecv (v%p(:,:,np),lbuf,MPI_REAL8,iproc_pp,tagp,
     &                MPI_COMM_WORLD,req(6),ierr)

      call MPI_Isend (v%r(:,:,np-1),lbuf,MPI_REAL8,iproc_pp,tagr,
     &                MPI_COMM_WORLD,req(7),ierr)
      call MPI_Isend (v%r(:,:,    2),lbuf,MPI_REAL8,iproc_pm,tagr,
     &                MPI_COMM_WORLD,req(8),ierr)
      call MPI_Isend (v%t(:,:,np-1),lbuf,MPI_REAL8,iproc_pp,tagt,
     &                MPI_COMM_WORLD,req(9),ierr)
      call MPI_Isend (v%t(:,:,    2),lbuf,MPI_REAL8,iproc_pm,tagt,
     &                MPI_COMM_WORLD,req(10),ierr)
      call MPI_Isend (v%p(:,:,np-1),lbuf,MPI_REAL8,iproc_pp,tagp,
     &                MPI_COMM_WORLD,req(11),ierr)
      call MPI_Isend (v%p(:,:,    2),lbuf,MPI_REAL8,iproc_pm,tagp,
     &                MPI_COMM_WORLD,req(12),ierr)

      call MPI_Waitall (12,req,MPI_STATUSES_IGNORE,ierr)

!$acc end host_data

      end subroutine

Hi,

I just installed PGI 18.1 and unfortunately this problem is still there :(

  • Ron

Hi Ron,

Thanks for the reproducer. I have added this as TPR#25243 and sent it to engineering for further investigation.

From what I can tell, it looks like V’s address is getting munged somehow when entering the host_data region.

I was only able to work around the error when I passed “r”, “t”, and “p” into “seam_vvec” instead of passing in “v”. (I also needed to add an interface for “seam_vvec” so the assumed-shape arrays are passed in correctly.)
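In outline, the workaround looks like this (just a sketch, abbreviated to a single exchange of “r”; the “t” and “p” exchanges follow the same pattern as in your reproducer):

c     Caller side: declare an explicit interface in the specification
c     part of the program, then pass the members instead of v.
      interface
        subroutine seam_vvec (r,t,p)
          use number_types
          real(r8), dimension(:,:,:) :: r,t,p
        end subroutine
      end interface

      call seam_vvec (v%r,v%t,v%p)

c     Callee side: plain assumed-shape arrays, no derived type.
      subroutine seam_vvec (r,t,p)

      use number_types
      use mpi_stuff

      implicit none

      real(r8), dimension(:,:,:) :: r,t,p

      integer :: ierr,lbuf,np
      integer :: tagr=0
      integer :: req(2)

      np=size(r,3)
      lbuf=size(r,1)*size(r,2)

!$acc host_data use_device(r,t,p)
      call MPI_Irecv (r(:,:, 1),lbuf,MPI_REAL8,iproc_pm,tagr,
     &                MPI_COMM_WORLD,req(1),ierr)
      call MPI_Isend (r(:,:,np-1),lbuf,MPI_REAL8,iproc_pp,tagr,
     &                MPI_COMM_WORLD,req(2),ierr)
      call MPI_Waitall (2,req,MPI_STATUSES_IGNORE,ierr)
!$acc end host_data

      end subroutine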

-Mat

OK Thanks!

Yeah, that work-around makes sense, since we have another code (with a very similar routine) that works fine, but it operates on a plain array rather than an array inside a derived type.

Unfortunately, the work-around would require too many CPU code changes, so I will stick to 17.9 for now.

  • Ron

Hi,

I just installed 18.4 and retested, and to my dismay the bug is still there :(.

This is really a problem for my development because I will now have to wait 6 months for the next community edition for a potential bug fix.
(I have a professional license but the computing centers I run on only have the community editions).

Is there an ETA on this bug fix?

  • Ron

Hi Ron,

Is there an ETA on this bug fix?

I’m checking with the developer on where he’s at with this, though I won’t be able to give you a firm ETA.

-Mat

Hi,

I just installed the new 18.7 compiler and unfortunately this bug has still not been fixed. The small sample code I gave you still seg faults.

This bug started in 17.10 but was not there in 17.9. I am surprised it has been this long without a fix. Can’t someone “diff” 17.9 and 17.10 to track this down?

  • Ron

Hi Ron,

I’m in China this week but just added a note to the TPR to see where the developer is at with this one. If I don’t get a response, when I get back I’ll ask our GPU computing manager if we can get the priority on this bumped up.

-Mat