Hello Mat,
With the latest PGI 15.5, I tested managed memory (-ta=tesla:managed) with CUDA-aware Open MPI 1.8.5 (and also with mvapich2-1.a-gdr).
Apparently, enabling the managed flag does not disable the host_data clause, which then causes a segmentation fault.
Here is a small example, hellompi_managed.f90:
=> Process 0 sends 1 to process 1 on the GPU. Process 1 prints the result on the CPU.
program hello_managed
  implicit none
  include 'mpif.h'
  integer, parameter :: n=256
  real, allocatable, dimension(:) :: send_buf,recv_buf
  integer :: npe,mype,ierr
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mype, ierr)
  call mpi_comm_size(mpi_comm_world, npe, ierr)
  if (npe.ne.2) STOP 'run with 2 MPI task only'
  allocate( send_buf(n),recv_buf(n) )
  !$acc data create(send_buf,recv_buf)
  !$acc kernels
  send_buf=1.0
  recv_buf=-999.0
  !$acc end kernels
  !$acc host_data use_device(send_buf,recv_buf)
  if ( mype .eq. 0 ) then
    call MPI_Send(send_buf,n,MPI_REAL,1,0,mpi_comm_world, ierr)
  else
    call MPI_Recv(recv_buf,n,MPI_REAL,0,0,mpi_comm_world,mpi_status_ignore, ierr)
  endif
  !$acc end host_data
  if ( mype .eq. 1 ) then
    !$acc update host(recv_buf(n:n))
    print*,'mype=',mype,' recv_buf(n) <must be 1> =',recv_buf(n)
  end if
  !$acc end data
  call mpi_finalize(ierr)
end program hello_managed
=> Compiled without the managed flag, it works correctly, thanks to CUDA-aware MPI:
mpif90 -ta=tesla:cuda6.5 hellompi_managed.f90 -o hellompi_tesla
mpirun -np 2 hellompi_tesla
mype= 1 recv_buf(n) <must be 1> = 1.000000
=> With the managed flag, the code crashes:
mpif90 -Mcuda -ta=tesla:cuda6.5,managed hellompi_managed.f90 -o hellompi_managed
mpirun -np 2 hellompi_managed
mpirun noticed that process rank 0 with PID 6225 on node n370 exited on signal 11 (Segmentation fault).
=> Removing the host_data region makes the code run correctly with the managed flag:
mpif90 -Mcuda -ta=tesla:cuda6.5,managed hellompi_managed.f90 -o hellompi_managed_no_hostdata
mpirun -np 2 hellompi_managed_no_hostdata
mype= 1 recv_buf(n) <must be 1> = 1.000000
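For reference, the variant without host_data can be sketched as follows. This is only a sketch: it assumes the sole change is dropping the two host_data directives, so that MPI_Send and MPI_Recv are passed the arrays directly. The program name hello_managed_no_hostdata is just an illustrative label; everything else is copied from the example above.

```fortran
program hello_managed_no_hostdata   ! illustrative name, not from the original source
  implicit none
  include 'mpif.h'
  integer, parameter :: n=256
  real, allocatable, dimension(:) :: send_buf,recv_buf
  integer :: npe,mype,ierr
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mype, ierr)
  call mpi_comm_size(mpi_comm_world, npe, ierr)
  if (npe.ne.2) STOP 'run with 2 MPI task only'
  allocate( send_buf(n),recv_buf(n) )
  !$acc data create(send_buf,recv_buf)
  ! Initialise both buffers inside an accelerator kernels region
  !$acc kernels
  send_buf=1.0
  recv_buf=-999.0
  !$acc end kernels
  ! Assumed change: no host_data region, the arrays go to MPI as they are
  if ( mype .eq. 0 ) then
    call MPI_Send(send_buf,n,MPI_REAL,1,0,mpi_comm_world, ierr)
  else
    call MPI_Recv(recv_buf,n,MPI_REAL,0,0,mpi_comm_world,mpi_status_ignore, ierr)
  endif
  if ( mype .eq. 1 ) then
    ! Same update directive as in the original example
    !$acc update host(recv_buf(n:n))
    print*,'mype=',mype,' recv_buf(n) <must be 1> =',recv_buf(n)
  end if
  !$acc end data
  call mpi_finalize(ierr)
end program hello_managed_no_hostdata
```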
=> Last remark: the update clause is not deactivated either (whereas the create clauses are), as shown by -Minfo=acc:
mpif90 -Mcuda -ta=tesla:cuda6.5,managed -Minfo=acc hellompi_managed.f90 -o hellompi_managed
hello_managed:
21, Loop is parallelizable
Accelerator kernel generated
21, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
34, Generating update host(recv_buf(256))
Bye,
Juan