I have code that works fine on x86_64-based multi-GPU (P100) nodes. However, when I run it on IBM POWER multi-GPU nodes, it crashes. I tracked the error down to the data exchange (swapping) between processes, specifically to accessing an array of Fortran derived types. Since it works on x86_64 systems, I was wondering whether this feature is implemented in the POWER version of the compiler, or whether something else is wrong.
The code is written in Fortran and uses MPI, OpenACC, and some CUDA kernels.
I extracted the relevant sections of the failing code below. This is not a working example, but I hope it makes my problem clearer. The write statement in the last loop nest works on x86_64 but gives a segfault on IBM POWER.
```fortran
! allocate the arrays used for swapping
allocate(swaps(intf_num_reg))
do intf=1,intf_num_reg
  ....
  allocate(SWAPS_IN(intf)%swap_in(ib:ie,jb:je,kpoints,nv_add))
  .....
  allocate(SWAPS_OUT(intf)%swap_out(ib:ie,jb:je,kpoints &
           ,nv_add))
end do

! allocate the GPU arrays
!$acc enter data copyin(swaps)
do intf=1,intf_num_reg
  !$acc enter data create(SWAPS_IN(intf)%swap_in)
  !$acc enter data create(SWAPS_OUT(intf)%swap_out)
end do

! set receive buffers
..............

! pack data on GPU
!$acc kernels present(a,SWAPS_OUT,SWAPS_OUT(intf)%swap_out)
!$acc+ async(intf)
!$acc loop independent collapse(2)
do n=1,nv
  do k=k1,k2
    !$acc loop independent collapse(2)
    do j=jb,je,js
      do i=ib,ie,is
        ! accessing SWAPS_OUT(intf)%swap_out(i,j,k,n) here fails for write statements !!!!!!!!!!!!!
        write(*,*) SWAPS_OUT(intf)%swap_out(1,1,1,:)
        SWAPS_OUT(intf)%swap_out(i,j,k,n)=a(i,j,k,n)
      end do
    end do
  end do
end do
!$acc end kernels
```
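For comparison, here is a minimal, self-contained sketch of the same packing pattern that could serve as a reproducer on both platforms. It rests on several assumptions. The type name `swap_buf_t`, the array sizes, and the fill values of `a` are hypothetical, and only the OUT buffers are modeled. It also explicitly copies the parent array to the device before creating each member, which is the usual OpenACC manual deep-copy order that lets the runtime attach each device member to the device parent. Finally, it inspects the buffers with `write` on the host after an `update self` instead of inside the compute region.

```fortran
! Hypothetical minimal reproducer: names, sizes and fill values are illustrative.
program swap_repro
  implicit none

  ! derived type holding one allocatable send buffer per interface
  type :: swap_buf_t
    real(8), allocatable :: swap_out(:,:,:,:)
  end type swap_buf_t

  integer, parameter :: intf_num_reg = 2
  integer, parameter :: ni = 4, nj = 4, nk = 2, nv = 3
  type(swap_buf_t), allocatable :: swaps_out(:)
  real(8), allocatable :: a(:,:,:,:)
  integer :: intf, i, j, k, n

  ! source data to be packed
  allocate(a(ni,nj,nk,nv))
  do n = 1, nv
    do k = 1, nk
      do j = 1, nj
        do i = 1, ni
          a(i,j,k,n) = real(i + 10*j + 100*k + 1000*n, 8)
        end do
      end do
    end do
  end do

  ! host allocation: parent array, then each member
  allocate(swaps_out(intf_num_reg))
  do intf = 1, intf_num_reg
    allocate(swaps_out(intf)%swap_out(ni,nj,nk,nv))
  end do

  ! device allocation (manual deep copy): parent first, then members,
  ! so the runtime can attach each device member to the device parent
  !$acc enter data copyin(a, swaps_out)
  do intf = 1, intf_num_reg
    !$acc enter data create(swaps_out(intf)%swap_out)
  end do

  ! pack on the device, one async queue per interface, no I/O inside
  do intf = 1, intf_num_reg
    !$acc parallel loop collapse(4) present(a, swaps_out) async(intf)
    do n = 1, nv
      do k = 1, nk
        do j = 1, nj
          do i = 1, ni
            swaps_out(intf)%swap_out(i,j,k,n) = a(i,j,k,n)
          end do
        end do
      end do
    end do
  end do
  !$acc wait

  ! copy each buffer back and inspect it on the host
  do intf = 1, intf_num_reg
    !$acc update self(swaps_out(intf)%swap_out)
    write(*,*) intf, swaps_out(intf)%swap_out(1,1,1,:)
    if (any(swaps_out(intf)%swap_out /= a)) error stop "swap buffer mismatch"
  end do
  print *, "all swap buffers match"

  ! release device memory in reverse order: members, then parent
  do intf = 1, intf_num_reg
    !$acc exit data delete(swaps_out(intf)%swap_out)
  end do
  !$acc exit data delete(swaps_out, a)
end program swap_repro
```

Compiled without OpenACC enabled, the `!$acc` lines are ordinary comments and the program runs entirely on the host. That gives a reference result to compare against the GPU build on each platform.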