Hi Mat, I have some questions about the autocompare flag when I use it together with MPI functions.
My environment:
nvhpc/22.11
Makefile flags:
ACC = -fast -acc -Minfo=accel -ta=tesla:autocompare
I know how to use the autocompare feature, but when I use it with MPI functions it seems to malfunction.
This is my code:
subroutine mpi_subdomain_ghostcell_update(Value_sub)
    use mpi_topology!, only : comm_1d_x1, comm_1d_x2, comm_1d_x3
    implicit none

    double precision, dimension(0:n1sub, 0:n2sub, 0:n3sub), intent(inout) :: Value_sub
    integer :: ierr
    integer :: request(4), rr(2)

    ! Update the ghostcells in the x-direction using derived datatypes and subcommunicator.
    !$acc host_data use_device(Value_sub)
    call MPI_Isend(Value_sub, 1, ddtype_sendto_E  , comm_1d_x1%east_rank, 111, comm_1d_x1%mpi_comm, request(1), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_W, comm_1d_x1%west_rank, 111, comm_1d_x1%mpi_comm, request(2), ierr)
    call MPI_Isend(Value_sub, 1, ddtype_sendto_W  , comm_1d_x1%west_rank, 222, comm_1d_x1%mpi_comm, request(3), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_E, comm_1d_x1%east_rank, 222, comm_1d_x1%mpi_comm, request(4), ierr)
    call MPI_Waitall(4, request, MPI_STATUSES_IGNORE, ierr)

    ! Update the ghostcells in the y-direction using derived datatypes and subcommunicator.
    call MPI_Isend(Value_sub, 1, ddtype_sendto_N  , comm_1d_x2%east_rank, 111, comm_1d_x2%mpi_comm, request(1), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_S, comm_1d_x2%west_rank, 111, comm_1d_x2%mpi_comm, request(2), ierr)
    call MPI_Isend(Value_sub, 1, ddtype_sendto_S  , comm_1d_x2%west_rank, 222, comm_1d_x2%mpi_comm, request(3), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_N, comm_1d_x2%east_rank, 222, comm_1d_x2%mpi_comm, request(4), ierr)
    call MPI_Waitall(4, request, MPI_STATUSES_IGNORE, ierr)

    ! Update the ghostcells in the z-direction using derived datatypes and subcommunicator.
    call MPI_Isend(Value_sub, 1, ddtype_sendto_F  , comm_1d_x3%east_rank, 111, comm_1d_x3%mpi_comm, request(1), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_B, comm_1d_x3%west_rank, 111, comm_1d_x3%mpi_comm, request(2), ierr)
    call MPI_Isend(Value_sub, 1, ddtype_sendto_B  , comm_1d_x3%west_rank, 222, comm_1d_x3%mpi_comm, request(3), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_F, comm_1d_x3%east_rank, 222, comm_1d_x3%mpi_comm, request(4), ierr)
    call MPI_Waitall(4, request, MPI_STATUSES_IGNORE, ierr)
    !$acc end host_data

end subroutine mpi_subdomain_ghostcell_update
..................
compute the 'U' array (a 3-dimensional array)
..................
line 779 !$acc compare(U)
call mpi_subdomain_ghostcell_update(U)
line 781 !$acc compare(U)
line 784 !$acc update host(U)
line 786 !$acc compare(U)
This is the terminal output:
PCAST Double u in function mod_flowarray__makefld, /home/jsera.lee/apply_openACC/src/mod_flowarray.f90:781
idx: 258 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 259 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 260 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 261 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 262 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 263 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 264 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 265 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 266 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 267 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
PCAST Double u in function mod_flowarray__makefld, /home/jsera.lee/apply_openACC/src/mod_flowarray.f90:784
idx: 258 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 259 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 260 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 261 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 262 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 263 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 264 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 265 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 266 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 267 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
PCAST Double u in function mod_flowarray__makefld, /home/jsera.lee/apply_openACC/src/mod_flowarray.f90:786
idx: 258 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 259 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 260 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 261 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 262 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 263 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 264 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 265 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 266 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
idx: 267 FAIL ABS act: 7.03125000000000000e-01 exp: 0.00000000000000000e+00 dif: 7.03125000000000000e-01
At first I was really confused by this output, but I found a clue about it in this NVIDIA blog post (quoted below):
If you have some computation or data movement outside of the control of OpenACC that affects only host or only device memory, you have to account for that when using PCAST. For instance, an MPI data transfer to host memory would have to be copied to the device. If you use CUDA-aware MPI and do an MPI data transfer directly between two GPUs, then the host memory values are stale and must be updated. Similarly, if you have any cuBLAS or cuSolver calls on the device, then the host memory values are stale. This is likely where you have the OpenACC host_data construct, which says that something in device memory is being processed.
https://developer.nvidia.com/blog/detecting-divergence-using-pcast-to-compare-gpu-to-cpu-results/
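If I apply that advice to my case, I think the host copy of the array would need an explicit refresh right after the device-only exchange, for example at the end of the subroutine above (just a sketch of my understanding, I have not confirmed that this is the intended fix):

    ...
    call MPI_Waitall(4, request, MPI_STATUSES_IGNORE, ierr)
    !$acc end host_data
    ! Sketch: refresh the stale host copy after the GPU-to-GPU exchange,
    ! so that later host-side uses and compares see the new ghostcell values.
    !$acc update host(Value_sub)
end subroutine mpi_subdomain_ghostcell_update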
As you can see, I used "!$acc host_data use_device" because of CUDA-aware MPI / GPUDirect.
All of the arrays are located in GPU buffers, so GPU-to-GPU communication is more efficient.
However, I don't know whether it works properly. (This is not the main question of this topic.)
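For contrast, without CUDA-aware MPI I would have to stage the exchange through host memory, roughly like this (only a sketch to illustrate the difference, not how my code is actually written):

    ! Host-staged variant (sketch): exchange through host buffers instead of
    ! passing device pointers, at the cost of extra host<->device copies.
    !$acc update host(Value_sub)
    call MPI_Isend(Value_sub, 1, ddtype_sendto_E  , comm_1d_x1%east_rank, 111, comm_1d_x1%mpi_comm, request(1), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_W, comm_1d_x1%west_rank, 111, comm_1d_x1%mpi_comm, request(2), ierr)
    call MPI_Isend(Value_sub, 1, ddtype_sendto_W  , comm_1d_x1%west_rank, 222, comm_1d_x1%mpi_comm, request(3), ierr)
    call MPI_Irecv(Value_sub, 1, ddtype_recvfrom_E, comm_1d_x1%east_rank, 222, comm_1d_x1%mpi_comm, request(4), ierr)
    call MPI_Waitall(4, request, MPI_STATUSES_IGNORE, ierr)
    ! ... repeat for the y- and z-directions as in the subroutine above ...
    !$acc update device(Value_sub)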
Q. This is my question.
line 779 !$acc compare(U) passed → the CPU and GPU buffers hold the same data.
The MPI calls are made inside host_data use_device.
line 781 !$acc compare(U) detected a difference → I understand this, because the exchange updated only the GPU copy.
line 784 !$acc update host(U) detected a difference → I understand this too; the CPU and GPU data differ at that point.
But my understanding of update host under autocompare is: 1. memcpy device to host, 2. compare to detect differences, 3. update the host data with the device data.
So line 786 !$acc compare(U) detected a difference → this I cannot understand. Line 784 should already have updated the host data, so I think a mismatch should no longer be possible.
Could I ask what is wrong?
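For clarity, here is the sequence I am asking about, condensed, with my expectations written as comments (this mirrors the source lines above; nothing here is new code):

    !$acc compare(U)                         ! line 779: passes, CPU and GPU copies agree
    call mpi_subdomain_ghostcell_update(U)   ! GPU-direct exchange: only the device copy of U changes
    !$acc compare(U)                         ! line 781: FAIL, expected, since the host copy is now stale
    !$acc update host(U)                     ! line 784: FAIL reported, but I expect the host copy to be overwritten here
    !$acc compare(U)                         ! line 786: FAIL again, which is what I cannot explain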