Hi Feng,
There are two issues. First, the triplet notation in the update directive, as noted in your other post. Second, we don’t support sub-components of aggregate types in “host_data”, so you’ll want to copy the element into a temporary pointer and pass that to “use_device” instead. For example:
pickle_t sptr = sbuf[ia][id];
#pragma acc host_data use_device(sptr)
{
    MPI_Isend(sptr, ssize, MPI_BYTE, 1, tag[ia][id], MPI_COMM_WORLD, &(sreq[ia][id]));
}
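If it helps to see the pattern in isolation, here is a minimal sketch without MPI. The names `fake_send` and `send_block` are hypothetical stand-ins I made up for illustration (they are not in your main.cpp), and I'm assuming the buffer is a plain `char ***`. I also added `if_present` so the sketch falls back to the host address when nothing has been copied to the device; that clause is my addition and is not needed in your code, where the data is already present.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MPI_Isend: copies nbytes from buf into out.
// In the real code, this is where the MPI call would receive the address.
static void fake_send(const char *buf, std::size_t nbytes, std::vector<char> &out) {
    out.assign(buf, buf + nbytes);
}

// Sends the bytes of sbuf[ia][id] using the temp-pointer pattern.
std::vector<char> send_block(char ***sbuf, int ia, int id, std::size_t ssize) {
    std::vector<char> wire;

    // host_data does not accept sbuf[ia][id] directly, so take a plain
    // pointer to the innermost block first.
    char *sptr = sbuf[ia][id];
    assert(sptr != nullptr);

    // Inside this region sptr refers to the device copy if one exists.
    // With if_present (an assumption of this sketch), the host address is
    // used when the data is not on the device. A compiler without OpenACC
    // support ignores the pragma, so the host address is used as well.
    #pragma acc host_data use_device(sptr) if_present
    {
        fake_send(sptr, ssize, wire);
    }
    return wire;
}
```

Since nothing here is ever copied to the device, the sketch only exercises the host path. In the real program the pointer passed to MPI_Isend would be the device address, which requires a CUDA-aware MPI.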
Full changes in the attached file:
main.cpp (6.6 KB)
% mpicxx -acc main.cpp -Minfo=accel -cuda -V22.3
compute(int, int, double *):
51, Generating present(sdata[:nq*nv])
Generating NVIDIA GPU code
51, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
main:
101, Generating enter data copyin(sbuf[:na])
106, Generating enter data copyin(sbuf->[:ncpu])
110, Generating enter data copyin(sbuf->->[:ssize])
116, Generating enter data copyin(sdata[:nq*nv])
125, Generating update self(sbuf[ia][id][:ssize])
166, Generating enter data copyin(rbuf[:na])
171, Generating enter data copyin(rbuf->[:ncpu])
175, Generating enter data copyin(rbuf->->[:rsize])
198, Generating update self(rbuf[ia][id][:rsize])
214, Generating exit data delete(sbuf->->[:ssize])
216, Generating exit data delete(sbuf->[:ncpu])
218, Generating exit data delete(sbuf[:na])
226, Generating exit data delete(rbuf->->[:ssize])
228, Generating exit data delete(rbuf->[:ncpu])
230, Generating exit data delete(rbuf[:na])
void pcklegpu<double>(unsigned long *, int, int, const T1 *, char *):
25, Generating enter data create(tmp[:nq*nv])
Generating present(var[:nq*nv],tmp[:nq*nv])
Generating NVIDIA GPU code
25, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
35, Generating exit data delete(tmp[:nq*nv])
% mpirun -np 2 ./a.out
Num GPUs: 8
rank 1 use GPU 1
Num GPUs: 8
rank 0 use GPU 0
-0 check sbuf before sending====
-10000 check sbuf before sending====
-20000 check sbuf before sending====
-30000 check sbuf before sending====
-40000 check sbuf before sending====
-0 check sbuf before sending====
-10000 check sbuf before sending====
-20000 check sbuf before sending====
-30000 check sbuf before sending====
-40000 check sbuf before sending====
-0 check sbuf before sending====
-10000 check sbuf before sending====
-20000 check sbuf before sending====
-30000 check sbuf before sending====
-40000 check sbuf before sending====
-0 check sbuf before sending====
-10000 check sbuf before sending====
-20000 check sbuf before sending====
-30000 check sbuf before sending====
-40000 check sbuf before sending====
-0 check sbuf before sending====
-10000 check sbuf before sending====
-20000 check sbuf before sending====
-30000 check sbuf before sending====
-40000 check sbuf before sending====
-0 check sbuf before sending====
-10000 check sbuf before sending====
-20000 check sbuf before sending====
-30000 check sbuf before sending====
-40000 check sbuf before sending====
rank 0 send message with size 40
rank 0 send message with size 40
rank 0 send message with size 40
----------------finish sending message-----------------------
rank 1 receive message with size 40
-0 check rbuf ====
-10000 check rbuf ====
-20000 check rbuf ====
-30000 check rbuf ====
-40000 check rbuf ====
rank 1 receive message with size 40
-0 check rbuf ====
-10000 check rbuf ====
-20000 check rbuf ====
-30000 check rbuf ====
-40000 check rbuf ====
rank 1 receive message with size 40
-0 check rbuf ====
-10000 check rbuf ====
-20000 check rbuf ====
-30000 check rbuf ====
-40000 check rbuf ====
rank 1 receive message with size 40
-0 check rbuf ====
-10000 check rbuf ====
-20000 check rbuf ====
-30000 check rbuf ====
-40000 check rbuf ====
rank 1 receive message with size 40
-0 check rbuf ====
-10000 check rbuf ====
-20000 check rbuf ====
-30000 check rbuf ====
-40000 check rbuf ====
rank 1 receive message with size 40
-0 check rbuf ====
-10000 check rbuf ====
-20000 check rbuf ====
-30000 check rbuf ====
-40000 check rbuf ====