Dear support,
I added a simple ACC region around a single DO loop. It looks like it should work without any trouble.
The code is:
!$acc region copyin(aux, eigts1, eigts2, eigts3, mill, g) copyout(aux1)
do ig = 1, ngm
   cfac = aux (ig, is) * &
          CONJG( eigts1 (mill (1,ig), na) * &
                 eigts2 (mill (2,ig), na) * &
                 eigts3 (mill (3,ig), na) )
   aux1 (ig) = cfac * g (jpol, ig)
enddo
!$acc end region
“is” and “jpol” are indices that come from outer loops. “aux1” is used right after the ACC region, so I put it in the copyout clause. It does not require any initialization.
The -Minfo output reports:
108, Generating copyout(aux1(:))
Generating copyin(g(jpol,1:ngm))
Generating copyin(mill(1:3,1:ngm))
Generating copyin(eigts3(:,:))
Generating copyin(eigts2(:,:))
Generating copyin(eigts1(:,:))
Generating copyin(aux(:,:))
Generating compute capability 2.0 binary
109, Loop is parallelizable
Accelerator kernel generated
109, !$acc do parallel, vector(32) ! blockidx%x threadidx%x
Non-stride-1 accesses for array ‘g’
Non-stride-1 accesses for array ‘mill’
CC 2.0 : 21 registers; 4 shared, 208 constant, 0 local memory bytes; 16% occupancy
(occupancy is low but well… I am more interested in getting OpenACC working at that specific point for now :-P)
Once the core dump is generated, this is the point where I get the error:
(gdb) bt
#0 0x0000003513487fc6 in __memcpy_sse2 () from /lib64/libc.so.6
#1 0x00007f313104b691 in ?? () from /usr/lib64/libcuda.so.1
#2 0x00007f31310557b3 in ?? () from /usr/lib64/libcuda.so.1
#3 0x00007f3131055d8c in ?? () from /usr/lib64/libcuda.so.1
#4 0x00007f313104d54e in ?? () from /usr/lib64/libcuda.so.1
#5 0x00007f313102d6b7 in ?? () from /usr/lib64/libcuda.so.1
#6 0x00007f31310300ad in ?? () from /usr/lib64/libcuda.so.1
#7 0x00007f3131020923 in ?? () from /usr/lib64/libcuda.so.1
#8 0x0000000000877ba3 in __pgi_cu_upload2 (devptr=13865189376, hostptr=0xcb4d98, devx=0, devy=0, hostx=0, hosty=0, size1=1, size2=82835, devstride2=1,
hoststride1=1, hoststride2=3, elementsize=8, lineno=108, name=0xba335c “g$p”) at …/src-nv/nvupload2.c:82
#9 0x0000000000873d2d in __pgi_cu_uploadx_seq (devptr=13865189376, hostptr=0xcb4d98, dims=2, desc=0x7fff50988f20, elementsize=8, lineno=108,
name=0xba335c “g$p”) at …/src-nv/nvuploadx.c:236
#10 0x0000000000875661 in pgi_cu_uploadxx_p (devptr=13865189376, hostptr=0xcb4d98, dims=2, desc=0x7fff50988f20, elementsize=8, lineno=108,
name=0xba335c “g$p”, eventinfo=0xbf1190) at …/src-nv/nvuploadx.c:649
#11 0x0000000000875924 in pgi_cu_uploadx_a_p (devptr=13865189376, hostptr=0xcb4d98, dims=2, desc=0x7fff50988f20, elementsize=8, lineno=108,
name=0xba335c “g$p”, flags=0, async=0) at …/src-nv/nvuploadx.c:705
#12 0x00000000005e0cb7 in addusstres.pgi.uni.gpu (sigmanlc=…) at ./addusstress.F90:108
#13 0x00000000005caae1 in stres_knl.pgi.uni.istanbul (sigmanlc=…, sigmakin=…) at ./stres_knl.F90:90
#14 0x00000000004ae69e in stress.pgi.uni.istanbul (sigma=…) at ./stress.F90:116
#15 0x000000000041c32c in pwscf.pgi.uni.istanbul () at ./pwscf.F90:119
I think I put the ACC region directive in the right place with the right clauses. I do not see any obstacle inside the loop, and CONJG should be supported (I am using PGI 12.2). Is it possible that the program crashes at that point because there is not enough memory available? If so, how can I detect this and, if possible, apply a recovery strategy in the code?
Many thanks in advance!
F.