Hi,
I am getting an out-of-memory error on the GPU just before one of my GPU kernels is launched. To investigate, I ran the code with PGI_ACC_DEBUG=1 and found that an allocation request of about 8 GB (8194917744 bytes) is made on the device, but I cannot tell which variable it is for. The runtime appears to finish transferring an array named neighbours just before the error. Below is an excerpt:
...
pgi_uacc_dataon(devptr=0x60,hostptr=0x7fa1a11a6860,offset=0,0,stride=1,18486,size=18471x27,extent=18486x27,eltsize=4,lineno=404,name=neighbours,flags=0x2700=create+present+copyin+inexact,threadid=1)
pgi_uacc_dataon( devid=1, threadid=1 ) dindex=1
NO map for host:0x7fa1a11a6860
pgi_uacc_alloc(size=1996488,devid=1,threadid=1)
pgi_uacc_alloc(size=1996488,devid=1,threadid=1) returns 0xb02500000
map dev:0xb02500000 host:0x7fa1a11a6860 size:1996488 offset:0 data[dev:0xb02500000 host:0x7fa1a11a6860 size:1996488] (line:404 name:neighbours) dims=18486x27
alloc done with devptr at 0xb02500000
pgi_uacc_pin(devptr=0x0,hostptr=0x7fa1a11a6860,offset=0,0,stride=1,18486,size=18471x27,extent=18486x27,eltsize=4,lineno=404,name=neighbours,flags=0x0,threadid=1)
MemHostRegister( 0x7fa1a11a6860, 1996488, 0 )
pgi_uacc_dataupx(devptr=0xb02500000,hostptr=0x7fa1a11a6860,offset=0,0,stride=1,18486,size=18471x27,extent=18486x27,eltsize=4,lineno=404,name=neighbours,flags=0x0,threadid=1)
pgi_uacc_cuda_dataup2(devdst=0xb02500000,hostsrc=0x7fa1a11a6860,offset=0,0,stride=1,18486,size=18471,27,eltsize=4,lineno=404,name=neighbours)
pgi_uacc_datadone( async=-1, devid=1 )
pgi_uacc_cuda_wait(lineno=-1,async=-1,dindex=1)
pgi_uacc_cuda_wait(sync on stream=(nil))
pgi_uacc_cuda_wait done
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb0246c600
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02700000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb0276c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02800000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb0286c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02900000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb0296c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02a00000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02a6c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02b00000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02b6c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02c00000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02c6c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02d00000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02d6c400
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02e00000
pgi_uacc_alloc(size=443304,devid=1,threadid=1)
pgi_uacc_alloc(size=443304,devid=1,threadid=1) returns 0xb02e6c400
pgi_uacc_alloc(size=8194917744,devid=1,threadid=1)
call to cuMemAlloc returned error 2: Out of memory
P0
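As a sanity check on the numbers in this excerpt, here is a small Python sketch (not part of the application) that only does arithmetic on the sizes printed above. It confirms that the neighbours allocation equals extent x columns x eltsize, that each 443304-byte allocation is exactly 24 bytes per entry for 18471 entries, and that the failing request is exactly 18486 times 443304 bytes. Reading 24 bytes as three 8-byte values per entry is my assumption; the log does not say so, and this does not identify the variable by name.

```python
# Sizes copied verbatim from the PGI_ACC_DEBUG excerpt above.
EXTENT = 18486               # padded first extent reported for neighbours
SIZE = 18471                 # first dimension actually used (size=18471x27)
COLS = 27                    # second dimension of neighbours
ELTSIZE = 4                  # eltsize reported for neighbours
NEIGHBOURS_BYTES = 1996488   # pgi_uacc_alloc size for neighbours
SMALL_ALLOC = 443304         # size of each repeated small allocation
BIG_ALLOC = 8194917744       # size of the request that fails


def factor_check():
    """Return simple arithmetic relations between the logged sizes."""
    return {
        # neighbours is allocated with its padded extent, not its size.
        "neighbours_matches": EXTENT * COLS * ELTSIZE == NEIGHBOURS_BYTES,
        # (quotient, remainder) of a small allocation per used entry.
        "small_per_size": divmod(SMALL_ALLOC, SIZE),
        # (quotient, remainder) of the failing request per padded extent.
        "big_per_extent": divmod(BIG_ALLOC, EXTENT),
        # Failing request expressed in GiB.
        "big_gib": BIG_ALLOC / 2**30,
    }


if __name__ == "__main__":
    for key, value in factor_check().items():
        print(key, value)
```

If these relations are not a coincidence, the failing request looks like one 443304-byte block per padded index (18486 of them) rather than a single block, which might point to an array or private copy whose shape is being multiplied out. That is only a guess from the arithmetic.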
I then stepped through the code using cuda-gdb. I got the memory error again, but saw no new message that sheds more light on the cause of the 8 GB allocation. Below is an excerpt from cuda-gdb:
[Launch of CUDA Kernel 1 (calc_force_des_150_gpu<<<(18471,1,1),(256,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 2 (calc_force_des_180_gpu_red<<<(1,1,1),(256,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 1 (calc_force_des_150_gpu<<<(18471,1,1),(256,1,1)>>>) on Device 0]
Breakpoint 1, calc_force_des () at calc_force_des.f:404
404 !$acc parallel
(cuda-gdb) s
__pgi_uacc_dataon (filename=0xae5560 "/lvol/home/anirban/mfix/mfix.0011b/model/./des/calc_force_des.f",
funcname=0xae55a0 "calc_force_des", pdevptr=0x7fffffffd920, hostptr=0x7fffefd96020, dims=2, desc=0x7fffffffd310, elementsize=8,
lineno=404, name=0xae59d7 "des_pos_new", flags=9984, async=-1, devid=1) at dataon.c:39
39 dataon.c: No such file or directory.
in dataon.c
(cuda-gdb) s
43 in dataon.c
(cuda-gdb) s
48 in dataon.c
(cuda-gdb) s
50 in dataon.c
(cuda-gdb) s
52 in dataon.c
(cuda-gdb)
54 in dataon.c
(cuda-gdb)
55 in dataon.c
(cuda-gdb)
56 in dataon.c
(cuda-gdb)
57 in dataon.c
(cuda-gdb)
64 in dataon.c
(cuda-gdb)
66 in dataon.c
(cuda-gdb)
69 in dataon.c
(cuda-gdb)
74 in dataon.c
(cuda-gdb)
77 in dataon.c
(cuda-gdb)
__pgi_uacc_adjust (pdims=0x7fffffffcebc, desc=0x7fffffffd310) at adjust.c:31
31 adjust.c: No such file or directory.
in adjust.c
(cuda-gdb)
32 in adjust.c
(cuda-gdb)
34 in adjust.c
(cuda-gdb)
36 in adjust.c
(cuda-gdb)
37 in adjust.c
(cuda-gdb)
42 in adjust.c
(cuda-gdb)
43 in adjust.c
(cuda-gdb)
44 in adjust.c
(cuda-gdb)
50 in adjust.c
(cuda-gdb)
56 in adjust.c
(cuda-gdb)
34 in adjust.c
(cuda-gdb)
36 in adjust.c
(cuda-gdb)
37 in adjust.c
(cuda-gdb)
42 in adjust.c
(cuda-gdb)
43 in adjust.c
(cuda-gdb)
44 in adjust.c
(cuda-gdb)
50 in adjust.c
(cuda-gdb)
56 in adjust.c
(cuda-gdb)
34 in adjust.c
(cuda-gdb)
67 in adjust.c
(cuda-gdb)
68 in adjust.c
(cuda-gdb)
78 in adjust.c
(cuda-gdb)
82 in adjust.c
(cuda-gdb)
84 in adjust.c
(cuda-gdb)
85 in adjust.c
(cuda-gdb)
86 in adjust.c
(cuda-gdb)
87 in adjust.c
(cuda-gdb)
67 in adjust.c
(cuda-gdb)
92 in adjust.c
(cuda-gdb)
93 in adjust.c
(cuda-gdb)
94 in adjust.c
(cuda-gdb)
__pgi_uacc_dataon (filename=0xae5560 "/lvol/home/anirban/mfix/mfix.0011b/model/./des/calc_force_des.f",
funcname=0xae55a0 "calc_force_des", pdevptr=0x7fffffffd920, hostptr=0x7fffefd96020, dims=1, desc=0x7fffffffd310, elementsize=8,
lineno=404, name=0xae59d7 "des_pos_new", flags=9984, async=-1, devid=1) at dataon.c:78
78 dataon.c: No such file or directory.
in dataon.c
(cuda-gdb)
87 in dataon.c
(cuda-gdb)
88 in dataon.c
(cuda-gdb)
92 in dataon.c
(cuda-gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffe8943700 (LWP 14598)]
0x00000036fe8de2f3 in select () from /lib64/libc.so.6
(cuda-gdb)
Single stepping until exit from function select,
which has no line number information.
warning: Cuda API error detected: cuMemAlloc_v2 returned (0x2)
call to cuMemAlloc returned error 2: Out of memory
[Thread 0x7fffe8943700 (LWP 14598) exited]
Program exited with code 01.
[Termination of CUDA Kernel 2 (calc_force_des_180_gpu_red<<<(1,1,1),(256,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 0 (desgrid_neigh_build_gpu_507_gpu<<<(145,1,1),(128,1,1)>>>) on Device 0]
(cuda-gdb)
The program is not being run.
Finally, running the code under cuda-memcheck causes a hang. "ps x" shows the following:
...
14455 pts/0 S+ 0:00 cuda-memcheck mfix.exe
14456 pts/0 Rl+ 14:17 mfix.exe
...
Any advice on how to track down and fix this would be great.
Thanks much
Anirban