Starting Accel. Fortran

waku2005 · February 15, 2011, 11:29pm

Dear all,

I’ve just started with accelerated Fortran after reading the contents below,
which come with downloadable sample programs (a tar file).

I succeeded in running the first and second sample programs, but the third (last) one failed.
I would appreciate any comments.

My environment:
CentOS 5.5 x86_64
PGI Accel. WS for Linux (PGI2011.02)
ELSA Quadro 5000
CUDA 3.2 toolkit and driver from the NVIDIA site
(pgaccelinfo and deviceQuery both seem to report the device correctly)

Build and error logs:
[waku@ensis10 pgi]$ pgfortran -ta=nvidia,cc20,time -Minfo pgi_test_3.f90
smooth:
10, Generating copyout(a(2:n-1,2:m-1))
Generating copyin(b(1:n,1:m))
Generating copyout(b(2:n-1,2:m-1))
Generating compute capability 2.0 binary
11, Loop carried dependence due to exposed use of 'b(1:n,1:m)' prevents parallelization
Parallelization would require privatization of array 'a(i2+2,2:m-1)'
Sequential loop scheduled on host
13, Loop is parallelizable
14, Loop is parallelizable
Accelerator kernel generated
13, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
Cached references to size [18x18] block of 'b'
14, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
CC 2.0 : 25 registers; 1304 shared, 88 constant, 0 local memory bytes; 66% occupancy
21, Loop is parallelizable
22, Loop is parallelizable
Accelerator kernel generated
21, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
22, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
CC 2.0 : 13 registers; 8 shared, 80 constant, 0 local memory bytes; 100% occupancy
[waku@ensis10 pgi]$ ./a.out
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=14 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=22 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=14 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=22 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=14 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=22 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=14 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=22 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=14 device=0 grid=7x7 block=16x16
launch kernel file=/ssd/cuda/cudaf/pgi/pgi_test_3.f90 function=smooth line=22 device=0 grid=7x7 block=16x16
call to cuMemcpy2D returned error 1: Invalid value
CUDA driver version: 3020

Accelerator Kernel Timing data
/ssd/cuda/cudaf/pgi/pgi_test_3.f90
smooth
10: region entered 1 time
time(us): init=3386573
data=26
14: kernel launched 5 times
grid: [7x7] block: [16x16]
time(us): total=217 max=141 min=19 avg=43
22: kernel launched 5 times
grid: [7x7] block: [16x16]
time(us): total=68 max=15 min=13 avg=13
[waku@ensis10 pgi]$

Sincerely,
waku2005
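
For readers following the log, the launch geometry it reports (grid=7x7, block=16x16, and an [18x18] cached block of b) can be checked with a little arithmetic. The sketch below is only an illustration in Python, not part of the PGI sample. It assumes the compiler launches ceil(interior points / 16) blocks per dimension and that the stencil reaches one cell in each direction, which would be consistent with the copyout of a(2:n-1,2:m-1) and the copyin of b(1:n,1:m). The function name `launch_geometry` and the size n = m = 100 are hypothetical; the thread never states the real array size.

```python
import math

def launch_geometry(n, m, block=16, halo=1):
    """Return (grid_x, grid_y, tile) for a sweep over 2..n-1 and 2..m-1.

    Assumes one thread per interior point and `block` threads per
    dimension, as in the "vector(16)" schedule shown in the -Minfo output.
    """
    interior_x = n - 2 * halo                # points updated along dimension 1
    interior_y = m - 2 * halo                # points updated along dimension 2
    grid_x = math.ceil(interior_x / block)   # blocks needed to cover them
    grid_y = math.ceil(interior_y / block)
    tile = block + 2 * halo                  # block plus one halo cell per side
    return grid_x, grid_y, tile

# Hypothetical problem size; chosen only because it reproduces the log.
print(launch_geometry(100, 100))   # (7, 7, 18)
```

Under these assumptions any n and m from 99 to 114 would give the same 7x7 grid, so the log alone does not pin down the actual size.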

MatColgrove · February 16, 2011, 9:15pm

Hi waku2005,

Sorry about this; it appears that we missed this problem. It’s new in 11.2 and only occurs on 64-bit systems running the latest CUDA drivers. The error is caused by new support we added for large-memory (> 4GB) Fermi cards.

We have a fix being tested now and will release version 11.2-1 in a few days.

Thanks,
Mat

waku2005 · February 17, 2011, 7:52am

Dear Mat,

Thank you for your reply. I’ll wait for the update. :-)

waku2005