I have distilled my code down to a version that basically does nothing but pass values around. Please take a look and try to compile and run it. You may get the following error message:
call to cuMemFree returned error 700: Launch failed
I checked the host version and found no out-of-bounds memory accesses there. If you happen to know how this error occurs, please let me know. I am keen to overcome this issue. Thanks a lot in advance!
program inversematrix
  implicit real*8 (a-h,o-z)
  real*8 a(6,6,10000)
  real*8 c(6,6), L(6,6), U(6,6), b(6), d(6), x(6)

  niter = 10000
  n = 6
  a = 0.0d0
  do ie = 1, niter
    do i = 1, n
      a(i,i,ie) = 1.0d0
    enddo
  enddo

!$acc data region
!$acc region
!$acc loop kernel independent private(c,L,U,b,d,x)
  do ie = 1, niter
    c(:,:)=a(:,:,ie)
    L=c
    U=L
    b(:) = U(:,1)
    d=b
    x=d
    a(:,:,ie)=L(:,:)
  enddo
!$acc end region
!$acc end data region
end program inversematrix
pgf90 inverse.f90 -o run -r8 -O2 -g -traceback -pg -acc -ta=nvidia,flushz,time,cc20,keepgpu,keepptx -Mcuda=ptxinfo -Minfo=accel