OpenACC Fortran
(Please do not suggest using CUDA Fortran. I am working on a huge legacy code. OpenACC is the simplest way to port it to GPU)
This is a follow-up of the previous problem. I am trying to do a series of matrix inversions in a loop. The issue is that the number of iteration “niter” can only be quite small. If it is set such as 1000, you can see the returned message when executed
call to cuMemHostUnregister returned error 700: Launch failed
You should not try to change
!$acc parallel
!$acc loop private(x6,b6)
to
!$acc parallel private(x6,b6)
!$acc loop
Otherwise, the computing results will be incorrect, as there exist the race condition.
The source code is posted below
program matrixinverse
implicit real*8 (a-h,o-z)
real*8 amatr(6,6,10000), x6(6,6), b6(6)
niter = 10000
n = 6
amatr = 0.0d0
do ie = 1, niter
do j = 1, n
amatr(j,j,ie) = dble(j)
enddo
enddo
!$acc parallel
!$acc loop private(x6,b6)
do ie = 1, niter
!
! LU decomposition by point
!
do k = 1, n
!
! decompose the diagonal terms
!
do m = 1, k-1
!
amatr(k,k,ie) = amatr(k,k,ie) - amatr(k,m,ie)*amatr(m,k,ie)
!
enddo
!
adicv = 1.0d0/amatr(k,k,ie)
!
! decompose the non diagonal terms
!
do i = k+1, n
!
do m = 1, k-1
!
amatr(k,i,ie) = amatr(k,i,ie) - amatr(k,m,ie)*amatr(m,i,ie)
amatr(i,k,ie) = amatr(i,k,ie) - amatr(i,m,ie)*amatr(m,k,ie)
!
enddo
!
amatr(i,k,ie) = amatr(i,k,ie)*adicv
!
enddo
!
enddo
do i = 1, n
!
do j = 1, n
!
b6(j) = 0.0d0
!
enddo
!
b6(i) = 1.0d0
!
! LU resolution
!
do ii = 1, n
!
c = 0.0d0
!
do jj = 1, ii-1
!
c = c + amatr(ii,jj,ie)*x6(jj,i)
!
enddo
!
x6(ii,i) = b6(ii) - c
!
enddo
!
do ii = n, 1, -1
!
c = 0.0d0
!
do jj = ii+1, n
!
c = c + amatr(ii,jj,ie)*x6(jj,i)
!
enddo
!
x6(ii,i) = (x6(ii,i) - c)/amatr(ii,ii,ie)
!
enddo
!
enddo
!
! this is to test if the private array returns correct values
!
amatr(:,:,ie) = x6(:,:)
!
enddo
!$acc end parallel
!x$acc end region
!x$acc end data region
do ie = 1, niter
write(*,*) ie
do i = 1, n
write(*,"(6f15.5)") (amatr(i,j,ie), j=1,n)
enddo
enddo
end
If “niter” is small enough, you will see the screen output in loop. The result should be uniformly the diagonal matrix
1, 1/2, 1/3, 1/4, 1/5, 1/6, as posted below
“No. of iteration”
1.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.50000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.33333 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.25000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.20000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.16667