I am trying to run linear algebra routines, e.g., matrix inversion, inside an OpenACC kernel loop (n = 6, ntotal = 1000):
real*8 a(n, n, ntotal), b(n, n), c(n)
!$acc region
!$acc loop kernel independent private(b, c)
do i = 1, ntotal
   ! ... get the inverse of matrix a(:, :, i) and store it in a(:, :, i),
   ! where b and c are necessary auxiliary local arrays ...
enddo
!$acc end region
However, when ntotal is large, e.g., ntotal > 200, I get an error message like the one below:
call to cuMemFree returned error 700: Launch failed
I think that the compiler allocates memory for the "private" b and c arrays with sizes like n*n*ntotal and n*ntotal, so that each iteration gets its own copy. But this takes up too much memory.
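To put rough numbers on that guess, here is a minimal sketch. It assumes one private copy of b and c per loop iteration and 8 bytes per real*8 element; the program and variable names are only illustrative, and the compiler's actual allocation strategy may differ.

```fortran
program private_size
  implicit none
  ! Sizes taken from the example above
  integer, parameter :: n = 6, ntotal = 1000
  ! Assumption: real*8 occupies 8 bytes per element
  integer, parameter :: bytes_per_real8 = 8
  integer :: b_bytes, c_bytes

  ! Assumption: one private copy of b(n,n) and c(n) per iteration
  b_bytes = n * n * ntotal * bytes_per_real8
  c_bytes = n * ntotal * bytes_per_real8

  print *, 'private copies of b (bytes):', b_bytes
  print *, 'private copies of c (bytes):', c_bytes
end program private_size
```

With n = 6 and ntotal = 1000 this gives 288000 bytes for b and 48000 bytes for c under those assumptions.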
Is anyone else working on matrix linear algebra that requires local matrices? I guess this is a common issue when it is handled in a naive manner, as I did. I am keen to know how to work around it.
Any comments are very welcome!