I am trying to run linear algebra algorithms, e.g., matrix inversion, inside an OpenACC kernel loop (n = 6, ntotal = 1000):

```
real*8 a(n, n, ntotal), b(n,n), c(n)
!$acc region
!$acc loop kernel independent private(b, c)
do i = 1, ntotal
...
   ! get the inverse of matrix a(:, :, i) and store it in a(:, :, i),
   ! where b and c are necessary auxiliary local arrays.
...
enddo
!$acc end region
```
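To make the pattern concrete, here is a minimal self-contained sketch of such a loop. It is only an illustration under stated assumptions: the inversion routine is not shown above, so the sketch substitutes Gauss-Jordan elimination without pivoting, and it uses diagonally dominant random test matrices so that skipping pivoting is safe. The names `a0`, `eye`, `p`, and `err` are hypothetical helpers, and the `kernels`/`loop` spelling of the directives is one standard OpenACC form, not necessarily the one used in the original code.

```fortran
program batched_inverse
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 6, ntotal = 1000
  real(dp), allocatable :: a(:,:,:), a0(:,:,:)
  real(dp) :: b(n,n), c(n), eye(n,n), p, err
  integer :: i, j, k, m

  allocate(a(n,n,ntotal), a0(n,n,ntotal))

  ! Test data (assumption): random entries plus a strong diagonal, so each
  ! matrix is strictly diagonally dominant and needs no pivoting.
  call random_number(a)
  eye = 0.0_dp
  do j = 1, n
    eye(j,j) = 1.0_dp
    a(j,j,:) = a(j,j,:) + real(n, dp)
  end do
  a0 = a   ! keep the originals for the check at the end

  !$acc kernels copy(a)
  !$acc loop independent gang vector private(b, c, p)
  do i = 1, ntotal
    ! b holds a working copy of matrix i; a(:,:,i) is reset to the identity
    ! and is turned into the inverse by the same row operations.
    !$acc loop seq
    do m = 1, n
      !$acc loop seq
      do j = 1, n
        b(j,m) = a(j,m,i)
        a(j,m,i) = 0.0_dp
        if (j == m) a(j,m,i) = 1.0_dp
      end do
    end do

    !$acc loop seq
    do k = 1, n
      ! Scale row k so that the pivot becomes 1.
      p = 1.0_dp / b(k,k)
      !$acc loop seq
      do m = 1, n
        b(k,m) = b(k,m) * p
        a(k,m,i) = a(k,m,i) * p
      end do

      ! c holds the multipliers (column k of b) before they are overwritten.
      !$acc loop seq
      do j = 1, n
        c(j) = b(j,k)
      end do
      c(k) = 0.0_dp   ! row k itself is left unchanged

      ! Eliminate column k from every other row.
      !$acc loop seq
      do m = 1, n
        !$acc loop seq
        do j = 1, n
          b(j,m) = b(j,m) - c(j) * b(k,m)
          a(j,m,i) = a(j,m,i) - c(j) * a(k,m,i)
        end do
      end do
    end do
  end do
  !$acc end kernels

  ! Host-side check: A * inv(A) should be close to the identity.
  err = 0.0_dp
  do i = 1, ntotal
    err = max(err, maxval(abs(matmul(a0(:,:,i), a(:,:,i)) - eye)))
  end do
  print *, 'max |A*inv(A) - I| =', err
end program batched_inverse
```

Without an OpenACC compiler the directives are treated as comments and the program runs serially, which gives a reference result to compare the accelerated run against. The inner loops are marked `seq` so that each iteration of the `i` loop works only on its own private `b` and `c`.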

However, when ntotal is large, e.g., ntotal > 200, I get an error message like the one below:

**call to cuMemFree returned error 700: Launch failed**

I think the compiler allocates memory for the “private” b and c arrays as n*n*ntotal and n*ntotal elements, so that every iteration gets its own copy, and that this takes up too much memory.
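For scale, if that guess about the allocation is right, the private copies at n = 6 and ntotal = 1000 with 8-byte reals would occupy roughly the following (a back-of-the-envelope estimate under that assumption, not a measurement of what the compiler actually allocates):

$$
\underbrace{n^2 \cdot \mathtt{ntotal} \cdot 8}_{\text{copies of } b} + \underbrace{n \cdot \mathtt{ntotal} \cdot 8}_{\text{copies of } c}
= 36 \cdot 1000 \cdot 8 + 6 \cdot 1000 \cdot 8
= 288{,}000 + 48{,}000
= 336{,}000 \text{ bytes}
$$

The array a itself takes another n\*n\*ntotal\*8 = 288,000 bytes.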

Is anyone else working on matrix linear algebra that requires local matrices? I guess this is a common issue when it is handled naively, as I did. I am keen to know how to get around it.

Any comments are greatly welcome!