Hi everyone!

I am currently exploring how to use three-dimensional arrays in a kernel, but it is not as simple as I thought. According to the CUDA Fortran Programming Guide and Reference, the values of blockidx%z and griddim%z must always be one. I also know that a block cannot have more than 512 threads. If the device will only accept a two-dimensional grid of blocks, what would be the best way to take a three-dimensional array and process it in the kernel?

This is the code I am trying to implement:

```
do m = 1, mba
  do j = 2, nj-1
    do i = 2, ni-1
      ! five-point stencil: north/south along j, east/west along i
      PHIN(i,j,m) = AN(i,j,m) * PHI(i,j+1,m) &
                  + AS(i,j,m) * PHI(i,j-1,m) &
                  + AE(i,j,m) * PHI(i+1,j,m) &
                  + AW(i,j,m) * PHI(i-1,j,m) &
                  + AP(i,j,m) * PHI(i,j,m)
    enddo
  enddo
enddo
```
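
One possible mapping, as a rough and untested sketch rather than a definitive answer: keep the grid two-dimensional and fold the m index into blockidx%y, so each block handles one tile of (i, j) for a single m. The names stencil_mod, update_phin, launch_update, nbj, and the 16x16 block size are all hypothetical. The sketch also assumes single-precision data and the usual five-point stencil, where east/west are i+1/i-1 and AP multiplies PHI(i,j,m).

```fortran
module stencil_mod
  use cudafor
  implicit none
contains

  ! Hypothetical kernel: grid x covers i, grid y covers (j-tile, m) pairs.
  attributes(global) subroutine update_phin(phin, phi, an, as, ae, aw, ap, &
                                            ni, nj, mba, nbj)
    integer, value :: ni, nj, mba, nbj   ! nbj = number of blocks along j
    real :: phin(ni,nj,mba), phi(ni,nj,mba)
    real :: an(ni,nj,mba), as(ni,nj,mba), ae(ni,nj,mba)
    real :: aw(ni,nj,mba), ap(ni,nj,mba)
    integer :: i, j, m, by

    by = blockidx%y - 1                  ! zero-based folded block index
    m  = by / nbj + 1                    ! which 2D slab this block works on
    j  = mod(by, nbj) * blockdim%y + threadidx%y
    i  = (blockidx%x - 1) * blockdim%x + threadidx%x

    ! Interior points only, matching the bounds of the CPU loop.
    if (i >= 2 .and. i <= ni-1 .and. j >= 2 .and. j <= nj-1 .and. m <= mba) then
      phin(i,j,m) = an(i,j,m) * phi(i,j+1,m) &
                  + as(i,j,m) * phi(i,j-1,m) &
                  + ae(i,j,m) * phi(i+1,j,m) &
                  + aw(i,j,m) * phi(i-1,j,m) &
                  + ap(i,j,m) * phi(i,j,m)
    end if
  end subroutine update_phin

  ! Hypothetical host-side launcher; all arrays are already on the device.
  subroutine launch_update(phin_d, phi_d, an_d, as_d, ae_d, aw_d, ap_d, ni, nj, mba)
    integer :: ni, nj, mba
    real, device :: phin_d(ni,nj,mba), phi_d(ni,nj,mba)
    real, device :: an_d(ni,nj,mba), as_d(ni,nj,mba), ae_d(ni,nj,mba)
    real, device :: aw_d(ni,nj,mba), ap_d(ni,nj,mba)
    type(dim3) :: grid, tblock
    integer :: nbj

    tblock = dim3(16, 16, 1)             ! 256 threads, under the 512 limit
    nbj    = (nj + tblock%y - 1) / tblock%y
    grid   = dim3((ni + tblock%x - 1) / tblock%x, nbj * mba, 1)

    call update_phin<<<grid, tblock>>>(phin_d, phi_d, an_d, as_d, ae_d, aw_d, &
                                       ap_d, ni, nj, mba, nbj)
  end subroutine launch_update

end module stencil_mod
```

With this layout griddim%z stays at one, but nbj * mba has to fit within the maximum grid size in y (65535 on the hardware this restriction applies to), so very large mba would need a loop over m inside the kernel or several launches.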

Any feedback would be helpful since I am new to CUDA Fortran. Thank you for your time!

-Chris