Accessing the last element first in CUDA

For the loop below, how should I map the
iterations onto CUDA threads and blocks?

for (k = n; k >= 1; k--)
{
    for (l = k; l >= 1; l--)
    {
        /* loop body */
    }
}
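A minimal sketch of one common mapping, assuming the outer iterations are independent of each other: launch one thread per value of k, convert the zero-based global thread index into k = n - tid (so thread 0 gets k = n), and keep the inner l loop serial inside each thread. The kernel name `reverseTriangularLoop`, the host helper `runReverseTriangularLoop` and the loop body (summing l into `out[k - 1]`) are hypothetical placeholders, because the original loop body is empty. CUDA threads run concurrently, so this mapping decides which thread handles which k, not the order in which the iterations execute; if iteration k depends on iteration k + 1, this one-thread-per-k mapping does not apply.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical loop body: outer iteration k sums the inner indices
// l = k, k-1, ..., 1 into out[k - 1]. Replace with the real work.
__global__ void reverseTriangularLoop(int n, long long *out)
{
    // Zero-based global thread id (blockIdx and threadIdx both start at 0).
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;   // surplus threads in the last block do nothing

    // Thread 0 -> k = n, thread 1 -> k = n - 1, ..., thread n-1 -> k = 1.
    int k = n - tid;

    long long acc = 0;
    for (int l = k; l >= 1; l--) {   // inner loop stays serial inside one thread
        acc += l;
    }
    out[k - 1] = acc;
}

// Hypothetical host helper (assumes n >= 1): launches the kernel, copies back.
std::vector<long long> runReverseTriangularLoop(int n)
{
    long long *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, n * sizeof(long long));
    assert(err == cudaSuccess);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    reverseTriangularLoop<<<blocks, threadsPerBlock>>>(n, dev);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    std::vector<long long> host(n);
    err = cudaMemcpy(host.data(), dev, n * sizeof(long long), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dev);
    return host;
}
```

For example, `runReverseTriangularLoop(5)` should return {1, 3, 6, 10, 15}, i.e. `out[k - 1] = k(k + 1) / 2`. Thread 0 runs n inner iterations while thread n-1 runs only one, so the work is unbalanced; if every (k, l) pair is independent, launching one thread per pair with l ≤ k is an alternative.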

Since threadIdx and blockIdx start from 0,
how can I traverse a matrix in CUDA from the
last index, i.e. A[n-1][n-1], to the first
index, A[0][0]?
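A sketch of the 2D case, assuming an n × n matrix stored row-major in a single flat device array (a common layout when a matrix is copied to the GPU with `cudaMemcpy`). Each thread computes its zero-based (i, j) from `blockIdx`/`threadIdx` and mirrors it with `n - 1 - i` and `n - 1 - j`, so thread (0, 0) reads A[n-1][n-1] and thread (n-1, n-1) reads A[0][0]. The names `readFromLastCorner` and `reverseMatrix`, and the choice to copy the mirrored element into `out`, are illustrative assumptions.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Thread (i, j) reads the element mirrored from the bottom-right corner.
// A is assumed to be an n x n matrix stored row-major in one flat array.
__global__ void readFromLastCorner(const float *A, float *out, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column, counted from 0
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row, counted from 0
    if (i >= n || j >= n) return;                    // guard partial blocks

    int row = n - 1 - i;   // reversed row index
    int col = n - 1 - j;   // reversed column index
    out[i * n + j] = A[row * n + col];
}

// Hypothetical host helper: copies A in, runs a 2D grid, copies the result out.
std::vector<float> reverseMatrix(const std::vector<float> &A, int n)
{
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dOut, bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    readFromLastCorner<<<grid, block>>>(dA, dOut, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    std::vector<float> out(static_cast<size_t>(n) * n);
    err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dA);
    cudaFree(dOut);
    return out;
}
```

With A = {0, 1, ..., 8} and n = 3, `reverseMatrix` should return {8, 7, ..., 0}, because `(n-1-i)*n + (n-1-j)` equals `n*n - 1 - (i*n + j)`: in row-major storage, mirroring both indices is the same as reversing the flat array.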
I have to access the items in descending index order, i.e. A[n-1], A[n-2], …, A[0].
Please help.
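For the 1D case, a sketch that reads A[n-1], A[n-2], …, A[0] by subtracting the zero-based global index from n - 1. It uses a grid-stride loop (an assumption here, not something the question requires) so that a fixed, possibly small, launch still covers an array of any length. `copyDescending` and `readDescending` are hypothetical names. "Descending order" here means which element each thread touches; the threads themselves are not guaranteed to run in that order.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Reads A[n-1], A[n-2], ..., A[0] into out[0], out[1], ..., out[n-1].
// Grid-stride loop: each thread starts at its zero-based global index t and
// jumps by the total number of launched threads until it passes n.
__global__ void copyDescending(const int *A, int *out, int n)
{
    int stride = gridDim.x * blockDim.x;
    for (int t = blockIdx.x * blockDim.x + threadIdx.x; t < n; t += stride) {
        int rev = n - 1 - t;   // t = 0 -> n-1, t = n-1 -> 0
        out[t] = A[rev];
    }
}

// Hypothetical host helper: copies A to the device, runs the kernel, copies back.
std::vector<int> readDescending(const std::vector<int> &A)
{
    int n = static_cast<int>(A.size());
    size_t bytes = A.size() * sizeof(int);
    int *dA = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dOut, bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    // Deliberately small launch (2 blocks x 64 threads) so the stride loop matters.
    copyDescending<<<2, 64>>>(dA, dOut, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    std::vector<int> out(A.size());
    err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dA);
    cudaFree(dOut);
    return out;
}
```

For A = {0, 1, ..., 999}, the result should start with 999 and end with 0; with only 128 threads launched, each thread handles seven or eight elements.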