I have an N×N matrix.

I need to write a CUDA kernel that performs an operation on every anti-diagonal of the matrix, in such a way that the computation on diagonal n + 1 starts only after the computation on diagonal n has finished.

E.g.:

compute on m[1][1]

cudaDeviceSynchronize();

compute on m[2][1],m[1][2]

cudaDeviceSynchronize();

compute on m[3][1],m[2][2],m[1][3]

cudaDeviceSynchronize();

compute on m[4][1],m[3][2],m[2][3],m[1][4]

cudaDeviceSynchronize();

…and so on.

I'm confused about how to write the code. Can anyone help me?
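For reference, here is a minimal sketch of one way the scheme above could be written: one kernel launch per anti-diagonal, with one thread per cell on that diagonal. Everything named here is an assumption for illustration: the names diagonal_kernel and run_wavefront are hypothetical, the matrix is assumed to be stored row-major as float, indices are 0-based (so m[1][1] above is cell (0,0) here), and the per-cell operation (add the upper and left neighbours to the cell) is only a placeholder for the real computation. Kernels launched into the same stream run in order, so in this sketch each diagonal finishes before the next one starts even without an explicit synchronization call between launches.

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Processes one anti-diagonal d, i.e. all cells with row + col == d (0-based).
// Placeholder operation (an assumption): add the upper and left neighbours,
// which both lie on the previous anti-diagonal, to the current cell.
__global__ void diagonal_kernel(float* m, int n, int d)
{
    int row_min = max(0, d - (n - 1));   // first valid row on this diagonal
    int row_max = min(d, n - 1);         // last valid row on this diagonal
    int row = row_min + blockIdx.x * blockDim.x + threadIdx.x;
    if (row > row_max) return;           // surplus threads do nothing
    int col = d - row;

    float up   = (row > 0) ? m[(row - 1) * n + col] : 0.0f;
    float left = (col > 0) ? m[row * n + (col - 1)] : 0.0f;
    m[row * n + col] += up + left;
}

// Hypothetical host helper: copies the n x n matrix to the device, sweeps
// all 2n - 1 anti-diagonals in order, and copies the result back.
cudaError_t run_wavefront(float* h_m, int n)
{
    float* d_m = nullptr;
    size_t bytes = size_t(n) * n * sizeof(float);

    cudaError_t err = cudaMalloc(&d_m, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_m, h_m, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { cudaFree(d_m); return err; }

    const int threads = 128;
    for (int d = 0; d <= 2 * (n - 1); ++d) {
        // Number of cells on anti-diagonal d.
        int len = std::min(d, n - 1) - std::max(0, d - (n - 1)) + 1;
        int blocks = (len + threads - 1) / threads;
        // Launches on the default stream execute one after another,
        // so diagonal d + 1 only starts once diagonal d has completed.
        diagonal_kernel<<<blocks, threads>>>(d_m, n, d);
    }
    err = cudaGetLastError();            // catches launch errors
    if (err != cudaSuccess) { cudaFree(d_m); return err; }

    // Blocking copy on the default stream: waits for all kernels to finish.
    err = cudaMemcpy(h_m, d_m, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_m);
    return err;
}
```

A caller would fill a host array of n * n floats, call run_wavefront(h_m, n), and check that the returned value is cudaSuccess. Launching one kernel per diagonal is simple, but the early and late diagonals are short and keep few threads busy, so for large matrices a tiled variant may be worth considering.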