Sum of each column of a 2D matrix

I have the following matrix:

1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9

===> 9 18 27 36 45 54 63 72 81

How can I sum each column of the matrix inside the kernel function? The matrix is stored in a 2D block. Thanks!

How many columns does your real matrix have? Hundreds? Thousands?

About 1024.

You might find the code in this post useful.

Thank you for your reply.

If I were simply writing a standalone kernel to sum each column, I would do that. My problem is that I have already finished some other computation and have moved on to the last step, in which I need to sum each column. I should also mention that my blocks are 2-D. I use the following code to locate the index of each element in the matrix:

int x = threadIdx.x + blockIdx.x*BLOCK_SIZE_X;
int y = threadIdx.y + blockIdx.y*BLOCK_SIZE_Y;
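
One possible way to do the column sum as the last step of a kernel that uses this 2-D indexing is sketched below. This is only a sketch under several assumptions that are not stated in the thread: the matrix is row-major float data, BLOCK_SIZE_X and BLOCK_SIZE_Y are both 16 (BLOCK_SIZE_Y must be a power of two for the reduction loop), and the output array is zeroed before launch. The names columnSumKernel and columnSums are hypothetical, and the load from `in` stands in for whatever value the earlier computation produced. Each block reduces its tile along y in shared memory, then thread row 0 adds the block's partial sums to the global result with atomicAdd.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

#define BLOCK_SIZE_X 16
#define BLOCK_SIZE_Y 16  // assumed to be a power of two for the tree reduction

// colSum must be zero-initialized before the kernel is launched.
__global__ void columnSumKernel(const float* in, float* colSum, int rows, int cols)
{
    __shared__ float tile[BLOCK_SIZE_Y][BLOCK_SIZE_X];

    int x = threadIdx.x + blockIdx.x * BLOCK_SIZE_X;  // column index
    int y = threadIdx.y + blockIdx.y * BLOCK_SIZE_Y;  // row index

    // Stand-in for the value computed earlier in the kernel (row-major layout
    // assumed); out-of-range threads contribute zero.
    tile[threadIdx.y][threadIdx.x] = (x < cols && y < rows) ? in[y * cols + x] : 0.0f;
    __syncthreads();

    // Tree reduction along y: each step halves the number of active rows.
    for (int stride = BLOCK_SIZE_Y / 2; stride > 0; stride /= 2) {
        if (threadIdx.y < stride)
            tile[threadIdx.y][threadIdx.x] += tile[threadIdx.y + stride][threadIdx.x];
        __syncthreads();
    }

    // Row 0 of the tile now holds this block's partial column sums.
    // Blocks stacked in y combine their partial sums atomically.
    if (threadIdx.y == 0 && x < cols)
        atomicAdd(&colSum[x], tile[0][threadIdx.x]);
}

// Hypothetical host helper: copies the matrix to the device, runs the kernel,
// and returns one sum per column. Error checking is omitted for brevity.
std::vector<float> columnSums(const std::vector<float>& h_in, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, rows * cols * sizeof(float));
    cudaMalloc(&d_out, cols * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), rows * cols * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, cols * sizeof(float));  // all-zero bytes are 0.0f

    dim3 block(BLOCK_SIZE_X, BLOCK_SIZE_Y);
    dim3 grid((cols + BLOCK_SIZE_X - 1) / BLOCK_SIZE_X,
              (rows + BLOCK_SIZE_Y - 1) / BLOCK_SIZE_Y);
    columnSumKernel<<<grid, block>>>(d_in, d_out, rows, cols);

    std::vector<float> h_out(cols);
    cudaMemcpy(h_out.data(), d_out, cols * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For the 9x9 example in the question, calling columnSums with the matrix flattened row by row should give 9 18 27 36 45 54 63 72 81. With about 1024 columns and many rows, the atomicAdd per block row may become a bottleneck; a second small kernel that reduces per-block partial sums would be an alternative, at the cost of an extra launch.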