Transposed matrix of size N*M

Hi all,

I have a question about working with a 2D array.

I'm trying to view the original matrix as its transpose in order to run simple calculations on the data stored in the matrix.

So far I have it working for a square N*N matrix, but as soon as I try to port my solution to matrices of size N*M, I get wrong results.

Here is my working code:

__global__ void runMethod(const int *input,
                          int *output,
                          int width,
                          int height)
{
    int out_val = 0;

    int x_index = blockIdx.x * blockDim.x + threadIdx.x; // column index
    int y_index = blockIdx.y * blockDim.y + threadIdx.y; // row index

    for (int e = 0; e < width; e++)
    {
        // add element e of row x_index and element e of row y_index;
        // row x_index of the input is column x_index of its transpose
        // (this indexing assumes width == height)
        out_val += input[e + x_index * width] + input[e + y_index * height];
    }

    output[x_index + y_index * height] = out_val; // store it at the intersecting point in the matrix
}



Technically, my code is a modified version of simple matrix multiplication, without the use of shared memory.
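One way to read the posted kernel is as the matrix product of A with its transpose, with the multiplication replaced by an addition: output[y][x] = Σ_e (A[y][e] + Aᵀ[e][x]) = Σ_e (A[y][e] + A[x][e]). Under that reading, a sketch for an N*M input might look like the following. It assumes A has `height` rows and `width` columns stored row-major, so element (r, c) always lives at `r * width + c` whatever the shape, and that the result is therefore `height` × `height`. The names `rowPlusTransposedColumn` and `rowPlusTransposedColumnHost` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical N*M version of runMethod.
// input : A, height rows x width columns, row-major, A[r][c] = input[r * width + c]
// output: height x height, output[y][x] = sum over e of (A[y][e] + A^T[e][x]),
//         where A^T[e][x] is read directly as A[x][e] (no transposed copy)
__global__ void rowPlusTransposedColumn(const int *input, int *output,
                                        int width, int height)
{
    int x_index = blockIdx.x * blockDim.x + threadIdx.x; // output column, < height
    int y_index = blockIdx.y * blockDim.y + threadIdx.y; // output row, < height
    if (x_index >= height || y_index >= height)
        return; // threads past the edge of the height x height result do nothing

    int out_val = 0;
    for (int e = 0; e < width; e++) {
        // the row stride of A is always width, whatever the shape
        out_val += input[y_index * width + e]  // A[y][e]
                 + input[x_index * width + e]; // A^T[e][x] == A[x][e]
    }
    output[y_index * height + x_index] = out_val; // row stride of the result is height
}

// Hypothetical host helper: copy A to the GPU, launch, copy the result back.
// Returns an empty vector if a CUDA call fails.
std::vector<int> rowPlusTransposedColumnHost(const std::vector<int> &a,
                                             int width, int height)
{
    assert(a.size() == static_cast<size_t>(width) * height);
    std::vector<int> result(static_cast<size_t>(height) * height);

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, a.size() * sizeof(int));
    cudaMalloc(&d_out, result.size() * sizeof(int));
    cudaMemcpy(d_in, a.data(), a.size() * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((height + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    rowPlusTransposedColumn<<<grid, block>>>(d_in, d_out, width, height);

    cudaError_t err = cudaGetLastError(); // launch-configuration errors
    if (err == cudaSuccess)               // this copy waits for the kernel to finish
        err = cudaMemcpy(result.data(), d_out, result.size() * sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return {};
    }
    return result;
}
```

Because both terms are full row sums, every output element works out to rowSum(y) + rowSum(x); for the 2 × 3 input {1, 2, 3, 4, 5, 6} the 2 × 2 result is {12, 21, 21, 30}. In the posted kernel, `input[e + y_index*height]` uses `height` as the row stride, which only coincides with `width` when the matrix is square.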

Now my question is: what must I do to view my original matrix as its transpose using indices only, and where should I store the results afterwards?
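For reading alone, nothing has to be stored: if A has `height` rows and `width` columns in row-major order, its transpose has `width` rows and `height` columns, and element (r, c) of the transpose is simply A[c][r] = input[c * width + r]. A minimal sketch, with the helper name `transposedAt` made up for illustration:

```cuda
// Hypothetical helper: element (r, c) of A^T, read straight out of A.
// A is height x width, row-major; A^T is width x height,
// so valid arguments are 0 <= r < width and 0 <= c < height.
__host__ __device__ inline int transposedAt(const int *input, int width, int r, int c)
{
    return input[c * width + r]; // A^T[r][c] == A[c][r]
}
```

Marked `__host__ __device__`, it can be called from a kernel as well as from host code; for the 2 × 3 input {1, 2, 3, 4, 5, 6}, `transposedAt(a, 3, 2, 1)` returns 6.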

One idea I had was to adapt the code from matrix transposition by defining two more indices (i.e. index_original and index_transpose), such as:

int index_original = x_index + width * y_index;

int index_transpose = y_index + height * x_index;

and then step through a simple for loop inside the kernel to read the values stored in input[index_original] and input[index_transpose], and then store them in output[index_in + index_out * height];
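The index_original and index_transpose pair is exactly the indexing of a naive out-of-place transpose, and when each thread handles one element no loop is needed: the thread reads input[index_original] and writes it to position index_transpose of a separate `width` × `height` buffer, which is where the transposed copy ends up. A sketch, assuming one thread per input element; `transposeNaive` and `transposeHost` are hypothetical names, and `output` here holds the transposed copy rather than the sums computed by runMethod.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical naive transpose: input is height x width (row-major),
// output is width x height (row-major).
__global__ void transposeNaive(const int *input, int *output, int width, int height)
{
    int x_index = blockIdx.x * blockDim.x + threadIdx.x; // column of the input
    int y_index = blockIdx.y * blockDim.y + threadIdx.y; // row of the input
    if (x_index >= width || y_index >= height)
        return;

    int index_original  = x_index + width * y_index;  // A[y][x], row stride width
    int index_transpose = y_index + height * x_index; // A^T[x][y], row stride height
    output[index_transpose] = input[index_original];
}

// Hypothetical host helper: returns A^T (width x height), or an empty vector on error.
std::vector<int> transposeHost(const std::vector<int> &a, int width, int height)
{
    assert(a.size() == static_cast<size_t>(width) * height);
    std::vector<int> result(a.size());
    size_t bytes = a.size() * sizeof(int);

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, a.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    transposeNaive<<<grid, block>>>(d_in, d_out, width, height);

    cudaError_t err = cudaGetLastError(); // launch-configuration errors
    if (err == cudaSuccess)               // this copy waits for the kernel to finish
        err = cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return {};
    }
    return result;
}
```

For the 2 × 3 input {1, 2, 3, 4, 5, 6} this returns the 3 × 2 matrix {1, 4, 2, 5, 3, 6}. The reads are coalesced but the writes are strided by `height`; a shared-memory tiled transpose is the usual next step once this version produces correct results.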

Any pseudocode or ideas on how to sum M(in) with M(T(in)), i.e. the input matrix with its transpose, would help.
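If "sum M(in) with M(T(in))" means the element-wise sum A + Aᵀ, it is only defined when A is square (N*N), because Aᵀ of an N*M matrix has shape M*N. In the square case the transpose can again be read through indexing alone, since (A + Aᵀ)[y][x] = A[y][x] + A[x][y]. A sketch, assuming a square `n` × `n` row-major matrix; `addTranspose` and `addTransposeHost` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: element-wise A + A^T for a square n x n row-major matrix.
// A^T[y][x] == A[x][y], so no transposed copy is stored.
__global__ void addTranspose(const int *input, int *output, int n)
{
    int x_index = blockIdx.x * blockDim.x + threadIdx.x; // column
    int y_index = blockIdx.y * blockDim.y + threadIdx.y; // row
    if (x_index >= n || y_index >= n)
        return;

    int index_original  = x_index + n * y_index; // A[y][x]
    int index_transpose = y_index + n * x_index; // A[x][y] == A^T[y][x]
    output[index_original] = input[index_original] + input[index_transpose];
}

// Hypothetical host helper: returns A + A^T, or an empty vector on error.
std::vector<int> addTransposeHost(const std::vector<int> &a, int n)
{
    assert(a.size() == static_cast<size_t>(n) * n);
    std::vector<int> result(a.size());
    size_t bytes = a.size() * sizeof(int);

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, a.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    addTranspose<<<grid, block>>>(d_in, d_out, n);

    cudaError_t err = cudaGetLastError(); // launch-configuration errors
    if (err == cudaSuccess)               // this copy waits for the kernel to finish
        err = cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return {};
    }
    return result;
}
```

For the 2 × 2 input {1, 2, 3, 4} this gives {2, 5, 5, 8}. The result is always symmetric, which makes a quick sanity check: output[y*n + x] should equal output[x*n + y].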

Thanks in advance!