hi people,

I've got a question about working with a 2D array.

I'm trying to view the original matrix as its transpose (using indexing only) so I can run some simple calculations on the data stored in the matrix.

So far I've got it working for a simple N*N matrix, but as soon as I try to port my solution to matrices of size N*M I get wrong results.

Here is my working (N*N) code:

```
__global__ void runMethod(const int *input,
                          int *output,
                          int width,
                          int height)
{
    int out_val = 0;
    int x_index = blockIdx.x * blockDim.x + threadIdx.x; // column index
    int y_index = blockIdx.y * blockDim.y + threadIdx.y; // row index

    for (int e = 0; e < width; e++)
    {
        // sum up all elements in the intersecting row and column
        out_val += input[e + x_index * width] + input[e + y_index * height];
    }
    // store the sum at the intersection point in the output matrix
    output[x_index + y_index * height] = out_val;
}
```

Technically my code is a modified version of simple matrix multiplication without the use of shared memory.

Now my question is: what must I do in order to view my original matrix as a transposed one using indexes only, and where should I store the results afterwards?

One of the ideas I had was to adapt the code from matrix transposition by defining two more indexes (i.e. index_original and index_transpose),

such as:

```
int index_original  = x_index + width  * y_index; // position of element (y, x) in the original matrix
int index_transpose = y_index + height * x_index; // position the same element takes in the transposed matrix
```

and then go step by step with a simple for loop inside the kernel in order to retrieve the values stored in input[index_original] and input[index_transpose], and then store their sum at the corresponding position in output.
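Concretely, the kernel I have in mind would look something like this (completely untested, sumWithTranspose is just a placeholder name, and since element-wise M + M^T only makes sense shape-wise when the matrix is square, this sketch assumes width == height):

```
__global__ void sumWithTranspose(const int *input,
                                 int *output,
                                 int width,
                                 int height)
{
    int x_index = blockIdx.x * blockDim.x + threadIdx.x; // column index
    int y_index = blockIdx.y * blockDim.y + threadIdx.y; // row index

    if (x_index < width && y_index < height)
    {
        int index_original  = x_index + width  * y_index;
        int index_transpose = y_index + height * x_index;
        // M(y, x) + M^T(y, x) = M(y, x) + M(x, y)
        output[index_original] = input[index_original] + input[index_transpose];
    }
}
```

But I'm not sure this is the right way to generalize it to N*M.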

Any pseudo code or ideas on how to sum M(in) with M(T(in)) would help.

thanks in advance!!