Hello, I’m trying to implement a matrix inversion code, but I’ve run into a problem that I do not know how to solve. The article I’m following is called “Gauss-Jordan Matrix Inversion Speed-Up using GPUs with the Consideration of Power Consumption”.

The method takes an n × n matrix and extends it to n × 2n, where the left half is the input matrix and the right half is the identity matrix.
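To make the layout concrete, here is a minimal host-side sketch of how the augmented matrix could be built before copying it to the device. The function name `makeAugmented` is only for illustration and is not from the article; I am assuming row-major storage with a row stride of 2n, which is what the `Matrix[row * (n*2) + col]` indexing in my kernel expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the n x 2n augmented matrix [A | I] in row-major order.
// Row r occupies elements [r * 2n, (r + 1) * 2n).
// makeAugmented is a hypothetical helper name, not taken from the article.
std::vector<float> makeAugmented(const std::vector<float>& A, int n) {
    assert(n > 0 && A.size() == static_cast<std::size_t>(n) * n);
    const std::size_t stride = 2 * static_cast<std::size_t>(n);
    std::vector<float> aug(static_cast<std::size_t>(n) * stride, 0.0f);
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c) {
            // Left half: copy of the input matrix.
            aug[r * stride + c] = A[static_cast<std::size_t>(r) * n + c];
        }
        // Right half: identity matrix.
        aug[r * stride + n + r] = 1.0f;
    }
    return aug;
}
```

For a 2 × 2 input with rows (1, 2) and (3, 4), the result has rows (1, 2, 1, 0) and (3, 4, 0, 1).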

I was able to implement code that works for matrices with n <= 1022, but for larger matrices it does not work.

The code below shows how I’m allocating the threads and calling the kernel function.

```
int N = (n / 1024) + 1;
dim3 dimGrid((n / 1024) + 1, n, 1);
for (int i = 0; i < n; i++) {
    fixAll<<<dimGrid, (n + 2) / N>>>(d_A, n, i);
}
```
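To show what this arithmetic produces, here is a small host-only sketch that repeats the same integer calculations. `LaunchShape` and `launchShapeFor` are names I made up for illustration, and a plain struct is used instead of `dim3` so the snippet compiles without CUDA.

```cpp
#include <cassert>

// Launch dimensions produced by the host code above, using the same
// integer arithmetic. LaunchShape is a hypothetical illustration type.
struct LaunchShape {
    int gridX;           // blocks along x: (n / 1024) + 1
    int gridY;           // blocks along y: one per matrix row
    int blockX;          // threads per block: (n + 2) / N
    int columnsCovered;  // gridX * blockX, number of distinct T values
};

LaunchShape launchShapeFor(int n) {
    assert(n > 0);
    const int N = (n / 1024) + 1;    // integer division, as in the host code
    const int blockX = (n + 2) / N;  // integer division, as in the host code
    return LaunchShape{N, n, blockX, N * blockX};
}
```

For n = 1022 this gives one block of 1024 threads per row. For n = 1023 it gives a block size of 1025, which is above the 1024 threads-per-block limit of current CUDA GPUs, so as far as I can tell that launch would be rejected rather than run. For n = 1024 it gives two blocks of 513 threads per row, covering 1026 values of T.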

The code below shows the kernel function.

```
__global__ void fixAll(float *Matrix, int n, int collId){
    int T = blockIdx.x * blockDim.x + threadIdx.x;

    if (T + collId < n*2){
        int TIndex = threadIdx.x;
        int B = blockIdx.y;
        float D = Matrix[collId*(n*2) + collId];
        __shared__ float SharedRow[1024];
        SharedRow[TIndex] = Matrix[collId*(n*2) + collId + T];
        float C;
        C = Matrix[B * (n*2) + collId] / D;
        if (B == collId){
            Matrix[B * (n*2) + T + collId] /= D;
        } else {
            Matrix[B * (n*2) + T + collId] -= C * SharedRow[TIndex];
        }
    }
}
```

What happens is that the kernel returns only “nan” values instead of the inverse matrix. From some tests, I found that the code works up to the line `Matrix[B * (n*2) + T + collId] /= D;`, but the line `Matrix[B * (n*2) + T + collId] -= C * SharedRow[TIndex];` is where the error occurs.

After some more tests, I think the problem is in the variable C, but I do not understand what could be affecting the result, since the same code works for matrices with n <= 1022.
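For comparison, here is a sequential sketch of the single step that `fixAll` is meant to perform for one value of `collId`. It assumes the standard Gauss-Jordan update without pivoting (so a zero pivot is not handled), and `referenceStep` is a hypothetical name of mine, not something from the article. Unlike the kernel, it copies the pivot row and reads D and C before writing anything, so the values it uses cannot change partway through the step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference for one elimination step on the row-major
// n x 2n augmented matrix M. Hypothetical helper, not from the article.
// Assumes the pivot M[collId][collId] is nonzero (no pivoting).
void referenceStep(std::vector<float>& M, int n, int collId) {
    const std::size_t stride = 2 * static_cast<std::size_t>(n);
    assert(M.size() == static_cast<std::size_t>(n) * stride);
    assert(collId >= 0 && collId < n);

    // Snapshot the pivot row and pivot element before anything is modified.
    const std::vector<float> pivotRow(M.begin() + collId * stride,
                                      M.begin() + (collId + 1) * stride);
    const float D = pivotRow[collId];

    for (int row = 0; row < n; ++row) {
        // Read the elimination factor before this row is modified.
        const float C = M[row * stride + collId] / D;
        for (std::size_t col = 0; col < stride; ++col) {
            if (row == collId) {
                M[row * stride + col] = pivotRow[col] / D;  // normalize pivot row
            } else {
                M[row * stride + col] -= C * pivotRow[col];  // eliminate column entry
            }
        }
    }
}
```

Calling it for `collId` = 0, 1, ..., n - 1 leaves the identity in the left half and the inverse in the right half, which gives a baseline to compare against the GPU output after each kernel launch.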

Please, can anyone help me understand what’s going on? Thank you very much in advance.

Sorry for the bad English, I used Google Translate.

This is the link if you want to download the article