Hi!

I implemented an LU factorization with partial pivoting using CUBLAS functions (the algorithm itself is not fully parallelized) as follows:

```
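for (int k = 0; k < mDimR - 1; k++)
{
    // Partial pivoting: find the largest |value| in column k, rows k..mDimR-1.
    // cublasIsamax returns a 1-based index relative to the current submatrix.
    int pivotRow = cublasIsamax(mDimR - k, L.getGpuDataPointer() + k + k*mDimR, 1);
    pivotRow = pivotRow + k - 1; // convert to a 0-based row of the full matrix
    if (pivotRow != k)
    {
        // Swap rows k and pivotRow of L (column-major storage, so the row stride is mDimR).
        cublasSswap(mDimC, L.getGpuDataPointer() + pivotRow, mDimR, L.getGpuDataPointer() + k, mDimR);
        // permute is an mDimR x 1 vector, so exactly one entry is swapped
        // (a count of mDimC would run past the end of the vector).
        cublasSswap(1, permute.getGpuDataPointer() + pivotRow, 1, permute.getGpuDataPointer() + k, 1);
    }
    // Copy the pivot back to the host to test it (one device-to-host transfer per step).
    float valcheck;
    cublasGetVector(1, sizeof(float), L.getGpuDataPointer() + k + k*mDimR, 1, &valcheck, 1);
    if (fabs(valcheck) < 1E-20)
    {
        cout << "This matrix is not invertible or is too ill-conditioned" << endl;
        return;
    }
    // Scale the sub-column below the pivot by 1/pivot: these are the multipliers (column k of L).
    cublasSscal(mDimR - k - 1, 1.0f / valcheck, L.getGpuDataPointer() + k + 1 + k*mDimR, 1);
    // Rank-1 update of the trailing submatrix: A22 = A22 - l21 * u12^T.
    cublasSger(mDimR - k - 1, mDimC - k - 1, -1.0f, L.getGpuDataPointer() + k + 1 + k*mDimR, 1, L.getGpuDataPointer() + k + (k+1)*mDimR, mDimR, L.getGpuDataPointer() + (k+1)*mDimR + k + 1, mDimR);
}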
```

where mDimR is the number of rows and mDimC the number of columns. L is the object of my Matrix class that I want to factorize (stored in column-major order, as CUBLAS expects), and getGpuDataPointer() returns the (float*) address of its array on the device. permute is an mDimR*1 matrix that initially contains the integers 1:mDimR and, at the end, holds the order of the swapped rows.

Most of the time in this code is spent in cublasSger(). The result is fine.

My question is this: for large matrices, this code is really slow compared to MATLAB's lu() function. MATLAB computes a 4096*4096 LU factorization in about 3 seconds (or 10 seconds on my other computer), whereas this function takes more than 15 seconds for a 3072*3072 matrix.
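To put the two timings on a common scale despite the different matrix sizes, here is a small sketch that converts them to an approximate floating-point rate. It uses the standard leading-order operation count of about (2/3)n^3 for an LU factorization of an n*n matrix; the helper name luGflops is mine and purely illustrative.

```cpp
#include <cassert>

// Approximate rate in GFLOP/s for an LU factorization of an n x n matrix
// that took `seconds`, using the leading-order count of (2/3) * n^3 operations.
// Lower-order terms are ignored, so this is only an estimate.
double luGflops(int n, double seconds)
{
    assert(n > 0 && seconds > 0.0);
    const double nd = static_cast<double>(n);
    const double flops = (2.0 / 3.0) * nd * nd * nd;
    return flops / seconds / 1e9;
}
```

With the numbers above, luGflops(3072, 15.0) is about 1.3, while luGflops(4096, 3.0) is about 15 and luGflops(4096, 10.0) is about 4.6. So, by this estimate, the CUBLAS loop runs several times slower than either MATLAB measurement.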

Is my initial algorithm that bad? Is something wrong with my GPGPU version? (I have not tested this same code on the CPU.)
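For a CPU comparison, a minimal sketch of the same loop in plain C++ could look like the following. It is only a reference under some assumptions of mine: the matrix is square and column-major, the permutation is kept as 0-based integers rather than the 1:mDimR floats used on the GPU, and the name luReferenceCpu is hypothetical. Each inner loop is marked with the CUBLAS call it stands in for.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// CPU reference for the same right-looking LU with partial pivoting.
// A is n x n, column-major (element (i,j) at A[i + j*n]), overwritten in place
// with the multipliers of L below the diagonal and U on and above it.
// perm[i] ends up holding the original index (0-based) of the row now at position i.
// Returns false if a pivot is smaller than tol in magnitude.
bool luReferenceCpu(std::vector<float>& A, std::vector<int>& perm, int n,
                    float tol = 1e-20f)
{
    assert(n > 0 && A.size() == static_cast<std::size_t>(n) * n);
    perm.resize(n);
    for (int i = 0; i < n; ++i) perm[i] = i;

    for (int k = 0; k < n - 1; ++k) {
        // Pivot search in column k, rows k..n-1 (the role of cublasIsamax).
        int pivotRow = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(A[i + k * n]) > std::fabs(A[pivotRow + k * n]))
                pivotRow = i;

        // Swap whole rows and the matching permutation entries (cublasSswap).
        if (pivotRow != k) {
            for (int j = 0; j < n; ++j)
                std::swap(A[k + j * n], A[pivotRow + j * n]);
            std::swap(perm[k], perm[pivotRow]);
        }

        const float pivot = A[k + k * n];
        if (std::fabs(pivot) < tol) return false;

        // Scale the sub-column by 1/pivot (cublasSscal).
        const float inv = 1.0f / pivot;
        for (int i = k + 1; i < n; ++i) A[i + k * n] *= inv;

        // Rank-1 update of the trailing submatrix (cublasSger).
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < n; ++i)
                A[i + j * n] -= A[i + k * n] * A[k + j * n];
    }
    return true;
}
```

Timing this on the same matrices would show whether the slowness comes from the algorithm itself or from the way the GPU version drives CUBLAS.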

I might write a fully parallel version of the LU factorization, but I'm not familiar enough with parallel computing yet, so it could take me some time. If a semi-parallel version (like this code) could run at least as fast as MATLAB's code, that would be good enough.

Thank you very much in advance!

Nicolas