I would like to implement a Qr factorization using GPU.
First, I have to take a reference to make benchmarks, so I’m trying to use the sgeqrf subroutine from Mkl.
The code works but it gives me wrong result, comparing with Matlab.
Second, I try to use the sgeqrf subroutine from CULA basic and I have the same problem.
If someone could take a look of my code using Mkl :
int M=6; //number of rows
int N=6; //number of columns
//allocation memoire
float* A_h=(float*)malloc(M*N*sizeof(float));
float* tau=(float*)malloc(N*sizeof(float));
//Initialisation of random matrix A
printf("Initialisation of A\n");
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
A_h[i*N+j]=rand() % 10;
}
}
//Write the matrix in A.txt
FILE* A=fopen("A.txt","w");
if (A)
{
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
fprintf(A,"%f\t",A_h[i*N+j]);
}
fputc('\n',A);
}
}
fclose(A);
int lwork=N;
float* work=(float*)malloc(lwork*sizeof(float));
int info=0;
printf("execution of sgeqrf\n");
sgeqrf(&M, &N, A_h, &M, tau, work, &lwork, &info);
//Write the output in out_Mkl.txt
FILE* out=fopen("out_Mkl.txt","w");
if (out)
{
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
fprintf(out,"%f\t",A_h[i*N+j]);
}
fputc('\n',out);
}
}
fclose(out);
free(A_h);
free(tau);
return 0;
Probably just column major versus row major ordering between what you are passing to sgeqrf and the way you are writing the data to file to use in Matlab. But I am a little curious as to what this has to do with CUDA?
The concept of storage ordering is well explained here. BLAS and LAPACK use column-major ordered storage conventions (inherited from FORTRAN). When you are writing out arrays to file, you are using row-major ordering. I am guessing that if you change the code which writes the arrays to file to this:
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
fprintf(out,"%f\t",A_h[i+j*N]);
}
fputc('\n',out);
}