Hi,

I would like to implement a Qr factorization using GPU.

First, I have to take a reference to make benchmarks, so I’m trying to use the sgeqrf subroutine from Mkl.

The code works but it gives me wrong result, comparing with Matlab.

Second, I try to use the sgeqrf subroutine from CULA basic and I have the same problem.

If someone could take a look of my code using Mkl :

#include <mkl.h>

#include <stdio.h>

#include <stdlib.h>

int main()

{

int i,j;

```
int M=6; //number of rows
int N=6; //number of columns
//allocation memoire
float* A_h=(float*)malloc(M*N*sizeof(float));
float* tau=(float*)malloc(N*sizeof(float));
//Initialisation of random matrix A
printf("Initialisation of A\n");
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
A_h[i*N+j]=rand() % 10;
}
}
//Write the matrix in A.txt
FILE* A=fopen("A.txt","w");
if (A)
{
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
fprintf(A,"%f\t",A_h[i*N+j]);
}
fputc('\n',A);
}
}
fclose(A);
int lwork=N;
float* work=(float*)malloc(lwork*sizeof(float));
int info=0;
printf("execution of sgeqrf\n");
sgeqrf(&M, &N, A_h, &M, tau, work, &lwork, &info);
//Write the output in out_Mkl.txt
FILE* out=fopen("out_Mkl.txt","w");
if (out)
{
for (i=0;i<M;i++)
{
for (j=0;j<N;j++)
{
fprintf(out,"%f\t",A_h[i*N+j]);
}
fputc('\n',out);
}
}
fclose(out);
free(A_h);
free(tau);
return 0;
```

}

Makefile :

CC=icc

CFLAGS=-O3

LIBS=-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread

build64:

{CC} -o qrMkl qrMkl.c (CFLAGS) $(LIBS)

It’s a very simple case with M=N=6 …

Thank you for your answers.