Hello,
I am currently implementing an algorithm that solves a linear system.
I decided to use the cublas library for it has most of the functions i need. To get used to the library I decided to do a spike solution that would do the factorization of a single static matrix.
At first I did some samples with 3x3 matrices and the results were as expected, then i tried to expand the sample to 10x10 matrices, in my understanding the process should be the same with the exception of a bigger matrice and bigger copies to and from device.
To my surprise the sample with 10x10 values does not return the correct values, with most of them being nan.
The code is as follows:
int main(int argc, char** argv){
int i, j;
double arrA[10][10] = {
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 5.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0},
{21.0,21.0,21.0, 21.0, 21.0, 21.0, 21.0,21.0, 21.0, 21.0}
};
double *arrADev, *arrBDev, *resultsVec;
double **matrixArray;
int *pivotArray;
int *infoArray;
double flat[100] = {0};
int info[10] = {-1};
int pivot[10] = {-1};
cublasHandle_t cublasHandle;
double *matrices[2];
HANDLE_ERROR(cudaMalloc(&arrADev, sizeof(double) * 100));
HANDLE_ERROR(cudaMalloc(&arrBDev, sizeof(double) * 100));
HANDLE_ERROR(cudaMalloc(&resultsVec, sizeof(double) * 10));
HANDLE_ERROR(cudaMalloc(&matrixArray, sizeof(double*) * 2));
HANDLE_ERROR(cudaMalloc(&pivotArray, sizeof(int) * 100));
HANDLE_ERROR(cudaMalloc(&infoArray, sizeof(int) * 100));
cublasCreate(&cublasHandle);
int matrixSize = 10;
//maps matrix to flat vector
for(i=0; i<matrixSize; i++){
for(j=0; j<matrixSize; j++){
flat[i+j*matrixSize] = arrA[i][j];
}
}
//copy matrix A to device
HANDLE_CUBLAS(cublasSetMatrix(matrixSize, matrixSize, sizeof(double), flat, matrixSize, arrADev, matrixSize));
//save matrix address
matrices[0] = arrADev;
//copy matrices references to device
HANDLE_ERROR(cudaMemcpy(matrixArray,matrices, sizeof(double*)*1, cudaMemcpyHostToDevice));
//LU factorization
HANDLE_CUBLAS(cublasDgetrfBatched(cublasHandle, matrixSize, matrixArray, matrixSize, pivotArray, infoArray, 1));
//get info array
HANDLE_CUBLAS(cublasGetVector(1, sizeof(int), infoArray, 1, info, 1));
//get pivot array
HANDLE_CUBLAS(cublasGetVector(matrixSize, sizeof(int), pivotArray, 1, pivot, 1));
//print info array
printf("Info Array:\n{");
for(i=0; i<1; i++){
printf(" %d", info[i]);
if(i < 0)
printf(",");
else
printf("}\n");
}
//print pivot array
printf("Pivot Array:\n{");
for(i=0; i<matrixSize; i++){
printf(" %d", pivot[i]);
if(i < matrixSize-1)
printf(",");
else
printf("}\n");
}
//get LU matrix
HANDLE_CUBLAS(cublasGetMatrix(matrixSize, matrixSize, sizeof(double), arrADev, matrixSize, flat, matrixSize));
//print LU matrix
printf("Matrix A\n");
for(i=0; i<matrixSize; i++){
for(j=0; j<matrixSize; j++){
printf(" %12.1f", flat[i+j*matrixSize]);
}
printf("\n");
}
return 0;
}
All the values from the matrix are the same with the exception of a single value.
To my knowledge the result should be:
21.0 21.0 21.0 21.0 21.0 21.0 21.0 21.0 21.0 21.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 -16.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
the results from the cublas function are:
21.0 21.0 21.0 21.0 21.0 21.0 21.0 21.0 21.0 21.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
1.0 -nan -nan -nan -nan -nan -nan -nan -nan -nan
I do see if the function have been successfull either with the status code and with the array info.
All the copies are being mapped from row-major to column-major and vice-versa.
Any help would be appreciated.