Very slow performance of cusparseDcsrsv_analysis in iterative methods

Can anybody help me with this weird phenomenon?

I wrote a conjugate gradient library (an iterative method) for solving systems of linear equations. I use an LU factorization for preconditioning, so in the residual update step I need to perform two triangular solves. However, the analysis step of the triangular solver (cusparseDcsrsv_analysis, which is performed only once) takes a lot of time! For instance, if the whole solver needs 360 ms (including all iterations) to converge, the two analysis calls take 300 ms of that, around 83% of the total solver time!
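To make the structure explicit: in preconditioned CG the preconditioner is applied once per iteration as z = M^{-1} r, and with M = LU that means one forward solve with L and one backward solve with U, while the analysis for both factors only has to happen once before the loop. Below is a minimal CPU sketch in C of that loop, assuming a zero-based CSR matrix; `precond_fn`, `identity_precond` and `pcg` are hypothetical names for illustration, not cuSPARSE API. In my setting the callback would be the two triangular solves.

```c
#include <stdlib.h>

/* Hypothetical callback type: computes z = M^{-1} r.
   With an LU preconditioner (M = L*U) this is the two triangular solves. */
typedef void (*precond_fn)(int n, const double *r, double *z, void *ctx);

/* Simplest callback (M = I): turns PCG into plain CG. */
void identity_precond(int n, const double *r, double *z, void *ctx)
{
    (void)ctx;
    for (int i = 0; i < n; ++i) z[i] = r[i];
}

/* y = A*x for A in zero-based CSR (rowPtr has n+1 entries). */
static void csr_spmv(int n, const int *rowPtr, const int *colInd,
                     const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            s += val[k] * x[colInd[k]];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Preconditioned CG for SPD A; x holds the initial guess on entry.
   Stops when ||r|| <= tol*||b|| and returns the iteration count,
   or -1 if that does not happen within maxit iterations. */
int pcg(int n, const int *rowPtr, const int *colInd, const double *val,
        const double *b, double *x, precond_fn apply_m, void *ctx,
        double tol, int maxit)
{
    double *r = malloc(n * sizeof *r), *z = malloc(n * sizeof *z);
    double *p = malloc(n * sizeof *p), *q = malloc(n * sizeof *q);
    double stop = tol * tol * dot(n, b, b);      /* compare squared norms */
    int it = -1;

    csr_spmv(n, rowPtr, colInd, val, x, q);
    for (int i = 0; i < n; ++i) r[i] = b[i] - q[i];   /* r = b - A*x */
    apply_m(n, r, z, ctx);                           /* z = M^{-1} r */
    for (int i = 0; i < n; ++i) p[i] = z[i];
    double rz = dot(n, r, z);

    for (int k = 0; k <= maxit; ++k) {
        if (dot(n, r, r) <= stop) { it = k; break; }
        if (k == maxit) break;
        csr_spmv(n, rowPtr, colInd, val, p, q);     /* q = A*p */
        double alpha = rz / dot(n, p, q);
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        apply_m(n, r, z, ctx);   /* once per iteration: the two triangular solves */
        double rz_new = dot(n, r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    free(r); free(z); free(p); free(q);
    return it;
}
```

Allocation failures are not checked, to keep the sketch short. The point is only where the costs sit: the solve phase runs inside the loop, so the analysis phase is paid once and should be amortized over all iterations.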

From the papers published by Dr. Maxim Naumov, I know that the analysis phase is expected to take significantly longer than the solve phase, but according to his results, not by this much.
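For context, a rough sketch of the kind of work an analysis phase does, assuming the level-scheduling approach described in Naumov's papers (the actual cuSPARSE implementation is not public, so this is only an illustration). Each row of L gets a level one higher than the deepest row it depends on; rows with the same level are independent and can be solved in parallel. `lower_levels` and `bucket_by_level` are hypothetical names, and the sketch uses zero-based CSR.

```c
/* Level scheduling for the strictly lower part of a zero-based CSR matrix:
   level[i] = 1 + max(level[j]) over nonzeros (i, j) with j < i, or 0 if none.
   Entries with j >= i are ignored, so combined LU storage works too.
   Returns the number of levels (length of the dependency chain). */
int lower_levels(int n, const int *rowPtr, const int *colInd, int *level)
{
    int nlevels = 0;
    for (int i = 0; i < n; ++i) {
        int lev = 0;
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k) {
            int j = colInd[k];
            if (j < i && level[j] + 1 > lev) lev = level[j] + 1;
        }
        level[i] = lev;
        if (lev + 1 > nlevels) nlevels = lev + 1;
    }
    return nlevels;
}

/* Groups rows by level with a counting sort: on return the rows of level l
   are perm[levelPtr[l]] .. perm[levelPtr[l+1] - 1].
   levelPtr must have nlevels + 1 entries, perm must have n entries. */
void bucket_by_level(int n, const int *level, int nlevels,
                     int *levelPtr, int *perm)
{
    for (int l = 0; l <= nlevels; ++l) levelPtr[l] = 0;
    for (int i = 0; i < n; ++i) levelPtr[level[i] + 1]++;      /* counts */
    for (int l = 0; l < nlevels; ++l) levelPtr[l + 1] += levelPtr[l];
    for (int i = 0; i < n; ++i) perm[levelPtr[level[i]]++] = i; /* scatter */
    for (int l = nlevels; l > 0; --l) levelPtr[l] = levelPtr[l - 1];
    levelPtr[0] = 0;                                            /* restore starts */
}
```

With this scheme the solve phase needs roughly one parallel sweep per level, so a factor with few, wide levels suits the GPU well, while one with many narrow levels does not.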

```c
/* Descriptors for the two triangular factors; both are stored in the
   same one-based CSR array (matrixLU, iRow, jCol). */
cusparseMatDescr_t descrL = 0;
cusparseMatDescr_t descrU = 0;
cusparseStatus = cusparseCreateMatDescr(&descrL);
cusparseStatus = cusparseCreateMatDescr(&descrU);

/* L: lower triangle with an implicit unit diagonal */
cusparseSetMatIndexBase(descrL, CUSPARSE_INDEX_BASE_ONE);
cusparseSetMatDiagType(descrL, CUSPARSE_DIAG_TYPE_UNIT);
cusparseSetMatFillMode(descrL, CUSPARSE_FILL_MODE_LOWER);

/* U: upper triangle including the stored diagonal */
cusparseSetMatIndexBase(descrU, CUSPARSE_INDEX_BASE_ONE);
cusparseSetMatDiagType(descrU, CUSPARSE_DIAG_TYPE_NON_UNIT);
cusparseSetMatFillMode(descrU, CUSPARSE_FILL_MODE_UPPER);

cusparseSolveAnalysisInfo_t inforL = 0;
cusparseSolveAnalysisInfo_t inforU = 0;
cusparseStatus = cusparseCreateSolveAnalysisInfo(&inforL);
cusparseStatus = cusparseCreateSolveAnalysisInfo(&inforU);

/* Analysis phase (done once): these two calls take about 300 ms */
startSP = omp_get_wtime();

cusparseStatus = cusparseDcsrsv_analysis(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE, N, NZ, descrL, matrixLU, iRow, jCol, inforL);
if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) printf("%s \n\n", "cusparseDcsrsv_analysis1 Error !");

cusparseStatus = cusparseDcsrsv_analysis(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE, N, NZ, descrU, matrixLU, iRow, jCol, inforU);
if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) printf("%s \n\n", "cusparseDcsrsv_analysis2 Error !");

finishSP = omp_get_wtime();

/* Solve phase (every iteration): L t = c2*r, then U z = c2*t */
cusparseStatus = cusparseDcsrsv_solve(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE, N, &c2, descrL, matrixLU, iRow, jCol, inforL, r, t);
if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) printf("%s \n\n", "cusparseDcsrsv_solve1 Error !");

cusparseStatus = cusparseDcsrsv_solve(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE, N, &c2, descrU, matrixLU, iRow, jCol, inforU, t, z);
if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) printf("%s \n\n", "cusparseDcsrsv_solve2 Error !");
```
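For checking the results on the host, here is a plain-C sketch of what the two cusparseDcsrsv_solve calls compute with these descriptors, assuming the combined one-based LU storage above: strictly lower entries belong to L (unit diagonal implicit), the diagonal and upper entries belong to U, and `alpha` plays the role of `c2`. The function names are hypothetical, and the diagonal of U is assumed to be stored and nonzero in every row.

```c
/* Forward substitution: solves L*t = alpha*r, with L the strictly lower part
   of a ONE-based CSR matrix plus an implicit unit diagonal. */
void lu_lower_solve(int n, double alpha, const double *val,
                    const int *rowPtr, const int *colInd,
                    const double *r, double *t)
{
    for (int i = 0; i < n; ++i) {
        double s = alpha * r[i];
        for (int k = rowPtr[i] - 1; k < rowPtr[i + 1] - 1; ++k) {
            int j = colInd[k] - 1;               /* convert to zero-based */
            if (j < i) s -= val[k] * t[j];       /* strictly lower part only */
        }
        t[i] = s;                                /* unit diagonal: no division */
    }
}

/* Backward substitution: solves U*z = alpha*t, with U the diagonal and upper
   part of the same ONE-based CSR matrix. */
void lu_upper_solve(int n, double alpha, const double *val,
                    const int *rowPtr, const int *colInd,
                    const double *t, double *z)
{
    for (int i = n - 1; i >= 0; --i) {
        double s = alpha * t[i], diag = 0.0;
        for (int k = rowPtr[i] - 1; k < rowPtr[i + 1] - 1; ++k) {
            int j = colInd[k] - 1;
            if (j > i) s -= val[k] * z[j];       /* upper part */
            else if (j == i) diag = val[k];      /* assumed present and nonzero */
        }
        z[i] = s / diag;
    }
}
```

Calling `lu_lower_solve(N, c2, ..., r, t)` and then `lu_upper_solve(N, c2, ..., t, z)` on host copies of `matrixLU`, `iRow` and `jCol` should reproduce `z` from the GPU solves up to rounding.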

PS: I already posted this on the “General CUDA GPU Computing Discussion” forum, but it’s not very active there (apart from njuffa’s reply; I sincerely thank him for his prompt answer), and it now seems to me that this section is more suitable for it. I apologize if the duplicate post violates the forum’s rules; I’m really not sure whether it does.

Thanks in advance