In Eigen library, SparseLU class has a transpose()
method, which returns an expression of the transposed of the factored matrix. It is useful to solve for the transposed problem, and is very fast (probably because A = L U → AT = UT LT).
For example, the following code can solve both A x = b and AT y = c. (A certain algorithm does need to solve both of them).
Eigen::SparseLU<Eigen::SparseMatrix<double>,Eigen::COLAMDOrdering<double> > solver;
solver.compute(A);
x = solver.solve(b);
y = solver.transpose().solve(c);
But in cuDSS, I can’t find any equivalent method. So I can only treat them as 2 different problems.
cudssHandle_t handle;
// cudssCreate, cudssConfigCreate, cudssDataCreate, cudssMatrixCreateCsr, cudssMatrixCreateDn...
cudssExecute(dss_handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b)
cudssExecute(dss_handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b)
cudssExecute(dss_handle, CUDSS_PHASE_SOLVE, config, data, A, x, b)
// Clean up...
// Transpose matrix A to A_T using cusparseCsr2cscEx2 or someting else...
cudssHandle_t dss_handle_T;
// cudssCreate, cudssDataCreate, cudssMatrixCreateCsr, cudssMatrixCreateDn...
cudssExecute(dss_handle_T, CUDSS_PHASE_ANALYSIS, config, data_T, A_T, y, c)
cudssExecute(dss_handle_T, CUDSS_PHASE_FACTORIZATION, config, data_T, A_T, y, c)
cudssExecute(dss_handle_T, CUDSS_PHASE_SOLVE, config, data_T, A_T, y, c)
// Clean up...
This is not efficient because 2 LU factorizations are performed, although only 1 is needed in Eigen.
Are there any better solutions?