How to efficiently solve a transposed problem using cuDSS?

In the Eigen library, the SparseLU class has a transpose() method that returns an expression of the transpose of the factored matrix. It is useful for solving the transposed problem, and it is very fast (probably because A = L U implies A^T = U^T L^T, so the existing factors can be reused).
For example, the following code solves both A x = b and A^T y = c. (A certain algorithm does need to solve both of them.)

Eigen::SparseLU<Eigen::SparseMatrix<double>, Eigen::COLAMDOrdering<int> > solver;
solver.compute(A);
x = solver.solve(b);
y = solver.transpose().solve(c);

But in cuDSS, I can’t find any equivalent method, so I can only treat them as two different problems.

cudssHandle_t dss_handle;
// cudssCreate, cudssConfigCreate, cudssDataCreate, cudssMatrixCreateCsr, cudssMatrixCreateDn...
cudssExecute(dss_handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
cudssExecute(dss_handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
cudssExecute(dss_handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);
// Clean up...
// Transpose matrix A to A_T using cusparseCsr2cscEx2 or something else...
cudssHandle_t dss_handle_T;
// cudssCreate, cudssConfigCreate, cudssDataCreate, cudssMatrixCreateCsr, cudssMatrixCreateDn...
cudssExecute(dss_handle_T, CUDSS_PHASE_ANALYSIS, config_T, data_T, A_T, y, c);
cudssExecute(dss_handle_T, CUDSS_PHASE_FACTORIZATION, config_T, data_T, A_T, y, c);
cudssExecute(dss_handle_T, CUDSS_PHASE_SOLVE, config_T, data_T, A_T, y, c);
// Clean up...

This is not efficient, because two LU factorizations are performed, whereas only one is needed in Eigen.
Are there any better solutions?

Hello!

You’ve correctly noticed that transpose solve is not currently supported in cuDSS. We are fully aware that this is more or less a standard feature (and it is relatively straightforward to add to cuDSS), but since we had not received any requests for it, we have focused on other things.

There is no better solution right now than actually transposing the matrix and performing two separate factorizations and solves with cuDSS.

We would be very interested to know more about your use case and application area; it might help promote this feature request.

Note that you can also contact us via cuDSS-EXTERNAL-Group@nvidia.com if you prefer email over this forum.

Thanks,
Kirill

Thank you for your reply!
I hope the transpose solve feature will be added in a future release.
As an example application, in some variants of the simplex method, B x_B = b, w B = c_B, and B y_k = p_k all need to be solved in a single iteration. The second one can be rewritten as B^T w^T = c_B^T. The first and the third can be solved with one cudssHandle_t, but the second cannot (unless a transpose solve feature exists).
However, solving linear programming problems on the GPU may not be a good idea, since there are already many good-enough CPU solvers.

Thanks for sharing more details!

However, solving linear programming problems on the GPU may not be a good idea, since there are already many good-enough CPU solvers.

I’d suggest you check whether cuDSS shows performance benefits over the CPU solvers. It certainly depends on the application and the matrix properties, but keep in mind that GPUs often have higher compute throughput and memory bandwidth than CPUs. If a sparse linear solver is implemented efficiently, it should be able to benefit from either of these two major characteristics and be faster on a GPU.

If you don’t see better performance with cuDSS, you can share your use case and we could check why that is.

Thanks,
Kirill

I will test them. But it takes time…
Thank you again.