How to efficiently solve a transposed problem using cuDSS?

In the Eigen library, the SparseLU class has a transpose() method that returns an expression of the transpose of the factored matrix. It is useful for solving the transposed problem, and it is very fast (probably because A = L U implies A^T = U^T L^T, so the existing factors can be reused).
For example, the following code solves both A x = b and A^T y = c. (A certain algorithm does need to solve both of them.)

Eigen::SparseLU<Eigen::SparseMatrix<double>, Eigen::COLAMDOrdering<int> > solver;
solver.compute(A);
x = solver.solve(b);
y = solver.transpose().solve(c);

But in cuDSS, I can’t find any equivalent method, so I can only treat them as two different problems.

cudssHandle_t dss_handle;
// cudssCreate, cudssConfigCreate, cudssDataCreate, cudssMatrixCreateCsr, cudssMatrixCreateDn...
cudssExecute(dss_handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
cudssExecute(dss_handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
cudssExecute(dss_handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);
// Clean up...
// Transpose matrix A to A_T using cusparseCsr2cscEx2 or something else...
cudssHandle_t dss_handle_T;
// cudssCreate, cudssConfigCreate, cudssDataCreate, cudssMatrixCreateCsr, cudssMatrixCreateDn...
cudssExecute(dss_handle_T, CUDSS_PHASE_ANALYSIS, config_T, data_T, A_T, y, c);
cudssExecute(dss_handle_T, CUDSS_PHASE_FACTORIZATION, config_T, data_T, A_T, y, c);
cudssExecute(dss_handle_T, CUDSS_PHASE_SOLVE, config_T, data_T, A_T, y, c);
// Clean up...

This is not efficient, because two LU factorizations are performed, whereas only one is needed in Eigen.
Are there any better solutions?

Hello!

You’ve correctly noticed that transpose solve is not currently supported in cuDSS. We are fully aware that this is more or less a standard feature (and it is relatively straightforward to add to cuDSS), but since we had not received any requests for it, we have focused on other things.

There is no better solution right now than actually transposing the matrix and performing two separate factorizations and solves with cuDSS.

We would be very interested to know more about your use case and application area; it might help promote this feature request.

Note that you can also contact us via cuDSS-EXTERNAL-Group@nvidia.com if you prefer email over this forum.

Thanks,
Kirill

Thank you for your reply!
I hope the transpose solve feature will be added in a future release.
As an example application, in some variants of the simplex method, B x_B = b, w B = c_B, and B y_k = p_k all need to be solved in a single iteration. The second one can be rewritten as B^T w^T = c_B^T. The first and the third can be solved with one cudssHandle_t, but the second cannot (unless a transpose solve feature exists).
However, solving linear programming problems on the GPU may not be a good idea, since there are already many good-enough CPU solvers.

Thanks for sharing more details!

However, solving linear programming problems on the GPU may not be a good idea, since there are already many good-enough CPU solvers.

I’d suggest you check whether cuDSS shows performance benefits over the CPU solvers. It certainly depends on the application and the matrix properties, but keep in mind that GPUs often have higher compute throughput and memory bandwidth than CPUs. If a sparse linear solver is implemented efficiently, it should be able to benefit from either of these two major characteristics and be faster on a GPU.

If you don’t see better performance with cuDSS, you can share your use case and we could check why that is.

Thanks,
Kirill

I will test them. But it takes time…
Thank you again.