cuDSS

Hi,
I am trying to benchmark cuDSS and cuSPARSE as direct sparse solvers in CUDA C, using the circuit simulation matrices from SuiteSparse. I measure power consumption during the solve with NVML, and I also want to compare the solve runtime of the two libraries.

cuDSS: CUDSS_CONFIG_HYBRID_MODE is enabled, CUDSS_CONFIG_REORDERING_ALG is set to CUDSS_ALG_3 via cudssConfigSet, CUDSS_CONFIG_FACTORIZATION_ALG is set to CUDSS_ALG_, and CUDSS_CONFIG_SOLVE_ALG is left at its default. CUDSS_CONFIG_HYBRID_EXECUTE_MODE is also enabled via cudssConfigSet.

cuSPARSE:
// Solver configuration
cusparseOperation_t opA = CUSPARSE_OPERATION_NON_TRANSPOSE;
cusparseSpSVAlg_t alg = CUSPARSE_SPSV_ALG_DEFAULT;

Using cusparseSpSV_analysis and cusparseSpSV_solve to solve the matrices.

Since cuDSS is a direct sparse solver, I expected it to be faster in terms of solve time. But in my tests, on matrices ranging from thousands to millions of rows and columns, cuSPARSE mostly solves quicker.

Is there something I am missing or is this the expected behavior?

Driver version: Cuda compilation tools, release 12.9, V12.9.86
GPU: RTX 3070Ti
cuDSS v 0.6.0

Hi @sr7ramcanbe.reached

To compare performance, you first need to make sure you’re comparing the same operation.

cuDSS computes a factorization of a sparse matrix A (say, as P*A = L * U) and solves the system Ax = b. Its solve step is roughly a pair of sparse triangular solves (forward and backward) for the factors (say, L and U).
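To make the two phases concrete, here is a minimal dense sketch in plain C. It is not cuDSS code: cuDSS works on sparse data and its internals are not shown in this thread, so the function names `lu_factor` and `lu_solve`, the fixed size `N`, and the use of partial pivoting are all assumptions made for illustration. It only shows the split between a factorization step (P*A = L*U) and a solve step made of one forward and one backward triangular solve.

```c
#include <math.h>

#define N 3  /* illustrative size, not tied to any real benchmark matrix */

/* Factorization phase (hypothetical helper): factor a in place as
   P*A = L*U with partial pivoting. L (unit diagonal) is stored below
   the diagonal, U on and above it. perm[i] is the original row that
   ended up in row i. Returns 0 on success, -1 on a zero pivot. */
static int lu_factor(double a[N][N], int perm[N]) {
    for (int i = 0; i < N; ++i) perm[i] = i;
    for (int k = 0; k < N; ++k) {
        int p = k;
        for (int i = k + 1; i < N; ++i)
            if (fabs(a[i][k]) > fabs(a[p][k])) p = i;
        if (a[p][k] == 0.0) return -1;
        if (p != k) {
            for (int j = 0; j < N; ++j) {
                double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
            int t = perm[k]; perm[k] = perm[p]; perm[p] = t;
        }
        for (int i = k + 1; i < N; ++i) {
            a[i][k] /= a[k][k];                 /* multiplier, entry of L */
            for (int j = k + 1; j < N; ++j)
                a[i][j] -= a[i][k] * a[k][j];   /* update trailing block */
        }
    }
    return 0;
}

/* Solve phase (hypothetical helper): forward substitution with L on the
   permuted right-hand side, then backward substitution with U. */
static void lu_solve(double lu[N][N], const int perm[N],
                     const double b[N], double x[N]) {
    for (int i = 0; i < N; ++i) {               /* L*y = P*b */
        double s = b[perm[i]];
        for (int j = 0; j < i; ++j) s -= lu[i][j] * x[j];
        x[i] = s;
    }
    for (int i = N - 1; i >= 0; --i) {          /* U*x = y */
        double s = x[i];
        for (int j = i + 1; j < N; ++j) s -= lu[i][j] * x[j];
        x[i] = s / lu[i][i];
    }
}
```

Calling `lu_factor` once and then `lu_solve` for each right-hand side mirrors the phase split described above: only the second call corresponds to the pair of triangular solves, so it is the only part that is comparable in kind to a single triangular solve.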

When you call SpSV from cusparse, it operates upon a triangular matrix.

How do you ensure that the input to SpSV is the same triangular matrix as the one cuDSS uses internally for its solve phase? Since we currently (as of cudss 0.6.0) don’t have functionality for users to extract the factors, I doubt that you use the same data for cusparse and cudss.

If I missed something, please correct me.

Note: cusparse spsv can be viewed as a direct sparse triangular solver and thus can be faster than cuDSS when a triangular system is solved. The differentiating feature of cuDSS is that it can solve sparse systems with general matrices.

Thanks,
Kirill

Hi @kvoronin
I am feeding the same sparse matrix to both solvers, cudss and cusparse. There is no way to ensure that cusparse is fed the same triangular matrix as the one cudss is working on. So can I assume that, because cudss has to do an LU factorization to arrive at the triangular matrix (which is what the cusparse solver starts with), there is some unavoidable runtime overhead for cudss?

Based on your note, cusparse is a direct sparse solver for triangular matrices whereas cudss is a direct sparse solver for general matrices; so it can be concluded that the overhead seen in runtime for cudss is valid and expected.

Correct me if I am wrong here, please.

Thanks,

sr7

Thanks for the clarification!

Ok, so you’re comparing how cuDSS and cuSPARSE solve a sparse triangular system. In that case, cuDSS can indeed have extra overhead compared to cuSPARSE, especially for smaller systems.

If it is critical for your application, you can make a request for cuDSS to match performance of cuSPARSE (because theoretically there can be no difference between the two), but right now we’re more focused on cuDSS as a solver for general sparse systems.

Triangular sparse systems are considered an edge case and since there is already functionality in cuSPARSE for it, we haven’t prioritized this case in cuDSS at all.

Thanks!

@kvoronin sorry for the delayed response

I am comparing how cuDSS and cuSPARSE solve a sparse square matrix, not a triangular system. I observe that cuSPARSE has a better runtime than cuDSS.

@sr7ramcanbe.reached,

I am afraid I don’t understand what you’re comparing with what then. A sparse square matrix can be symmetric or general (non-symmetric).

If you have a linear system Ax=b, then cuSPARSE has triangular solve APIs which can solve the systems where the sparse (square) matrix A is triangular, while cuDSS can solve systems where the matrix A is general.
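A small plain C sketch of why this distinction matters. It assumes that a triangular solver reads only one triangle of its input and ignores the rest; whether that is what happens in the benchmark above is an assumption, not something confirmed in this thread. `lower_solve` and `residual_max` are hypothetical helpers on a dense 3x3 matrix, not cuSPARSE APIs.

```c
#include <math.h>

#define N 3  /* illustrative size */

/* Forward substitution that reads only the lower triangle of a
   (including the diagonal) and ignores every entry above it. */
static void lower_solve(double a[N][N], const double b[N], double x[N]) {
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j) s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
}

/* Largest absolute entry of the residual b - A*x, computed with the
   FULL matrix, so it measures whether x really solves Ax = b. */
static double residual_max(double a[N][N], const double b[N],
                           const double x[N]) {
    double r = 0.0;
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < N; ++j) s -= a[i][j] * x[j];
        if (fabs(s) > r) r = fabs(s);
    }
    return r;
}
```

If A is truly lower triangular, `residual_max` after `lower_solve` is at rounding level. If A is general, the triangular solve still returns quickly, but the residual against the full matrix is large, which means the timing was taken for a different problem. Checking the residual of both solvers' outputs against the same A and b is a cheap way to confirm that a benchmark compares the same operation.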

In your last comment you said that you’re not solving a triangular system. How are you solving such a system with cuSPARSE APIs?

Thanks,
Kirill