Multi-frontal direct solver for general sparse matrix


New to the CUDA forums and unfortunately, never really had a chance to play with it. However, someone from our team in Shanghai has worked with CUDA for a while and come up with something interesting. I don’t want to spam the forums with a product announcement but I would like to do a quick survey of what is currently available to make sure we are doing something really new.

Our main application is semiconductor physics, and we do a lot of FEM modeling of very non-linear equations. Iterative solvers have never converged reliably on the badly conditioned asymmetric matrices this produces, so we rely on direct solvers. To parallelize as much as possible, we wrote our own multi-frontal solver and recently implemented the well-known MUMPS solver as well.
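To make the direct-solver side concrete, here is a minimal, purely illustrative sketch (Python/SciPy, with a toy tridiagonal matrix I made up; the real systems come from FEM discretizations and are vastly larger). It performs a sparse LU factorization of a nonsymmetric system, which is the core operation a multifrontal solver reorganizes into dense frontal matrices for efficiency:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy nonsymmetric sparse system (illustrative only; not from the
# actual semiconductor code). Unequal off-diagonals make it asymmetric,
# in the spirit of an upwinded convection-diffusion stencil.
n = 1000
main = 2.0 * np.ones(n)
lower = -1.5 * np.ones(n - 1)   # lower != upper -> A != A^T
upper = -0.5 * np.ones(n - 1)
A = sp.diags([lower, main, upper], offsets=[-1, 0, 1], format="csc")
b = np.ones(n)

# Direct solve via sparse LU factorization: the same basic operation a
# multifrontal solver performs, organized around frontal matrices.
lu = spla.splu(A)
x = lu.solve(b)
print("residual:", np.linalg.norm(A @ x - b))
```

Unlike an iterative method, the factorization cost does not depend on the conditioning of the matrix, which is why direct solvers are the safe choice for badly conditioned systems like ours.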

When we started considering a CUDA version of a multi-frontal solver a few months ago, none seemed to be available. To the best of my knowledge, only direct solvers for full/banded matrices (CULA LAPACK) and iterative sparse solvers exist. I know that ANSYS has also ported its own mechanical FEM solver, but I do not know whether they use a direct solver or an iterative one.

So what can the experienced CUDA developers out there tell me? Has anyone else ported a multi-frontal direct solver to CUDA? I’d like to believe our developer’s claim that he is the first, but a little due diligence never hurt anyone.

If anyone is interested, I can release some benchmark comparisons to MUMPS.


Michel Lestrade
Crosslight Software

To board moderators; please delete my duplicate posts. Server was giving me HTTP 500 errors …

Michel Lestrade
Crosslight Software

There are a couple of direct sparse solvers that are CUDA accelerated:



If you can post a link to benchmark data, it will be very useful.

The following papers may be of interest:
Robert F. Lucas, Gene Wagenbreth, Dan M. Davis, and Roger Grimes, “Multifrontal Computations on GPUs and Their Multi-core Hosts”
Geraud P. Krawezik, Gene Poole, “Accelerating the ANSYS Direct Sparse Solver with GPUs”

Thanks for the feedback. Here is what we have so far:

Mesh size | 180K | 214K | 230K
----------|------|------|-----
MUMPS     |  687 | 1443 | 2023
GPU-MF    |  519 |  698 |  801


The times are in seconds and are the total solver time for our non-linear Newton solver. There are 3 variables per node point, so the smallest matrix is n×n with n = 0.54e6. I don’t have the number of non-zero elements on hand, but the largest matrix maxed out the 16 GB of RAM on our test machine. However, our software does more than just the matrix calculations, so this may not be a reliable benchmark.
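For what it’s worth, the per-mesh speedups implied by those numbers can be computed directly from the table above (times in seconds, as stated):

```python
# Total solver times in seconds, taken from the benchmark table.
mumps  = {"180K": 687, "214K": 1443, "230K": 2023}
gpu_mf = {"180K": 519, "214K": 698, "230K": 801}

for mesh, t_mumps in mumps.items():
    print(f"{mesh}: {t_mumps / gpu_mf[mesh]:.2f}x")
# → 180K: 1.32x, 214K: 2.07x, 230K: 2.53x
```

Note that the GPU advantage grows with problem size, from roughly 1.3x on the smallest mesh to about 2.5x on the largest.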

The hardware is a single Tesla C1060 card paired with an i7 CPU. MUMPS is parallelized only across the i7 cores, versus the hundreds of cores on the Tesla.

The sparse matrix itself is asymmetric and very badly conditioned.

Thanks for the papers. I had some of our developers take a closer look, and the key point seems to be support for GENERAL sparse matrices. From what they tell me, the ANSYS accelerated solver and the other GPU solvers we’ve seen so far all handle only symmetric sparse matrices.
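To spell out the distinction (with a hypothetical helper function, not part of any solver discussed here): a symmetric-only solver is built on a Cholesky or LDLᵀ factorization, which assumes A = Aᵀ, a property our FEM matrices fail immediately:

```python
import numpy as np
import scipy.sparse as sp

def is_numerically_symmetric(A, tol=1e-12):
    """Hypothetical helper: check A == A^T up to tol.

    Symmetric-only solvers (Cholesky / LDL^T factorizations) rely on
    this property; asymmetric FEM matrices violate it, so they need a
    general LU-based factorization with pivoting instead."""
    diff = (A - A.T).tocoo()
    return diff.nnz == 0 or float(np.max(np.abs(diff.data))) <= tol

# Tiny asymmetric example: the two off-diagonal entries differ.
A = sp.csr_matrix([[2.0, -1.5],
                   [-0.5, 2.0]])
print(is_numerically_symmetric(A))   # False
```

This is why porting a symmetric multifrontal code to the general case is not a trivial extension: the asymmetric factorization needs pivoting for stability, which complicates the frontal-matrix assembly.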

Do you know of anyone, besides the grusoft link your colleague put up, who has worked on an accelerated matrix solver that can handle asymmetric matrices?