Does CUDA 3.0 support any LAPACK functions?

Does CUDA 3.0 provide any library of LAPACK functions, or do I just need to use CULA?

The only linear algebra routines that come with CUDA are in CUBLAS and that does not include any LAPACK routines (although CUBLAS 3.0 is much, much more complete than earlier releases). I doubt that will change. Magma or CULA seem to be the best options at the moment if you want common LAPACK operations, although they too are far from complete.
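
If it helps, this is roughly what the CULA route looks like - a minimal host-interface sketch for a single dense solve. The header name and the culaInitialize/culaSgesv/culaShutdown calls are written from memory of the CULA docs, so treat them as assumptions and check them against your release:

```c
#include <stdio.h>
#include <cula.h>   /* header name varies between CULA releases; an assumption here */

int main(void)
{
    /* Solve A x = b through CULA's LAPACK-style host interface.
       Names and signatures follow the CULA docs as I remember them. */
    int n = 3, nrhs = 1;
    float a[9] = { 4, 1, 0,      /* column-major, as in LAPACK */
                   1, 3, 1,
                   0, 1, 2 };
    float b[3] = { 1, 2, 3 };
    int ipiv[3];

    culaStatus s = culaInitialize();
    if (s != culaNoError) { printf("culaInitialize failed\n"); return 1; }

    s = culaSgesv(n, nrhs, a, n, ipiv, b, n);   /* b is overwritten with x */
    if (s != culaNoError) printf("culaSgesv failed (status %d)\n", (int)s);
    else                  printf("x = %f %f %f\n", b[0], b[1], b[2]);

    culaShutdown();
    return 0;
}
```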

CULA is up to about 150 LAPACK functions and we feel we’ve hit the most popular routines so far. If there is something specific that you need, please let us know as we are always looking for suggestions to steer further development.

Sorry Kyle, I didn’t mean to slight CULA in any way - the development pace is very impressive and I have never been able to find fault with the performance or accuracy of any of the routines I have tried.

Having said that, the four LAPACK functions I have traditionally used the most that you don’t currently support are the sgbtrf/sgbtrs and dgbtrf/dgbtrs pairs. They are very, very handy for factorizing block Jacobian matrices that arise in diagonally implicit Runge-Kutta schemes, which I use rather often for integrating stiff ODEs.
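
For reference, this is the kind of call sequence I mean - a minimal CPU sketch against the standard netlib LAPACK interface (double-precision version shown; a small 1D Laplacian stands in here for one banded sub-block):

```c
#include <stdio.h>

/* Netlib LAPACK Fortran symbols (the sgbtrf/sgbtrs pair is identical apart from the type). */
extern void dgbtrf_(const int* m, const int* n, const int* kl, const int* ku,
                    double* ab, const int* ldab, int* ipiv, int* info);
extern void dgbtrs_(const char* trans, const int* n, const int* kl, const int* ku,
                    const int* nrhs, const double* ab, const int* ldab,
                    const int* ipiv, double* b, const int* ldb, int* info);

int main(void)
{
    /* Tridiagonal test matrix (kl = ku = 1) in LAPACK band storage:
       ab[(kl + ku + i - j) + j*ldab] = A(i,j), with ldab = 2*kl + ku + 1. */
    enum { N = 5, KL = 1, KU = 1, LDAB = 2*KL + KU + 1 };
    double ab[LDAB * N] = { 0 };
    double b[N];
    int ipiv[N], info, nrhs = 1, n = N, kl = KL, ku = KU, ldab = LDAB, ldb = N;

    for (int j = 0; j < N; ++j) {
        b[j] = 1.0;
        ab[(KL + KU) + j*LDAB] = 2.0;                       /* A(j,j)   */
        if (j > 0)     ab[(KL + KU - 1) + j*LDAB] = -1.0;   /* A(j-1,j) */
        if (j < N - 1) ab[(KL + KU + 1) + j*LDAB] = -1.0;   /* A(j+1,j) */
    }

    dgbtrf_(&n, &n, &kl, &ku, ab, &ldab, ipiv, &info);                   /* factorize */
    dgbtrs_("N", &n, &kl, &ku, &nrhs, ab, &ldab, ipiv, b, &ldb, &info);  /* solve     */

    for (int i = 0; i < N; ++i) printf("x[%d] = %g\n", i, b[i]);
    return 0;
}
```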

No offense taken :)

It’s interesting that you pointed out your need for a banded matrix factorize and solve as this is next on our plate for new functionality. We feel it’s the last major section of LAPACK that we haven’t explored.

I’m curious though, what’s the approximate size of your matrices that you typically work with if you don’t mind sharing?

Usually not huge - for big problems there really isn't much choice but to use sparse tools, although we try to avoid that where possible. Each banded sub-block would typically be on the order of 10^4 square, and the block matrix itself is usually 3x3, 4x4 or 5x5 lower triangular (depending on the number of stages in the RK scheme). The usual strategy is to factorize or invert each sub-block separately and then use pre-coded matrix manipulations to compute the entries of the equivalent block factorization, which is then used as part of a Newton iteration.
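
To make that concrete, here is a minimal sketch of the solve stage only, under some simplifying assumptions: tiny dimensions, made-up coupling values, dense off-diagonal blocks applied with dgemv (in practice they would also be banded), and each diagonal block factorized once with dgbtrf and reused via dgbtrs inside a block forward substitution:

```c
#include <stdio.h>
#include <string.h>

/* Netlib LAPACK/BLAS reference symbols. */
extern void dgbtrf_(const int*, const int*, const int*, const int*,
                    double*, const int*, int*, int*);
extern void dgbtrs_(const char*, const int*, const int*, const int*,
                    const int*, const double*, const int*, const int*,
                    double*, const int*, int*);
extern void dgemv_(const char*, const int*, const int*, const double*,
                   const double*, const int*, const double*, const int*,
                   const double*, double*, const int*);

enum { N = 4, KL = 1, KU = 1, LDAB = 2*KL + KU + 1, S = 3 };  /* tiny sizes for the sketch */

/* Fill one banded diagonal block (a 1D Laplacian, purely illustrative). */
static void fill_band(double* ab)
{
    memset(ab, 0, sizeof(double) * LDAB * N);
    for (int j = 0; j < N; ++j) {
        ab[(KL + KU) + j*LDAB] = 2.0;
        if (j > 0)     ab[(KL + KU - 1) + j*LDAB] = -1.0;
        if (j < N - 1) ab[(KL + KU + 1) + j*LDAB] = -1.0;
    }
}

int main(void)
{
    /* Block lower-triangular system: for each stage i,
       x_i = A_ii^{-1} (b_i - sum_{j<i} A_ij x_j). */
    double diag[S][LDAB * N];
    int    ipiv[S][N];
    double off[S][S][N * N];          /* A_ij, i > j, column-major dense */
    double x[S][N], b[S][N];
    int n = N, kl = KL, ku = KU, ldab = LDAB, one = 1, info;
    double mone = -1.0, done = 1.0;

    for (int i = 0; i < S; ++i) {
        fill_band(diag[i]);
        dgbtrf_(&n, &n, &kl, &ku, diag[i], &ldab, ipiv[i], &info);   /* factorize A_ii once */
        for (int j = 0; j < i; ++j)
            for (int k = 0; k < N * N; ++k) off[i][j][k] = 0.1;      /* placeholder coupling */
        for (int k = 0; k < N; ++k) b[i][k] = 1.0;
    }

    for (int i = 0; i < S; ++i) {                  /* block forward substitution */
        memcpy(x[i], b[i], sizeof(double) * N);
        for (int j = 0; j < i; ++j)                /* x_i -= A_ij * x_j */
            dgemv_("N", &n, &n, &mone, off[i][j], &n, x[j], &one, &done, x[i], &one);
        dgbtrs_("N", &n, &kl, &ku, &one, diag[i], &ldab, ipiv[i], x[i], &n, &info);
    }

    for (int i = 0; i < S; ++i)
        printf("stage %d: x[0] = %g\n", i, x[i][0]);
    return 0;
}
```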

I am trying to introduce CUDA into CFD in my organization. The roadblock is indeed the gbtrf/gbtrs pairs for both double-precision and double-complex variables. These banded matrices arise from the most typical discretization schemes for PDEs. Either all-out CUDA-based banded matrix solvers or GPU-accelerated versions of the ScaLAPACK calls (pgbtrf/pgbtrs) would be very useful. Any chance these banded matrix solvers will be included in CULA 2.0?

Have you done any work, or are you planning any, on operations such as matrix inverses and eigenvalue decompositions within each thread (for small matrices, 2 x 2 to 15 x 15)? I have implemented a 4 x 4 matrix inverse in each thread myself; now I need to invert a 7 x 7 matrix in each thread…
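
In case it helps while waiting for library support, here is a minimal sketch of the per-thread approach: each thread keeps its own small matrix in a local array and runs Gauss-Jordan elimination with partial pivoting. The dimension, batch size, diagonally dominant test matrices, and launch configuration are all made up for illustration, and double precision needs compute capability 1.3 or later:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 7          /* matrix dimension handled by each thread */
#define COUNT 256    /* number of matrices in the batch */

/* Each thread inverts one N x N matrix (row-major) via Gauss-Jordan
   elimination with partial pivoting, entirely in its own local arrays. */
__global__ void invert_batch(const double* in, double* out, int count)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= count) return;

    double a[N * N], inv[N * N];
    for (int i = 0; i < N * N; ++i) a[i] = in[t * N * N + i];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            inv[i * N + j] = (i == j) ? 1.0 : 0.0;   /* inv starts as identity */

    for (int k = 0; k < N; ++k) {
        int p = k;                                    /* partial pivoting */
        for (int i = k + 1; i < N; ++i)
            if (fabs(a[i * N + k]) > fabs(a[p * N + k])) p = i;
        for (int j = 0; j < N; ++j) {                 /* swap rows k and p */
            double t1 = a[k * N + j];   a[k * N + j]   = a[p * N + j];   a[p * N + j]   = t1;
            double t2 = inv[k * N + j]; inv[k * N + j] = inv[p * N + j]; inv[p * N + j] = t2;
        }
        double piv = a[k * N + k];                    /* assumed well conditioned */
        for (int j = 0; j < N; ++j) { a[k * N + j] /= piv; inv[k * N + j] /= piv; }
        for (int i = 0; i < N; ++i) {                 /* eliminate column k elsewhere */
            if (i == k) continue;
            double f = a[i * N + k];
            for (int j = 0; j < N; ++j) {
                a[i * N + j]   -= f * a[k * N + j];
                inv[i * N + j] -= f * inv[k * N + j];
            }
        }
    }
    for (int i = 0; i < N * N; ++i) out[t * N * N + i] = inv[i];
}

int main(void)
{
    size_t bytes = (size_t)COUNT * N * N * sizeof(double);
    double* h = (double*)malloc(bytes);
    for (int t = 0; t < COUNT; ++t)                   /* diagonally dominant test matrices */
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                h[t * N * N + i * N + j] = (i == j) ? 2.0 : 0.1;

    double *d_in, *d_out;
    cudaMalloc((void**)&d_in, bytes);
    cudaMalloc((void**)&d_out, bytes);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    invert_batch<<<(COUNT + 127) / 128, 128>>>(d_in, d_out, COUNT);
    cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);

    printf("inverse of matrix 0, element (0,0) = %f\n", h[0]);
    cudaFree(d_in); cudaFree(d_out); free(h);
    return 0;
}
```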

Dear all,

does anyone know whether it might be possible to get and test the function DSYEVX for a GPGPU version of an open-source molecular dynamics code?

Hopefully it could then be integrated into the next CUDA version of that code.

Cheers,

Ivan
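
For reference, this is what the requested DSYEVX functionality looks like on the CPU side - a minimal sketch against the standard netlib LAPACK interface, computing the lowest two eigenpairs of a small symmetric matrix (the matrix and workspace sizes here are just illustrative):

```c
#include <stdio.h>

/* Netlib LAPACK: selected eigenvalues/eigenvectors of a real symmetric matrix. */
extern void dsyevx_(const char* jobz, const char* range, const char* uplo,
                    const int* n, double* a, const int* lda,
                    const double* vl, const double* vu,
                    const int* il, const int* iu, const double* abstol,
                    int* m, double* w, double* z, const int* ldz,
                    double* work, const int* lwork, int* iwork,
                    int* ifail, int* info);

int main(void)
{
    /* 3x3 symmetric matrix in column-major order (upper triangle referenced). */
    int n = 3, lda = 3, ldz = 3, il = 1, iu = 2, m, info;
    double a[9] = { 4, 1, 0,
                    1, 3, 1,
                    0, 1, 2 };
    double vl = 0, vu = 0, abstol = 0;     /* vl/vu ignored for range = "I" */
    double w[3], z[9], work[8 * 3];
    int lwork = 8 * 3, iwork[5 * 3], ifail[3];

    /* Lowest two eigenpairs (indices il..iu), with eigenvectors. */
    dsyevx_("V", "I", "U", &n, a, &lda, &vl, &vu, &il, &iu, &abstol,
            &m, w, z, &ldz, work, &lwork, iwork, ifail, &info);

    for (int i = 0; i < m; ++i)
        printf("lambda[%d] = %f\n", i, w[i]);
    return 0;
}
```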