Does CUDA 3.0 support any LAPACK functions?

Does CUDA 3.0 provide any library of LAPACK functions, or do I just need to use CULA?

The only linear algebra routines that come with CUDA are in CUBLAS and that does not include any LAPACK routines (although CUBLAS 3.0 is much, much more complete than earlier releases). I doubt that will change. Magma or CULA seem to be the best options at the moment if you want common LAPACK operations, although they too are far from complete.
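
If it helps, this is roughly what the CULA route looks like - a minimal host-interface sketch for a single dense solve. The header name and the culaInitialize/culaSgesv/culaShutdown calls are written from memory of the CULA docs, so treat them as assumptions and check them against your release:

```c
#include <stdio.h>
#include <cula.h>   /* header name varies between CULA releases; an assumption here */

int main(void)
{
    /* Solve A x = b through CULA's LAPACK-style host interface.
       Names and signatures follow the CULA docs as I remember them. */
    int n = 3, nrhs = 1;
    float a[9] = { 4, 1, 0,      /* column-major, as in LAPACK */
                   1, 3, 1,
                   0, 1, 2 };
    float b[3] = { 1, 2, 3 };
    int ipiv[3];

    culaStatus s = culaInitialize();
    if (s != culaNoError) { printf("culaInitialize failed\n"); return 1; }

    s = culaSgesv(n, nrhs, a, n, ipiv, b, n);   /* b is overwritten with x */
    if (s != culaNoError) printf("culaSgesv failed (status %d)\n", (int)s);
    else                  printf("x = %f %f %f\n", b[0], b[1], b[2]);

    culaShutdown();
    return 0;
}
```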

CULA is up to about 150 LAPACK functions and we feel we’ve hit the most popular routines so far. If there is something specific that you need, please let us know as we are always looking for suggestions to steer further development.

Sorry Kyle, I didn’t mean to slight CULA in any way - the development pace is very impressive and I have never been able to find fault with the performance or accuracy of any of the routines I have tried.

Having said that, the four LAPACK functions I have traditionally used the most that you don’t currently support are the sgbtrf/sgbtrs and dgbtrf/dgbtrs pairs. They are very, very handy for factorizing block Jacobian matrices that arise in diagonally implicit Runge-Kutta schemes, which I use rather often for integrating stiff ODEs.
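
For reference, this is the kind of call sequence I mean - a minimal CPU sketch against the standard netlib LAPACK interface (double-precision version shown; a small 1D Laplacian stands in here for one banded sub-block):

```c
#include <stdio.h>

/* Netlib LAPACK Fortran symbols (the sgbtrf/sgbtrs pair is identical apart from the type). */
extern void dgbtrf_(const int* m, const int* n, const int* kl, const int* ku,
                    double* ab, const int* ldab, int* ipiv, int* info);
extern void dgbtrs_(const char* trans, const int* n, const int* kl, const int* ku,
                    const int* nrhs, const double* ab, const int* ldab,
                    const int* ipiv, double* b, const int* ldb, int* info);

int main(void)
{
    /* Tridiagonal test matrix (kl = ku = 1) in LAPACK band storage:
       ab[(kl + ku + i - j) + j*ldab] = A(i,j), with ldab = 2*kl + ku + 1. */
    enum { N = 5, KL = 1, KU = 1, LDAB = 2*KL + KU + 1 };
    double ab[LDAB * N] = { 0 };
    double b[N];
    int ipiv[N], info, nrhs = 1, n = N, kl = KL, ku = KU, ldab = LDAB, ldb = N;

    for (int j = 0; j < N; ++j) {
        b[j] = 1.0;
        ab[(KL + KU) + j*LDAB] = 2.0;                       /* A(j,j)   */
        if (j > 0)     ab[(KL + KU - 1) + j*LDAB] = -1.0;   /* A(j-1,j) */
        if (j < N - 1) ab[(KL + KU + 1) + j*LDAB] = -1.0;   /* A(j+1,j) */
    }

    dgbtrf_(&n, &n, &kl, &ku, ab, &ldab, ipiv, &info);                   /* factorize */
    dgbtrs_("N", &n, &kl, &ku, &nrhs, ab, &ldab, ipiv, b, &ldb, &info);  /* solve     */

    for (int i = 0; i < N; ++i) printf("x[%d] = %g\n", i, b[i]);
    return 0;
}
```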

No offense taken :)

It’s interesting that you pointed out your need for a banded matrix factorize and solve as this is next on our plate for new functionality. We feel it’s the last major section of LAPACK that we haven’t explored.

I’m curious though, what’s the approximate size of your matrices that you typically work with if you don’t mind sharing?

Usually not huge - for big problems there really isn't much choice but to use sparse tools, although we try to avoid that where possible. Each banded sub-block would typically be on the order of 10^4 square, and the block matrix itself is usually 3x3, 4x4 or 5x5 lower triangular (depending on the number of stages in the RK scheme). The usual strategy is to factorize or invert each sub-block separately and then use pre-coded matrix manipulations to compute the entries of the equivalent block factorization, which is then used as part of a Newton iteration.
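
To make that concrete, here is a minimal sketch of the solve stage only, under some simplifying assumptions: tiny dimensions, made-up coupling values, dense off-diagonal blocks applied with dgemv (in practice they would also be banded), and each diagonal block factorized once with dgbtrf and reused via dgbtrs inside a block forward substitution:

```c
#include <stdio.h>
#include <string.h>

/* Netlib LAPACK/BLAS reference symbols. */
extern void dgbtrf_(const int*, const int*, const int*, const int*,
                    double*, const int*, int*, int*);
extern void dgbtrs_(const char*, const int*, const int*, const int*,
                    const int*, const double*, const int*, const int*,
                    double*, const int*, int*);
extern void dgemv_(const char*, const int*, const int*, const double*,
                   const double*, const int*, const double*, const int*,
                   const double*, double*, const int*);

enum { N = 4, KL = 1, KU = 1, LDAB = 2*KL + KU + 1, S = 3 };  /* tiny sizes for the sketch */

/* Fill one banded diagonal block (a 1D Laplacian, purely illustrative). */
static void fill_band(double* ab)
{
    memset(ab, 0, sizeof(double) * LDAB * N);
    for (int j = 0; j < N; ++j) {
        ab[(KL + KU) + j*LDAB] = 2.0;
        if (j > 0)     ab[(KL + KU - 1) + j*LDAB] = -1.0;
        if (j < N - 1) ab[(KL + KU + 1) + j*LDAB] = -1.0;
    }
}

int main(void)
{
    /* Block lower-triangular system: for each stage i,
       x_i = A_ii^{-1} (b_i - sum_{j<i} A_ij x_j). */
    double diag[S][LDAB * N];
    int    ipiv[S][N];
    double off[S][S][N * N];          /* A_ij, i > j, column-major dense */
    double x[S][N], b[S][N];
    int n = N, kl = KL, ku = KU, ldab = LDAB, one = 1, info;
    double mone = -1.0, done = 1.0;

    for (int i = 0; i < S; ++i) {
        fill_band(diag[i]);
        dgbtrf_(&n, &n, &kl, &ku, diag[i], &ldab, ipiv[i], &info);   /* factorize A_ii once */
        for (int j = 0; j < i; ++j)
            for (int k = 0; k < N * N; ++k) off[i][j][k] = 0.1;      /* placeholder coupling */
        for (int k = 0; k < N; ++k) b[i][k] = 1.0;
    }

    for (int i = 0; i < S; ++i) {                  /* block forward substitution */
        memcpy(x[i], b[i], sizeof(double) * N);
        for (int j = 0; j < i; ++j)                /* x_i -= A_ij * x_j */
            dgemv_("N", &n, &n, &mone, off[i][j], &n, x[j], &one, &done, x[i], &one);
        dgbtrs_("N", &n, &kl, &ku, &one, diag[i], &ldab, ipiv[i], x[i], &n, &info);
    }

    for (int i = 0; i < S; ++i)
        printf("stage %d: x[0] = %g\n", i, x[i][0]);
    return 0;
}
```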

I am trying to introduce CUDA into CFD in my organization. The roadblock is indeed the gbtrf/gbtrs pairs for both double-precision and double-complex variables. These banded matrices arise from the most typical discretization schemes for PDEs. Either all-out CUDA-based banded matrix solvers or GPU-accelerated versions of the ScaLAPACK calls (pgbtrf/pgbtrs) would be very useful. Any chance these banded matrix solvers will be included in CULA 2.0?

Have you done any work, or are you planning any, on operations such as matrix inverses and eigenvalue decompositions within each thread (for small matrices, 2 x 2 to 15 x 15)? I have implemented a 4 x 4 matrix inverse in each thread myself; now I need to invert a 7 x 7 matrix in each thread…
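
In case it helps while waiting for library support, here is a minimal sketch of the per-thread approach: each thread keeps its own small matrix in a local array and runs Gauss-Jordan elimination with partial pivoting. The dimension, batch size, diagonally dominant test matrices, and launch configuration are all made up for illustration, and double precision needs compute capability 1.3 or later:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 7          /* matrix dimension handled by each thread */
#define COUNT 256    /* number of matrices in the batch */

/* Each thread inverts one N x N matrix (row-major) via Gauss-Jordan
   elimination with partial pivoting, entirely in its own local arrays. */
__global__ void invert_batch(const double* in, double* out, int count)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= count) return;

    double a[N * N], inv[N * N];
    for (int i = 0; i < N * N; ++i) a[i] = in[t * N * N + i];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            inv[i * N + j] = (i == j) ? 1.0 : 0.0;   /* inv starts as identity */

    for (int k = 0; k < N; ++k) {
        int p = k;                                    /* partial pivoting */
        for (int i = k + 1; i < N; ++i)
            if (fabs(a[i * N + k]) > fabs(a[p * N + k])) p = i;
        for (int j = 0; j < N; ++j) {                 /* swap rows k and p */
            double t1 = a[k * N + j];   a[k * N + j]   = a[p * N + j];   a[p * N + j]   = t1;
            double t2 = inv[k * N + j]; inv[k * N + j] = inv[p * N + j]; inv[p * N + j] = t2;
        }
        double piv = a[k * N + k];                    /* assumed well conditioned */
        for (int j = 0; j < N; ++j) { a[k * N + j] /= piv; inv[k * N + j] /= piv; }
        for (int i = 0; i < N; ++i) {                 /* eliminate column k elsewhere */
            if (i == k) continue;
            double f = a[i * N + k];
            for (int j = 0; j < N; ++j) {
                a[i * N + j]   -= f * a[k * N + j];
                inv[i * N + j] -= f * inv[k * N + j];
            }
        }
    }
    for (int i = 0; i < N * N; ++i) out[t * N * N + i] = inv[i];
}

int main(void)
{
    size_t bytes = (size_t)COUNT * N * N * sizeof(double);
    double* h = (double*)malloc(bytes);
    for (int t = 0; t < COUNT; ++t)                   /* diagonally dominant test matrices */
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                h[t * N * N + i * N + j] = (i == j) ? 2.0 : 0.1;

    double *d_in, *d_out;
    cudaMalloc((void**)&d_in, bytes);
    cudaMalloc((void**)&d_out, bytes);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    invert_batch<<<(COUNT + 127) / 128, 128>>>(d_in, d_out, COUNT);
    cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);

    printf("inverse of matrix 0, element (0,0) = %f\n", h[0]);
    cudaFree(d_in); cudaFree(d_out); free(h);
    return 0;
}
```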

Dear all,

does anyone know whether it might be possible to get and test the function DSYEVX for a GPGPU version of an open-source molecular dynamics code?

Hopefully it could then be integrated into the next CUDA version of that code.

Cheers,

Ivan
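
For reference, this is what the requested DSYEVX functionality looks like on the CPU side - a minimal sketch against the standard netlib LAPACK interface, computing the lowest two eigenpairs of a small symmetric matrix (the matrix and workspace sizes here are just illustrative):

```c
#include <stdio.h>

/* Netlib LAPACK: selected eigenvalues/eigenvectors of a real symmetric matrix. */
extern void dsyevx_(const char* jobz, const char* range, const char* uplo,
                    const int* n, double* a, const int* lda,
                    const double* vl, const double* vu,
                    const int* il, const int* iu, const double* abstol,
                    int* m, double* w, double* z, const int* ldz,
                    double* work, const int* lwork, int* iwork,
                    int* ifail, int* info);

int main(void)
{
    /* 3x3 symmetric matrix in column-major order (upper triangle referenced). */
    int n = 3, lda = 3, ldz = 3, il = 1, iu = 2, m, info;
    double a[9] = { 4, 1, 0,
                    1, 3, 1,
                    0, 1, 2 };
    double vl = 0, vu = 0, abstol = 0;     /* vl/vu ignored for range = "I" */
    double w[3], z[9], work[8 * 3];
    int lwork = 8 * 3, iwork[5 * 3], ifail[3];

    /* Lowest two eigenpairs (indices il..iu), with eigenvectors. */
    dsyevx_("V", "I", "U", &n, a, &lda, &vl, &vu, &il, &iu, &abstol,
            &m, w, z, &ldz, work, &lwork, iwork, ifail, &info);

    for (int i = 0; i < m; ++i)
        printf("lambda[%d] = %f\n", i, w[i]);
    return 0;
}
```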