I was just wondering if there are any plans for an eigendecomposition routine in upcoming CUDA releases. It seems like there are tons of people who could use one, and in many cases programming it themselves is not feasible. Various other LAPACK routines seem to be badly needed as well.
If there are plans, any hint as to how much of a speedup it would give over an optimized single-threaded BLAS/LAPACK library (such as MKL)? And any hint as to when it would come out?
As I understand it, LAPACK is built on top of BLAS, so you could take the LAPACK sources and convert them to use CUBLAS, no? I am looking at solving systems of equations at the moment and am looking into SCALAPACK, since that is a parallel version of LAPACK. I just don’t understand yet how it distributes the calculations over the nodes.
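For anyone wanting to experiment with that approach: since LAPACK calls BLAS through a fixed interface, one starting point is a host-side wrapper that keeps the sgemm-style signature but runs the multiply on the GPU. Here is a rough sketch of what I mean (`gpu_sgemm` is my own made-up name, not part of CUBLAS; error checking is omitted, and the naive copy-in/copy-out will eat much of the speedup for small matrices):

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

/* Hypothetical drop-in for a BLAS-style sgemm: C = alpha*A*B + beta*C,
 * column-major, no transposes, as a minimal illustration only. */
void gpu_sgemm(int m, int n, int k,
               float alpha, const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, sizeof(float) * lda * k);
    cudaMalloc((void **)&dB, sizeof(float) * ldb * n);
    cudaMalloc((void **)&dC, sizeof(float) * ldc * n);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* Copy operands to the device -- the expensive part for small sizes. */
    cublasSetMatrix(m, k, sizeof(float), A, lda, dA, lda);
    cublasSetMatrix(k, n, sizeof(float), B, ldb, dB, ldb);
    cublasSetMatrix(m, n, sizeof(float), C, ldc, dC, ldc);

    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, dA, lda, dB, ldb, &beta, dC, ldc);

    /* Copy the result back to the host. */
    cublasGetMatrix(m, n, sizeof(float), dC, ldc, C, ldc);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```

In a real port you would create the handle once and keep matrices resident on the device across calls instead of shuttling them back and forth; this just shows where the CUBLAS call slots in (link with `-lcublas`).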
Yeah, I haven’t even ventured into SCALAPACK, as the project I am working on is done in 2 weeks and I wouldn’t have time to implement that. I actually wouldn’t have time to implement anything released by NVIDIA either, but it would be nice to give the person I am working with an idea of what, if anything, they will have to do after I leave (I’m currently an intern at Oak Ridge National Lab, hence the hard end date on my project). You could conceivably develop your own SVD/eig routines and the like, but there seems to be enough demand for these routines that, from a business standpoint, it would make no sense not to provide them, especially if NVIDIA really wants to keep pushing CUDA.
I kind of assumed I wouldn’t get an NVIDIA answer here, but thought I’d give it a try :-P
I suggested this for the next CUDA contest…if it doesn’t get implemented soon, I’ll probably bite the bullet and give it a go myself, since there are some LAPACK routines that I use frequently…
I think this is something NVIDIA should really implement sometime soon…CUBLAS is great, but many important computations (signal processing, graph theory, solving systems of equations, etc.) need higher-level routines.
EDIT: I also found this project and emailed the researchers to see if they would share their port with us…
Why would you need to distribute calculations over the nodes? If you build a “CULAPACK” on top of CUBLAS, the GPU will already take care of the parallelization (as far as the matrix/vector operations are concerned).
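To make that concrete: many LAPACK routines really are thin loops around BLAS calls. Below is my own untested sketch (not an NVIDIA or LAPACK routine) of what an unblocked LU factorization with partial pivoting, in the spirit of LAPACK’s sgetf2, might look like written directly against CUBLAS, with the matrix already resident on the device. Every floating-point operation goes through a CUBLAS level-1 or level-2 call, so the GPU supplies the data parallelism; error and singularity checks are omitted:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

/* Hypothetical sgetf2-style unblocked LU with partial pivoting on a
 * column-major m x n matrix dA already in device memory. ipiv is a
 * host array of length min(m,n) receiving 1-based pivot indices. */
void gpu_sgetf2(cublasHandle_t handle, int m, int n,
                float *dA, int lda, int *ipiv)
{
    int nsteps = (m < n) ? m : n;
    for (int j = 0; j < nsteps; ++j) {
        /* Find the pivot row in column j (result is 1-based, relative). */
        int p;
        cublasIsamax(handle, m - j, dA + j + j * lda, 1, &p);
        p += j - 1;            /* absolute 0-based pivot row */
        ipiv[j] = p + 1;       /* LAPACK-style 1-based pivot index */

        /* Swap rows j and p across all n columns. */
        if (p != j)
            cublasSswap(handle, n, dA + j, lda, dA + p, lda);

        /* Bring the pivot value to the host to form 1/pivot. */
        float pivot;
        cudaMemcpy(&pivot, dA + j + j * lda, sizeof(float),
                   cudaMemcpyDeviceToHost);

        if (j < m - 1) {
            /* Scale the column below the pivot. */
            float inv = 1.0f / pivot;
            cublasSscal(handle, m - j - 1, &inv, dA + j + 1 + j * lda, 1);

            /* Rank-1 update of the trailing submatrix:
             * A22 -= l21 * u12^T. */
            if (j < n - 1) {
                float minus_one = -1.0f;
                cublasSger(handle, m - j - 1, n - j - 1, &minus_one,
                           dA + j + 1 + j * lda, 1,          /* column of L */
                           dA + j + (j + 1) * lda, lda,      /* row of U */
                           dA + j + 1 + (j + 1) * lda, lda); /* A22 */
            }
        }
    }
}
```

The blocked variants (sgetrf and friends) have the same structure but push most of the flops into cublasSgemm/cublasStrsm, which is where the GPU should actually win.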
I’m thinking of trying a port myself, so please let me know if I’m missing something here…