I was just wondering if there are any plans for an eigendecomposition routine in upcoming CUDA releases. It seems like there are tons of people who could use this, and in many cases writing one themselves is not feasible. Various other LAPACK routines also seem to be very much needed.
If there are plans, any hint as to how much of a speedup it would give over an optimized single-threaded BLAS library (such as MKL)? And any hint as to when it would come out?