cusolverDnCgesvd performance vs MKL

Having spent some time recently writing sparse linear algebra subroutines, I can say that some of these algorithms carry quite a bit of inherent serial dependency. The trick is to use combinatorial pre-processing routines to reorganize the input matrix into a form (and a series of ordered processing levels) which can be computed in parallel inside a larger outer CPU loop; a rough sketch of that level-scheduling idea follows.
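As an illustration, here is a minimal sketch (my own, not from any particular library) of level scheduling for a sparse lower-triangular solve: each row is assigned a level one deeper than the deepest row it depends on, and all rows within the same level can then be processed in parallel, e.g. with one GPU kernel launch per level inside the outer CPU loop. The CSR matrix here is a made-up 4x4 example.

```cpp
#include <vector>
#include <algorithm>
#include <cstdio>

// Compute level sets for a sparse lower-triangular solve.
// rowPtr/colInd describe the strictly-lower-triangular part in CSR;
// rows within the same level have no mutual dependencies, so each
// level can be processed in parallel (e.g., one kernel launch per level).
std::vector<int> computeLevels(int n,
                               const std::vector<int>& rowPtr,
                               const std::vector<int>& colInd)
{
    std::vector<int> level(n, 0);
    for (int i = 0; i < n; ++i) {
        int maxDep = -1;  // deepest level among the rows this row depends on
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            maxDep = std::max(maxDep, level[colInd[k]]);
        level[i] = maxDep + 1;
    }
    return level;
}

int main() {
    // 4x4 example: row 1 depends on row 0; row 3 depends on rows 1 and 2.
    std::vector<int> rowPtr = {0, 0, 1, 1, 3};
    std::vector<int> colInd = {0, 1, 2};
    std::vector<int> lv = computeLevels(4, rowPtr, colInd);
    for (int i = 0; i < 4; ++i)
        std::printf("row %d -> level %d\n", i, lv[i]);
    return 0;
}
```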

It is possible to beat multi-threaded CPU implementations by as much as 2-3x for applications such as sparse LU or sparse Cholesky factorization, but you may have to 'roll your own' implementation. NVIDIA does a good job of providing some free functionality in their SDK, but they too have limited resources. I expect incremental improvements over time to cuSPARSE, cuBLAS, cuSOLVER and MAGMA, all of which are free (unlike MKL, to my understanding; correct me if I am wrong).

There is some discussion of the (sub-optimal when compared with MKL) performance of SVD on the GPU in the Stack Overflow question "SVD speed in CPU and GPU".

As indicated in the paper referenced in that discussion, the classical Golub-Reinsch algorithm seems to be tough to port efficiently to the GPU due to its serial dependencies.
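For reference, here is a minimal sketch of what a host-side cusolverDnCgesvd call looks like, to make the comparison with MKL's cgesvd concrete. The matrix contents are arbitrary and error checking is omitted for brevity; note that cuSOLVER's dense gesvd routines require m >= n.

```cpp
#include <cstdio>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cusolverDn.h>

int main() {
    const int m = 4, n = 3;            // m >= n is required by cusolverDn<t>gesvd
    std::vector<cuComplex> hA(m * n);  // arbitrary test matrix, column-major
    for (int i = 0; i < m * n; ++i)
        hA[i] = make_cuComplex(float(i % 5) + 1.0f, float(i % 3));

    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    cuComplex *dA, *dU, *dVT, *dWork;
    float *dS, *dRwork;
    int *dInfo;
    cudaMalloc(&dA,  sizeof(cuComplex) * m * n);
    cudaMalloc(&dU,  sizeof(cuComplex) * m * m);
    cudaMalloc(&dVT, sizeof(cuComplex) * n * n);
    cudaMalloc(&dS,  sizeof(float) * n);
    cudaMalloc(&dRwork, sizeof(float) * (n - 1));
    cudaMalloc(&dInfo, sizeof(int));
    cudaMemcpy(dA, hA.data(), sizeof(cuComplex) * m * n, cudaMemcpyHostToDevice);

    int lwork = 0;                     // query the workspace size first
    cusolverDnCgesvd_bufferSize(handle, m, n, &lwork);
    cudaMalloc(&dWork, sizeof(cuComplex) * lwork);

    // 'A' = compute all columns of U and all rows of V^H
    cusolverDnCgesvd(handle, 'A', 'A', m, n, dA, m, dS,
                     dU, m, dVT, n, dWork, lwork, dRwork, dInfo);

    std::vector<float> hS(n);
    cudaMemcpy(hS.data(), dS, sizeof(float) * n, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        std::printf("sigma[%d] = %f\n", i, hS[i]);

    cudaFree(dA); cudaFree(dU); cudaFree(dVT); cudaFree(dS);
    cudaFree(dRwork); cudaFree(dInfo); cudaFree(dWork);
    cusolverDnDestroy(handle);
    return 0;
}
```

This should build with something like nvcc svd_demo.cu -lcusolver.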

There is some research on parallel methods for SVD computation (just google 'parallel SVD'); see for example:
- http://www.irisa.fr/sage/bernard/publis/SVD-Chapter06.pdf
- https://hal.inria.fr/inria-00071892/document
- http://slepc.upv.es/material/slides/harrachov.pdf
- http://www.maths.manchester.ac.uk/~higham/papers/hipa94a.pdf
- "An overview of parallel algorithms for the singular value and symmetric eigenvalue problems" (ScienceDirect)
- and (quite recent, and it looks interesting) http://www.netlib.org/lapack/lawnspdf/lawn283.pdf

Sorry to intrude; I am not very good with linear algebra. Is it possible to model a linear system that solves for the coefficients of a matrix while limiting the values of those coefficients (e.g., -1 <= a <= 1, b == 1, c >= 0)?
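One common way to model this (just a sketch, not something discussed above): treat it as a box-constrained least-squares problem, minimizing ||Ax - b||^2 subject to per-coefficient lower and upper bounds, where an equality such as b == 1 becomes a bound pair lo == hi == 1. Below is a minimal projected-gradient sketch; the 3x3 system and the fixed step size are made up for demonstration.

```cpp
#include <vector>
#include <cstdio>
#include <algorithm>

// Solve min ||A x - b||^2 subject to lo[j] <= x[j] <= hi[j]
// by projected gradient descent. An equality constraint such as
// x[1] == 1 is expressed as lo[1] == hi[1] == 1.
std::vector<double> boxConstrainedLS(const std::vector<std::vector<double>>& A,
                                     const std::vector<double>& b,
                                     const std::vector<double>& lo,
                                     const std::vector<double>& hi,
                                     double step, int iters)
{
    const size_t m = A.size(), n = A[0].size();
    std::vector<double> x(n, 0.0), r(m), g(n);
    for (int it = 0; it < iters; ++it) {
        for (size_t i = 0; i < m; ++i) {           // residual r = A x - b
            r[i] = -b[i];
            for (size_t j = 0; j < n; ++j) r[i] += A[i][j] * x[j];
        }
        for (size_t j = 0; j < n; ++j) {           // gradient g = 2 A^T r
            g[j] = 0.0;
            for (size_t i = 0; i < m; ++i) g[j] += 2.0 * A[i][j] * r[i];
        }
        for (size_t j = 0; j < n; ++j)             // gradient step + projection
            x[j] = std::clamp(x[j] - step * g[j], lo[j], hi[j]);
    }
    return x;
}

int main() {
    // Fit coefficients [a b c] with -1 <= a <= 1, b == 1, c >= 0.
    std::vector<std::vector<double>> A = {{1, 2, 1}, {2, 1, 0}, {1, 0, 3}};
    std::vector<double> b  = {4, 3, 2};
    std::vector<double> lo = {-1, 1, 0};
    std::vector<double> hi = { 1, 1, 1e30};   // huge upper bound ~ unbounded
    std::vector<double> x = boxConstrainedLS(A, b, lo, hi, 0.01, 5000);
    std::printf("a = %f, b = %f, c = %f\n", x[0], x[1], x[2]);
    return 0;
}
```

For anything beyond a toy problem, a proper bound-constrained solver (an NNLS or quadratic-programming routine) would be a better choice than this fixed-step loop.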