Just replying with what I know, but I don’t know much…
CUBLAS is a set of BLAS-style subroutines specialized to CUDA (its routines work on matrices you have already copied into GPU memory), and I don’t believe it can be used as a drop-in replacement for BLAS. Its coverage of BLAS is not yet complete anyway. So, as far as I know, you couldn’t just use LAPACK and have the LAPACK routines call CUBLAS.
There are also no LAPACK routines (e.g., no SVD routines) available that use CUDA; I suspect that is in the works, however. LAPACK is such an obvious target that I can’t imagine Nvidia would ignore it.
Also, your small SVDs may not even be suitable for CUDA: generally the problem has to be fairly large before CUDA starts to pay off. The main hang-up is host<->device communication, since every matrix has to be copied to the GPU and the results copied back. If your small SVDs were somehow operating on the same data, that might be a different matter. The data could then stay on the GPU, so your calculations would run mostly there, with minimal transfer overhead.
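To make the transfer-overhead point concrete, here is a rough back-of-envelope cost model. Every hardware number in it (bus bandwidth, per-transfer latency, CPU and GPU throughput) and the ~20·n³ flop estimate for one SVD are hypothetical placeholders, not measurements. The model also generously assumes the GPU runs small SVDs at full speed, which real small problems usually don't, so it if anything flatters the GPU.

```python
"""Back-of-envelope model of when offloading SVDs to a GPU pays off.

All constants are HYPOTHETICAL placeholders chosen only to show the shape
of the trade-off; substitute measured values for real hardware.
"""

BYTES_PER_ELEM = 4      # single precision
BANDWIDTH = 4e9         # host<->device bytes/s (hypothetical)
LATENCY = 10e-6         # fixed cost per transfer in seconds (hypothetical)
GPU_FLOPS = 100e9       # sustained GPU rate (hypothetical, optimistic for small n)
CPU_FLOPS = 5e9         # sustained CPU rate (hypothetical)
SVD_FLOP_CONST = 20     # assume one n x n SVD costs ~20 * n**3 flops (rough)


def svd_flops(n: int) -> float:
    """Approximate flop count of a single n x n SVD under the assumption above."""
    return SVD_FLOP_CONST * n ** 3


def transfer_time(n: int, copies: int) -> float:
    """Time for `copies` separate n x n matrix transfers across the bus."""
    return copies * (LATENCY + n * n * BYTES_PER_ELEM / BANDWIDTH)


def cpu_time(n: int, count: int, reps: int = 1) -> float:
    """Do `count` SVDs, `reps` times over, entirely on the CPU."""
    return reps * count * svd_flops(n) / CPU_FLOPS


def gpu_time(n: int, count: int, reps: int = 1, resident: bool = False) -> float:
    """GPU compute time plus host<->device transfer time.

    resident=False: every repetition copies each matrix in and a result out.
    resident=True:  data is uploaded once, reused for all reps, downloaded once.
    """
    compute = reps * count * svd_flops(n) / GPU_FLOPS
    round_trips = 1 if resident else reps
    return compute + transfer_time(n, 2 * count * round_trips)


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(f"n={n:4d}  cpu={cpu_time(n, 1000):.4g}s  gpu={gpu_time(n, 1000):.4g}s")
    # Small matrices reused many times, with and without keeping data on the GPU.
    print("resident:", gpu_time(8, 1000, reps=100, resident=True),
          "copied each time:", gpu_time(8, 1000, reps=100),
          "cpu:", cpu_time(8, 1000, reps=100))
```

With these placeholder numbers, a batch of 1000 8x8 SVDs is slower on the GPU than on the CPU because the per-copy latency dominates, while 512x512 matrices come out well ahead on the GPU. Keeping the small matrices resident on the GPU across 100 repetitions flips the 8x8 case into a win, which is the "operating on the same data" situation described above.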
I’ve bugged these lists a few times about matrix inversion, or the solution of A*X=B, and have been met with silence… so even that is not yet implemented. But I suspect (hope?) it is in the works; I know people are working on it, if not at Nvidia then elsewhere.
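For reference, solving A*X=B in LAPACK (the xGESV driver) comes down to an LU factorization with partial pivoting followed by forward and back substitution for each column of B. The pure-Python sketch below shows those steps, which is roughly what a GPU port would have to implement; it is illustrative only, not LAPACK's actual blocked code, and the function names are made up for this example.

```python
"""Solve A X = B by LU factorization with partial pivoting (illustrative sketch)."""
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, List[int]]:
    """Factor P A = L U in place on a copy of `a`.

    Returns (lu, perm): unit-lower L (below the diagonal) and U packed into
    one matrix, and perm[i] = original row index now stored in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # don't modify the caller's matrix
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # update trailing submatrix
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve A x = b given the packed factors from lu_decompose."""
    n = len(lu)
    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def solve(a: Matrix, columns: List[List[float]]) -> List[List[float]]:
    """Solve A X = B where B is given as a list of its columns."""
    lu, perm = lu_decompose(a)  # factor once: the O(n^3) part
    return [lu_solve(lu, perm, b) for b in columns]  # O(n^2) per column


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    print(solve(A, [[5.0, -2.0, 9.0], [2.0, 4.0, -2.0]]))
```

Factoring A once and reusing the factors for every column of B mirrors how LAPACK splits the work between its factor (xGETRF) and solve (xGETRS) routines. Matrix inversion follows the same pattern, with B taken as the identity.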