Any Least Squares implementations?

Hey. Does anyone know if there are any good, fast, and stable least squares implementations available for CUDA? I’ve looked around, and I can’t find anything.

Least squares of what? IMHO, ordinary least squares regression, where you fit a linear combination of some basis functions, is essentially a linear algebra problem, so you can use CUBLAS…
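To make the linear algebra point concrete, here is a rough sketch in plain Python (CPU only, not CUDA, and not taken from any library). The function name `normal_equations` and the sample data are made up for illustration. It builds the design matrix A from a list of basis functions and forms the normal equations (A^T A) c = A^T y. The two products are of the GEMM/SYRK and GEMV kind, which is the part a BLAS library can take over.

```python
def normal_equations(basis, xs, ys):
    """Build A^T A and A^T y for the model y ~ sum_j c_j * basis[j](x)."""
    # Design matrix: one row per sample, one column per basis function.
    A = [[f(x) for f in basis] for x in xs]
    n = len(basis)
    # A^T A is a matrix-matrix product (GEMM/SYRK-style operation).
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)]
           for i in range(n)]
    # A^T y is a matrix-vector product (GEMV-style operation).
    Aty = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(n)]
    return AtA, Aty


# Hypothetical example: basis {1, x} and three sample points on y = 1 + 2x.
basis = [lambda x: 1.0, lambda x: x]
AtA, Aty = normal_equations(basis, [0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

What remains after this step is solving a small n-by-n symmetric system for the coefficients c, and that solve is the piece that needs a factorization.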

I need least squares fits of cubic and higher-order functions. I’ve taken a brief look at CUBLAS, but I didn’t see anything I could readily use. I didn’t even see any QR, Cholesky, or SVD factorizations… maybe I’m missing something? Anyway, I’m almost done implementing my own Cholesky factorization, and that’s most of the work already.
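For reference, the Cholesky route for a cubic fit could look roughly like this. It is a minimal CPU sketch in plain Python, not my CUDA code; the names `cholesky`, `cholesky_solve`, and `polyfit_cholesky` and the test data are hypothetical. It assumes the normal-equations matrix is symmetric positive definite, which holds when there are at least degree + 1 distinct x values.

```python
import math


def cholesky(M):
    """Return lower-triangular L with L * L^T == M (M must be SPD)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L * L^T) c = b by forward and back substitution."""
    n = len(b)
    # Forward substitution: L z = b
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T c = z
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (z[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c


def polyfit_cholesky(xs, ys, degree=3):
    """Fit c[0] + c[1]*x + ... + c[degree]*x^degree via the normal equations."""
    n = degree + 1
    # Normal equations for the monomial basis 1, x, ..., x^degree.
    AtA = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    Aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    return cholesky_solve(cholesky(AtA), Aty)


# Hypothetical data sampled exactly from y = 1 - 2x + 0.5x^3.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 - 2.0 * x + 0.5 * x ** 3 for x in xs]
coeffs = polyfit_cholesky(xs, ys)  # lowest-order coefficient first
```

One caveat on stability: forming A^T A squares the condition number of the design matrix, so for higher-order polynomials in the monomial basis this approach loses accuracy quickly. Rescaling x, switching to an orthogonal polynomial basis, or using QR instead are the usual ways around that.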