I have a giant matrix (A) on a GPU that I need to decompose orthogonally (A=U*V^T). Using an SVD came to mind, but perhaps you guys would steer me in a different direction.

A is 131262 by 1380 in memory, so if I use a full SVD, the matrix U will not fit on my GPU. In single precision, U alone would take up 131262^2*4 = 68918850576 bytes (about 69 GB).
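For reference, here is a quick sanity check of that arithmetic, assuming 4-byte single-precision floats. The variable names are just for illustration, and the "thin U" figure assumes U is cut down to its first n columns:

```python
# Back-of-the-envelope memory estimate, assuming 4-byte (float32) entries.
m, n = 131262, 1380
bytes_per_float = 4

full_u_bytes = m * m * bytes_per_float  # full m x m U from a full SVD
thin_u_bytes = m * n * bytes_per_float  # thin m x n U (first n columns only)

print(f"full U: {full_u_bytes} bytes ({full_u_bytes / 1e9:.1f} GB)")
print(f"thin U: {thin_u_bytes} bytes ({thin_u_bytes / 1e9:.2f} GB)")
```

So a thin U would be the same size as A itself, well under 1 GB, while the full U is roughly 95 times larger.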

I was hoping that cuSOLVER’s cusolverDnSgesvd could produce a truncated SVD given only enough memory to fit a truncated SVD, but it doesn’t seem to. The documentation for cusolverDnSgesvd suggests that for an m x n matrix A, U needs to be m x m in memory even if you use the jobu = ‘S’ option.
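To be concrete about what I mean by a truncated (thin) SVD, here is a small CPU-side NumPy sketch. It only illustrates the output shapes I am after; it is not cuSOLVER code, and the sizes and random data are made up for the example:

```python
import numpy as np

# Small stand-in for the tall, skinny matrix A (sizes are hypothetical).
m, n = 2000, 30
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n)).astype(np.float32)

# full_matrices=False asks NumPy for the thin factorization:
# U is m x n instead of m x m, s has n entries, Vt is n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)

# The thin factors still reconstruct A up to float32 rounding error.
A_rebuilt = (U * s) @ Vt
print(np.allclose(A, A_rebuilt, atol=1e-3))
```

The point is that the thin U has the same footprint as A, which is the kind of memory requirement I was hoping for on the GPU.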

Can anyone suggest an alternative approach that is stable and hopefully fast, given the size of my matrix A? I really appreciate all the help!
