I want to use CUDA from Python (e.g. CuPy) to perform linear algebra operations on large matrices for some physics simulations.
The problem is that these matrices get very large, so I need to distribute them over multiple GPU nodes to make them fit.
Is there a way to store a single matrix across multiple GPU nodes and perform a singular value decomposition (SVD) on that matrix?
The matrix is NOT sparse.
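As an illustration of what "distributing a matrix" can mean, here is a minimal sketch, not tied to any particular library. It splits a dense matrix into row blocks, one per (simulated) GPU, and computes an exact SVD with a one-level TSQR (tall-skinny QR) reduction. The helper names `split_rows` and `tsqr_svd` are hypothetical. The "devices" are just a Python list of NumPy arrays, and `numpy.linalg.qr`/`numpy.linalg.svd` have counterparts in `cupy.linalg`. Real multi-node code would also need inter-node communication (e.g. MPI), which is not shown.

```python
import numpy as np  # cupy.linalg also provides qr and svd


def split_rows(a, n_parts):
    """Split a matrix into row blocks, one per (simulated) GPU."""
    return np.array_split(a, n_parts, axis=0)


def tsqr_svd(blocks):
    """Exact SVD of the row-stacked matrix via a one-level TSQR.

    Returns the row blocks of U (one per device), the singular values s,
    and V^T, so that vstack(u_blocks) @ diag(s) @ vt equals the full matrix.
    """
    # Step 1: each device factors its own block locally, A_i = Q_i R_i
    qs, rs = zip(*(np.linalg.qr(b) for b in blocks))
    # Step 2: gather the small R factors on one device and factor them again
    r_stack = np.vstack(rs)
    q2, r = np.linalg.qr(r_stack)
    # Step 3: exact SVD of the final (small) R factor
    u_r, s, vt = np.linalg.svd(r, full_matrices=False)
    # Step 4: each device rebuilds its slice of U = diag(Q_i) @ Q2 @ U_r
    offsets = np.cumsum([0] + [r_i.shape[0] for r_i in rs])
    u_blocks = [q_i @ q2[offsets[i]:offsets[i + 1]] @ u_r
                for i, q_i in enumerate(qs)]
    return u_blocks, s, vt


# Tall-skinny example: 1000 x 50 matrix spread over 4 "GPUs"
rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 50))
u_blocks, s, vt = tsqr_svd(split_rows(a, 4))
```

This only saves work when the matrix is tall and skinny: each R factor is then n x n, much smaller than its block. For an n x n matrix split into p row blocks, each R factor is (n/p) x n, so the stacked R is again n x n and the reduction step has to handle a matrix as large as the original.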
This may be of interest.
Thanks for this article.
My matrices are square, so of the methods proposed in this article only the "approximate SVD" would work for me. An exact SVD would be preferable in my case.
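For reference, here is one well-known family of approximate SVDs that works for square matrices: randomized SVD in the style of Halko, Martinsson and Tropp. This is an assumption for illustration only, not necessarily the method the article describes. The sketch below runs on a single array in NumPy; the function name `randomized_svd` and its parameters (`rank`, `n_oversamples`, `n_iter`, `seed`) are hypothetical choices.

```python
import numpy as np  # cupy.linalg also provides qr and svd


def randomized_svd(a, rank, n_oversamples=10, n_iter=2, seed=0):
    """Approximate rank-`rank` SVD of `a` via random range finding."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + n_oversamples, m, n)
    # Multiply A by a random Gaussian matrix to sample its range
    omega = rng.standard_normal((n, k))
    y = a @ omega
    # Power iterations sharpen the subspace when singular values decay slowly
    for _ in range(n_iter):
        y, _ = np.linalg.qr(y)
        y = a @ (a.T @ y)
    # Orthonormal basis Q for the sampled range of A
    q, _ = np.linalg.qr(y)
    # Project A onto that k-dimensional subspace and take a small exact SVD
    b = q.T @ a
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_b
    return u[:, :rank], s[:rank], vt[:rank]


# Square 300 x 300 matrix of exact rank 20
rng = np.random.default_rng(1)
a = rng.standard_normal((300, 20)) @ rng.standard_normal((20, 300))
u, s, vt = randomized_svd(a, rank=20)
```

The expensive steps are the products `a @ omega` and `a.T @ y`, which are ordinary matrix multiplications and are comparatively easy to split across devices by row blocks. The exact SVD is only applied to the small k x n matrix `b`. The trade-off is that the result is exact only when A has (numerically) low rank; otherwise it approximates the leading singular triplets.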