Parallel large SVD

I have tens of matrices, each roughly 1000 x 1000, to decompose. cuSOLVER's batched SVD only supports matrices up to 32 x 32. I am thinking about launching separate SVD routines on different streams. Since I am using an A100, I think it should be possible to let each SM handle one matrix. Is that possible? What would I need to do to make it happen (perhaps set launch bounds)?

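There isn't anything you can do with cuSOLVER to limit a particular operation to one SM, or to directly restrict its GPU resource footprint in any other way. Launch bounds are a compile-time attribute on kernels you write yourself, so they have no effect on the kernels cuSOLVER launches.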

If your question is about cuSOLVER, please ask it on the libraries forum.

A single dense 1000 x 1000 decomposition may already keep a current GPU well utilized, without any need to parallelize further across matrices.
