What's the best way to solve a sparse linear system Ax = B where A is a non-square matrix?

I’m curious as written in the title.
Many methods I googled online focus on B being a vector, or A being a dense matrix. And the functions that sound like a good candidate in the API documents require A to be a square matrix… I want to solve this linear system when A is a huge rectangular matrix and B is a huge square matrix (like each dimension being at least bigger than 7000). Is it possible with the current CUDA support?

What about cusolverSpcsrlsqvqr() , or rather the batched variant cusolverSpScsrqrsvBatched ?

But isn’t that function for cases where B is a vector instead of a matrix?

You can compute the column vectors independently, can you not?

If A * x1 = b1 and A * x2 = b2 for vectors x1,x2,b1,b2 , then A * X = B for matrices with two columns X=(x1,x2), B=(b1,b2)

I mean I could… but the problem is that there would be 7000 operations I would have to fire at once… I was wondering if that could cause significant overhead. Would it not? I’m sorry if these questions are really beginner level I am new to CUDA

You could just start implementing it and see if the performance is good enough for you. I have already suggested a batched variant above.