I’m curious about the question in the title.

Many of the methods I found online focus on cases where B is a vector or A is a dense matrix, and the functions that look like good candidates in the API documentation require A to be a square matrix… I want to solve this linear system when A is a huge rectangular matrix and B is a huge square matrix (each dimension at least 7000 or so). Is this possible with current CUDA support?

But isn’t that function for cases where B is a vector instead of a matrix?

You can compute the column vectors independently, can you not?

If `A * x1 = b1` and `A * x2 = b2` for vectors `x1, x2, b1, b2`, then `A * X = B` for the matrices with two columns `X = (x1, x2)` and `B = (b1, b2)`.
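To make the column-independence argument concrete, here is a small NumPy sketch (a stand-in for the actual CUDA solve, with tiny matrix sizes in place of the ~7000 in the question). It checks that solving one least-squares problem per column of B gives the same X as a single solve with a matrix right-hand side:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 5, 3            # small stand-ins for the large sizes in the question
A = rng.standard_normal((m, n))   # rectangular A
B = rng.standard_normal((m, k))   # multiple right-hand sides

# Solve one least-squares problem per column of B...
X_cols = np.column_stack(
    [np.linalg.lstsq(A, B[:, j], rcond=None)[0] for j in range(k)]
)

# ...and all columns at once with a matrix right-hand side.
X_all, *_ = np.linalg.lstsq(A, B, rcond=None)

print(np.allclose(X_cols, X_all))  # the two approaches agree
```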

I mean, I could… but the problem is that there would be 7000 solves I would have to fire off at once, and I was wondering whether that could cause significant overhead. Would it not? I’m sorry if these questions are really beginner-level; I am new to CUDA.

You could just start implementing it and see if the performance is good enough for you. I have already suggested a batched variant above.
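One reason a single call with a matrix right-hand side tends to be much cheaper than 7000 independent launches: the expensive factorization of A only has to be done once, and the factors can then be applied to every column of B. A minimal NumPy sketch of that idea (QR here stands in for whatever factorization the GPU routine uses internally):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 8, 5, 3                 # small stand-ins for the ~7000-sized problem
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k))

# Factor A once (reduced QR: A = Q @ R with Q (m, n), R (n, n))...
Q, R = np.linalg.qr(A)

# ...then reuse the factors for all k right-hand sides in one triangular solve.
X = np.linalg.solve(R, Q.T @ B)

# Same answer as solving the least-squares problem directly.
X_ref, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.allclose(X, X_ref))
```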