I’m curious as written in the title.
Many methods I googled online focus on B being a vector, or A being a dense matrix. And the functions that sound like a good candidate in the API documents require A to be a square matrix… I want to solve this linear system when A is a huge rectangular matrix and B is a huge square matrix (like each dimension being at least bigger than 7000). Is it possible with the current CUDA support?
But isn’t that function for cases where B is a vector instead of a matrix?
You can compute the column vectors independently, can you not?
If A * x1 = b1
and A * x2 = b2
for vectors x1,x2,b1,b2
, then A * X = B
for matrices with two columns X=(x1,x2), B=(b1,b2)
I mean I could… but the problem is that there would be 7000 operations I would have to fire at once… I was wondering if that could cause significant overhead. Would it not? I’m sorry if these questions are really beginner level I am new to CUDA
You could just start implementing it and see if the performance is good enough for you. I have already suggested a batched variant above.