I need to solve a batch of systems of linear equations of the form Ax = b, where
A is a 16x162 single-precision real matrix.
The A matrix is ~11 kB, which fits in
shared memory. I need to solve about 512 to 1024 such systems simultaneously (i.e., as a batch).
As I understand it, the CUDA batched solver, which is implemented for double precision, can't
solve this problem. Because it works in double precision, the dimensions it supports also appear to be smaller (i.e., less than 162).
- Is there a way to use the CUDA batched solver for the case above?
- Are there any other CUDA sample solvers available that do not rely on libraries?
- What would be the best approach for the case above?
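
To make the sizes concrete, here is a rough sketch of the memory arithmetic behind the ~11 kB figure. It assumes 4-byte floats, and that b has 16 entries and x has 162 entries (which follows from A being 16x162). The constant and function names are only for illustration and do not come from any CUDA library.

```cpp
#include <cstddef>

// Dimensions from the question: A is 16x162, single precision.
constexpr std::size_t kRows = 16;
constexpr std::size_t kCols = 162;
constexpr std::size_t kBytesPerFloat = sizeof(float);  // assumed to be 4

// Storage for one system: A (16x162), b (16 entries), x (162 entries).
constexpr std::size_t kBytesA = kRows * kCols * kBytesPerFloat;
constexpr std::size_t kBytesB = kRows * kBytesPerFloat;
constexpr std::size_t kBytesX = kCols * kBytesPerFloat;
constexpr std::size_t kBytesPerSystem = kBytesA + kBytesB + kBytesX;

// Total storage for a batch of n systems.
constexpr std::size_t batchBytes(std::size_t n) {
    return n * kBytesPerSystem;
}
```

With these assumptions, A alone is 10368 bytes and one full system (A, b and x) is 11080 bytes, so a batch of 1024 systems needs roughly 11 MB of global memory in total.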