Hi,

I need to solve a batch of linear systems of the form Ax = b, where A is a 16x162 single-precision real matrix.

As you can see, the A matrix is only about 10 kB (16 x 162 x 4 bytes = 10,368 bytes), so it fits in shared memory. I need to solve about 512 to 1024 such systems simultaneously (as a batch).
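
To make the size estimate concrete, here is a minimal C++ sketch of the arithmetic. The helper names and the 48 kB per-block shared memory budget are assumptions for illustration only (the real limit depends on the GPU); they are not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Assumed per-block shared memory budget in bytes (hypothetical; check your device).
constexpr std::size_t kAssumedSharedBytes = 48 * 1024;

// Bytes needed to store a rows x cols single-precision matrix.
constexpr std::size_t matrix_bytes(std::size_t rows, std::size_t cols) {
    return rows * cols * sizeof(float);
}

// True if A plus a right-hand side b of `rows` entries fits in the assumed budget.
constexpr bool fits_in_shared(std::size_t rows, std::size_t cols) {
    return matrix_bytes(rows, cols) + rows * sizeof(float) <= kAssumedSharedBytes;
}
```

For the 16x162 case, matrix_bytes(16, 162) comes to 10,368 bytes with 4-byte floats, so one system per thread block would fit under this assumed budget.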

As I understand it, the CUDA batched solver, which is implemented for double precision, can’t solve this problem. The dimensions it supports are also smaller (i.e., less than 162) because it works in double precision.

Therefore,

- Is there a way to use the CUDA batched solver for the case above?
- Are there any other CUDA sample solvers available that do not use libraries?
- What would be the best approach for the case above?

Thank you.