CUDA batched solver ...


I need to solve batches of linear systems of the form Ax = b, where
A is a 16x162 single-precision real matrix.

The A matrix takes about 10 kB (16 × 162 × 4 bytes = 10,368 bytes), so it fits in
shared memory. I need to solve about 512 to 1024 of these systems simultaneously (i.e., as a batch).
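For reference, a single-precision float is 4 bytes, so one 16 × 162 matrix needs 16 × 162 × 4 = 10,368 bytes. A minimal sketch that computes this and compares it against the per-block shared-memory limit the runtime reports (`cudaGetDeviceProperties`, field `sharedMemPerBlock`); the helper names `matrixBytes` and `matricesPerBlock` are hypothetical:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: bytes needed for one rows x cols single-precision matrix.
constexpr std::size_t matrixBytes(int rows, int cols)
{
    return static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols) * sizeof(float);
}

// 16 x 162 floats at 4 bytes each.
static_assert(matrixBytes(16, 162) == 10368, "16 x 162 floats should be 10,368 bytes");

// Hypothetical helper: how many rows x cols matrices fit in one block's shared memory
// on `device`, according to cudaDeviceProp::sharedMemPerBlock (0 if the query fails).
std::size_t matricesPerBlock(int rows, int cols, int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlock / matrixBytes(rows, cols);
}
```

On a GPU whose per-block limit is the common 48 KB (49,152 bytes), `matricesPerBlock(16, 162, 0)` would return 4, so one system per block leaves room for b, x and scratch space.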

As I understand it, the CUDA batched solver, which is implemented for double precision, can't
solve this problem. Because it works in double precision, it also supports only smaller dimensions (i.e., less than 162).


  1. Is there a way to use the CUDA batched solver for the case above?
  2. Are there any other CUDA sample solvers available that don't rely on libraries?
  3. What would be the best approach for the case above?
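A minimal, library-free sketch of one possible approach to questions 2 and 3: one thread block per system, with the augmented matrix [A | b] held in shared memory and solved by Gauss-Jordan elimination with partial pivoting. It assumes square N × N systems stored row-major and a compile-time N (elimination needs a square A; the 16x162 A above is not square, so it would first need a different formulation, e.g. least squares). The names `batchedGaussJordan`, `solveBatch` and the `info` convention are hypothetical. With a static `__shared__` array, N is capped at roughly 100 by the 48 KB static shared-memory limit per block.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical sketch: one block solves one square N x N system (row-major A).
// info[sys] = 0 on success, or k + 1 if an exactly-zero pivot was hit at step k.
template <int N>
__global__ void batchedGaussJordan(const float* A, const float* b, float* x, int* info)
{
    __shared__ float M[N][N + 1];  // augmented matrix [A | b] for this system
    __shared__ float factor[N];    // column k, saved so rows can be updated race-free
    __shared__ int pivotRow;
    __shared__ int singular;

    const int sys = blockIdx.x;
    const float* Ai = A + static_cast<std::size_t>(sys) * N * N;
    const float* bi = b + static_cast<std::size_t>(sys) * N;

    // Load [A | b] from global into shared memory (strided over the block's threads).
    for (int i = threadIdx.x; i < N * (N + 1); i += blockDim.x) {
        const int r = i / (N + 1), c = i % (N + 1);
        M[r][c] = (c < N) ? Ai[r * N + c] : bi[r];
    }
    __syncthreads();

    for (int k = 0; k < N; ++k) {
        // Partial pivoting: thread 0 picks the row r >= k with the largest |M[r][k]|.
        if (threadIdx.x == 0) {
            int best = k;
            for (int r = k + 1; r < N; ++r)
                if (fabsf(M[r][k]) > fabsf(M[best][k])) best = r;
            pivotRow = best;
            singular = (M[best][k] == 0.0f);
        }
        __syncthreads();
        if (singular) {  // same value for every thread, so the whole block exits
            if (threadIdx.x == 0) info[sys] = k + 1;
            return;
        }
        const int p = pivotRow;
        if (p != k) {
            for (int c = threadIdx.x; c <= N; c += blockDim.x) {
                const float t = M[k][c];
                M[k][c] = M[p][c];
                M[p][c] = t;
            }
        }
        __syncthreads();

        // Scale the pivot row so M[k][k] becomes 1 (read the pivot before anyone writes it).
        const float piv = M[k][k];
        __syncthreads();
        for (int c = threadIdx.x; c <= N; c += blockDim.x) M[k][c] /= piv;
        __syncthreads();

        // Eliminate column k from every other row: row_r -= M[r][k] * row_k.
        for (int r = threadIdx.x; r < N; r += blockDim.x) factor[r] = M[r][k];
        __syncthreads();
        for (int i = threadIdx.x; i < N * (N + 1); i += blockDim.x) {
            const int r = i / (N + 1), c = i % (N + 1);
            if (r != k) M[r][c] -= factor[r] * M[k][c];
        }
        __syncthreads();
    }

    // M is now [I | x]; write the solution back to global memory.
    for (int r = threadIdx.x; r < N; r += blockDim.x)
        x[static_cast<std::size_t>(sys) * N + r] = M[r][N];
    if (threadIdx.x == 0) info[sys] = 0;
}

// Hypothetical host wrapper: copies a batch to the GPU, launches one block per system.
template <int N>
cudaError_t solveBatch(const float* hA, const float* hb, float* hx, int* hInfo, int batch)
{
    float *dA = nullptr, *db = nullptr, *dx = nullptr;
    int* dInfo = nullptr;
    const std::size_t aBytes = sizeof(float) * N * N * batch;
    const std::size_t vBytes = sizeof(float) * N * batch;
    cudaError_t err = cudaMalloc(&dA, aBytes);
    if (err == cudaSuccess) err = cudaMalloc(&db, vBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dx, vBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dInfo, sizeof(int) * batch);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, aBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(db, hb, vBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        batchedGaussJordan<N><<<batch, 32>>>(dA, db, dx, dInfo);  // 32 threads: arbitrary choice
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hx, dx, vBytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(hInfo, dInfo, sizeof(int) * batch, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(db); cudaFree(dx); cudaFree(dInfo);
    return err;
}
```

With 512 to 1024 systems this launches 512 to 1024 independent blocks, which is usually enough to keep every SM busy; the block size and the exact-zero singularity test are simplifications, not tuned choices.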

Thank you.

You can try CULA (a paid third-party library) or Magma (free).
They can solve all the usual types: single, double, complex float, and complex double.

Those libraries, however, don't support streams or batching, so you'll have
to solve the systems one by one. A 16x162 system might roughly fill the GPU, so batching or streaming
might give only a small advantage (though it could still be useful, e.g. for hiding memory copies).
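A minimal sketch of the "hiding memory copies" idea: split the batch into chunks and queue each chunk's host-to-device copies, kernel and device-to-host copy on its own stream, so copies for one chunk can overlap work on another. The kernel `solveDiagonalChunk` is a hypothetical stand-in that only handles diagonal systems; a real per-system solver would go in its place. `solveInStreams` is also a hypothetical name, and overlap only happens if the host buffers are pinned (`cudaMallocHost`).

```cuda
#include <algorithm>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in solver: assumes every N x N system is diagonal, so
// x[i] = b[i] / A[i][i]. Replace it with a real per-system solver kernel.
template <int N>
__global__ void solveDiagonalChunk(const float* A, const float* b, float* x)
{
    const int sys = blockIdx.x;  // one block per system within this chunk
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        x[sys * N + i] = b[sys * N + i] / A[(sys * N + i) * N + i];
}

// Splits the batch into numStreams chunks, each with its own copies and kernel.
// hA, hb and hx should be pinned (cudaMallocHost), otherwise copies won't overlap.
template <int N>
cudaError_t solveInStreams(const float* hA, const float* hb, float* hx,
                           int batch, int numStreams)
{
    float *dA = nullptr, *db = nullptr, *dx = nullptr;
    cudaError_t err = cudaMalloc(&dA, sizeof(float) * N * N * batch);
    if (err == cudaSuccess) err = cudaMalloc(&db, sizeof(float) * N * batch);
    if (err == cudaSuccess) err = cudaMalloc(&dx, sizeof(float) * N * batch);

    std::vector<cudaStream_t> streams(numStreams, nullptr);
    for (auto& s : streams)
        if (err == cudaSuccess) err = cudaStreamCreate(&s);

    const int chunk = (batch + numStreams - 1) / numStreams;
    for (int s = 0; s < numStreams && err == cudaSuccess; ++s) {
        const int first = s * chunk;
        const int count = std::min(chunk, batch - first);
        if (count <= 0) break;
        const std::size_t aOff = static_cast<std::size_t>(first) * N * N;
        const std::size_t vOff = static_cast<std::size_t>(first) * N;
        cudaMemcpyAsync(dA + aOff, hA + aOff, sizeof(float) * N * N * count,
                        cudaMemcpyHostToDevice, streams[s]);
        cudaMemcpyAsync(db + vOff, hb + vOff, sizeof(float) * N * count,
                        cudaMemcpyHostToDevice, streams[s]);
        solveDiagonalChunk<N><<<count, 32, 0, streams[s]>>>(dA + aOff, db + vOff, dx + vOff);
        cudaMemcpyAsync(hx + vOff, dx + vOff, sizeof(float) * N * count,
                        cudaMemcpyDeviceToHost, streams[s]);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    for (auto& s : streams)
        if (s) cudaStreamDestroy(s);
    cudaFree(dA); cudaFree(db); cudaFree(dx);
    return err;
}
```

Whether this pays off depends on how long the copies take compared with the solve itself; for small, compute-light systems the gain may well be modest.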