Batched operations are designed for the efficient handling of many small matrices. A single small matrix uses only a fraction of the GPU's computational capability; by working on many matrices at once, all of that capability can be put to use. Batched operations work best when there are thousands of small matrices, but even with around 500 matrices they still perform much better than processing the matrices individually. As far as I am aware, existing batched codes require all matrices in a batch to have the same size (this is not a restriction for typical real-world scenarios).
For the batched operations supported by CUBLAS, please consult the CUBLAS documentation (which comes with your CUDA installation). For the source code of the batched dense solver and matrix inversion, please log into the registered developer website and download it from there. To log in (or register as a developer), please go to
Look for “CUDA Batch Solver” among the available downloads.