The batched solver code downloadable from the CUDA registered developer website was a precursor to the batch support added to CUBLAS. The many downloads of that original code motivated the productization of batched interfaces. Unless you specifically need the source code, e.g. to add custom functionality, there is no reason to use it anymore; use the CUBLAS batched interfaces instead.
You seem to be using single-precision data. Note that there is no single-precision solver in the downloadable package, i.e., no function ssolve_batch(). You could write your own ssolve_batch() following the pattern of the existing double-precision solver; it should be able to handle matrices up to about dimension 108x108 on a K40.
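To illustrate the algorithmic pattern such a custom ssolve_batch() would follow, here is a minimal CPU-side sketch in single precision: in-place Gaussian elimination with partial pivoting, applied independently to each small system in the batch. The function names and interface are hypothetical (the downloadable package's actual API may differ); on the GPU, each matrix would typically be assigned to its own thread block. Note also that CUBLAS itself provides single-precision batched factorization/solve via cublasSgetrfBatched() and cublasSgetrsBatched().

```c
#include <math.h>

/* Solve one n x n system A*x = b in place by Gaussian elimination with
   partial pivoting. A is row-major, n*n; the solution overwrites b.
   Returns 0 on success, -1 if a zero pivot is encountered (singular). */
static int solve_one(int n, float *A, float *b)
{
    for (int k = 0; k < n; k++) {
        /* partial pivoting: find the row with the largest pivot in column k */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (fabsf(A[i * n + k]) > fabsf(A[p * n + k])) p = i;
        if (A[p * n + k] == 0.0f) return -1;
        if (p != k) {                        /* swap rows k and p */
            for (int j = 0; j < n; j++) {
                float t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
            float t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; i++) {    /* eliminate below the pivot */
            float m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; j++) A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
    for (int i = n - 1; i >= 0; i--) {       /* back substitution */
        float s = b[i];
        for (int j = i + 1; j < n; j++) s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
    return 0;
}

/* Batched driver (hypothetical name): A holds batch matrices of n*n floats
   back to back, b holds the corresponding right-hand sides. */
int ssolve_batch_cpu(int n, int batch, float *A, float *b)
{
    for (int m = 0; m < batch; m++)
        if (solve_one(n, A + m * n * n, b + m * n) != 0) return -1;
    return 0;
}
```

The batch loop is embarrassingly parallel, which is exactly what the GPU version exploits: each system is tiny, but thousands of them together expose enough work to keep the machine busy.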
I don’t recall what exact performance one should expect from a K40, but 30 GFLOPS does not seem totally out of line. By their nature, solvers contain both sequential and parallel operations; they are not fully parallel like a matrix multiply. The general issue when processing small matrices is that they do not expose enough parallelism to keep the GPU busy, which requires on the order of 10,000 threads for good performance.
One can introduce batch processing to work around this, but making it fast then requires many registers or a lot of shared memory. Around a matrix size of 10x10 one runs out of registers; around 100x100 one runs out of shared memory (e.g. 76x76 when the matrix elements are ‘double’).
There is a gap between that size and the minimum size that can be handled efficiently by non-batched operations (around 256x256, I think, but my memory is hazy). The gap can be filled by various “hybrid” methods as best one can, but the performance will not necessarily be great.
If this is an important use case (i.e. one with significant impact on overall application run time), you may want to file an RFE (request for enhancement) with NVIDIA to further optimize the batched APIs for this range of matrix sizes. RFEs can be filed via the bug reporting form; simply prefix the synopsis with “RFE:” to mark it as such.