I am looking for the most efficient way to compute the matrix-vector product for a large number of small matrix-vector pairs. I am familiar with the BLAS routines for matrix-vector multiplication. However, because I have many small matrices to multiply by small vectors, I am trying to avoid launching a kernel for each individual product. Is there a way to use BLAS to compute many matrix-vector products with one call? Alternatively, is there a reference out there describing the best way to organize this problem?
Thanks
Matt Bakalar
You can look at OPLib:
http://www.level3finance.com/oplib.html
The author, Claudio Albanese, proposes a "BLAS4" operation, which is useful when the matrices and vectors are small.
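To make the "one call for many small products" idea concrete, here is a minimal host-side sketch in C++. It is not OPLib's BLAS4 interface and not a BLAS routine; the function name `batched_matvec` and the packed row-major layout are assumptions made only for illustration. The point is the data organization: all matrices and vectors are stored back to back, and every (matrix, row) pair is an independent unit of work, which is the kind of unit a single GPU kernel could assign to one thread instead of launching one kernel per product.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of BLAS or OPLib): computes y[b] = A[b] * x[b]
// for `batch` independent small products in a single call.
// Layout assumption: matrices are row-major and packed back to back, so
// A[b] starts at offset b * rows * cols, x[b] at b * cols, y[b] at b * rows.
std::vector<float> batched_matvec(const std::vector<float>& A,
                                  const std::vector<float>& x,
                                  std::size_t batch,
                                  std::size_t rows,
                                  std::size_t cols) {
    assert(A.size() == batch * rows * cols);
    assert(x.size() == batch * cols);

    std::vector<float> y(batch * rows, 0.0f);

    // Flattened loop over every (b, r) pair. The pairs do not depend on each
    // other, so in a GPU port `idx` could be the global thread index.
    for (std::size_t idx = 0; idx < batch * rows; ++idx) {
        const std::size_t b = idx / rows;  // which matrix-vector pair
        const std::size_t r = idx % rows;  // which row of that matrix

        const float* a_row = A.data() + (b * rows + r) * cols;
        const float* x_b = x.data() + b * cols;

        // Dot product of one matrix row with its vector.
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            acc += a_row[c] * x_b[c];
        }
        y[idx] = acc;
    }
    return y;
}
```

For example, packing two 2x2 matrices as `{1,2,3,4, 0,1,1,0}` and two vectors as `{1,1, 5,6}` and calling `batched_matvec(A, x, 2, 2, 2)` returns both results in one output array. Whether this packed layout is the best choice on a GPU depends on the matrix sizes and memory access pattern, so treat it as a starting point rather than a tuned implementation.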