dense matrix times vector

Hi all,

I am working on an MCMC simulation that requires a very large number (millions) of dense matrix times dense vector multiplications. The matrix and vector are also quite large (the matrix is 25,000 x 6,000 and the vector has 6,000 elements).

I am now wondering which algorithm or library offers the fastest implementation of this multiplication (I am hoping not to have to implement it myself). I saw a paper that suggests cuBLAS may not be the fastest option. Is that outdated?
http://ch.nvidia.com/docs/IO/47905/fujimoto_lspp2008.pdf

Any hints are appreciated.

Thanks, Jan

Hi Jan,

I don’t have any recent definitive data, but I would think cuBLAS is still the best option for very large matrices and vectors. Of course, the exact performance will depend on the GPU you’re using.

If you need to program your own computations, I would suggest looking at OpenACC as a way to port these routines over to a GPU.
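To give a rough idea of what that could look like, here is a minimal sketch of a dense matrix-vector product in C with OpenACC directives. The function name dense_matvec, the row-major layout, and single precision are all assumptions made for illustration; none of them come from your code. A compiler without OpenACC support ignores the pragmas, so the same source also runs serially on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = A * x for a dense matrix A with `rows` x `cols` entries.
 * Assumptions for this sketch: A is stored row-major in one contiguous
 * array, and the name dense_matvec is hypothetical (not a library call). */
void dense_matvec(size_t rows, size_t cols,
                  const float *restrict a,
                  const float *restrict x,
                  float *restrict y)
{
    assert(a != NULL && x != NULL && y != NULL);

    /* With an OpenACC compiler, the outer loop is offloaded and the
     * arrays are copied to and from the device for this call.
     * Without one, the pragmas are ignored and the loops run serially. */
    #pragma acc parallel loop copyin(a[0:rows*cols], x[0:cols]) copyout(y[0:rows])
    for (size_t i = 0; i < rows; ++i) {
        float sum = 0.0f;

        /* Each row is a dot product, expressed as a sum reduction. */
        #pragma acc loop reduction(+:sum)
        for (size_t j = 0; j < cols; ++j) {
            sum += a[i * cols + j] * x[j];
        }
        y[i] = sum;
    }
}
```

As written, the sketch copies the matrix to the device on every call, which would likely dominate the runtime for a 25,000 x 6,000 matrix. If the matrix stays fixed across your MCMC iterations (an assumption on my part), wrapping the iteration loop in an OpenACC data region so the matrix stays resident on the GPU would be the first thing to try.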

  • Mat