Is it possible to speed up multiple matrix-vector multiplications using CUDA?

Hi, I don’t have much experience with CUDA programming, but I’m familiar with the CUDA Programming Guide.

I’ve run into a linear algebra problem that I find interesting, which I suspect is not easy to speed up with CUDA, but I’m not sure.

The problem is:

How can a sequence of matrix-vector multiplications be sped up when the matrix is always the same, but the vectors are different in each computation and depend on the result of the previous multiplication?

So:

M - matrix

v1, v2, …, vn - vectors (only v1 is known at the beginning; to use vn, you first have to compute vn-1)

The call pattern is:

CPU: Transfer M (matrix) to GPU

CPU: Compute v1

CPU: Call M * v1 multiplication

   GPU: Multiply M * v1, producing r1

CPU: Based on r1 and some algorithm (not important here), compute v2

CPU: Call M * v2 multiplication

   GPU: Multiply M * v2, producing r2

CPU: Based on r2 and some algorithm (not important here), compute v3

CPU: Call M * v3 multiplication

   GPU: Multiply M * v3, producing r3

… (and so on)

The matrix size is 256 x 256 and the vectors are 256 x 1.
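
For reference, here is a minimal sketch of that pattern using cuBLAS (the handle-based v2 API). `compute_next_vector` and `STEPS` are placeholders I made up for the unspecified CPU-side algorithm and iteration count:

```c
#include <cublas_v2.h>
#include <cuda_runtime.h>

#define N     256   /* matrix is N x N, vectors are N x 1 */
#define STEPS 100   /* hypothetical number of iterations  */

/* Placeholder for the CPU-side algorithm that derives v_{k+1} from r_k. */
static void compute_next_vector(const float *r, float *v)
{
    for (int i = 0; i < N; ++i)
        v[i] = r[i];   /* stand-in: the real update rule goes here */
}

int main(void)
{
    static float h_M[N * N], h_v[N], h_r[N];
    float *d_M, *d_v, *d_r;
    const float one = 1.0f, zero = 0.0f;

    for (int i = 0; i < N * N; ++i) h_M[i] = 1.0f / N;  /* dummy matrix */
    for (int i = 0; i < N; ++i)     h_v[i] = 1.0f;      /* initial v1   */

    cublasHandle_t handle;
    cublasCreate(&handle);

    cudaMalloc((void **)&d_M, N * N * sizeof(float));
    cudaMalloc((void **)&d_v, N * sizeof(float));
    cudaMalloc((void **)&d_r, N * sizeof(float));

    /* Transfer M once; it never changes. */
    cudaMemcpy(d_M, h_M, N * N * sizeof(float), cudaMemcpyHostToDevice);

    for (int k = 0; k < STEPS; ++k) {
        cudaMemcpy(d_v, h_v, N * sizeof(float), cudaMemcpyHostToDevice);

        /* r = 1.0 * M * v + 0.0 * r  (M stored column-major) */
        cublasSgemv(handle, CUBLAS_OP_N, N, N,
                    &one, d_M, N, d_v, 1, &zero, d_r, 1);

        cudaMemcpy(h_r, d_r, N * sizeof(float), cudaMemcpyDeviceToHost);
        compute_next_vector(h_r, h_v);   /* CPU: derive v_{k+1} from r_k */
    }

    cudaFree(d_M);
    cudaFree(d_v);
    cudaFree(d_r);
    cublasDestroy(handle);
    return 0;
}
```

After the one-time matrix upload, only 2 x 256 floats cross the PCIe bus per iteration, so the per-call overhead (launch latency plus two small copies) is what you’d be trying to amortize.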

If anybody is curious what this is for: I want to speed up neural network computation (whose main part can be viewed as a matrix-vector multiplication). I have to call the neural network a few times, but each input depends on the previous network output.

A 256 x 256 matrix-vector product is rather small for a GT200 or GF100 based GPU: in single precision the matrix is only 256 KB, and each product costs roughly 2 * 256 * 256 ≈ 131K flops, so kernel-launch and transfer overheads dominate the arithmetic. It might be a little faster than the host CPU, but not spectacularly so.

The description of the solution sequence sounds very similar to the stages of a diagonally implicit Runge-Kutta method. If it is, it might be possible to re-formulate it (depending on how many stages there are) as a single large block sparse linear system and solve it iteratively using a Newton method. That sort of problem might be much better suited to solving on the GPU.
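
Roughly, if the update from rk to vk+1 can be written as a differentiable map f (an assumption on my part; the question leaves the algorithm unspecified), the whole sequence can be stacked into one nonlinear system:

$$
F(v_2,\dots,v_n) \;=\; \begin{pmatrix} v_2 - f(M v_1) \\ v_3 - f(M v_2) \\ \vdots \\ v_n - f(M v_{n-1}) \end{pmatrix} \;=\; 0,
$$

whose Jacobian is block lower bidiagonal with 256 x 256 blocks. Each Newton iteration then solves one large block sparse linear system, which exposes far more parallel work per GPU call than a single 256-element matrix-vector product.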