NVIDIA Developer Forums

cublas: Using dgemv

sbutler September 15, 2010, 3:07pm 1

I am new to CUDA and to cublas. I have a question:

I simply want to multiply a general double-precision matrix by a vector. (My GPU is compute capability 1.3, so it can do double precision.)

I noticed there is no function simply for a matrix-vector multiply. The nearest match is dgemv, which is: r = alpha * A * x + beta * y.

Obviously, I can simply set alpha = 1.0 and beta = 0.0 to get the same behavior. But this leads me to a twofold question:

(1) Is the library smart enough to know that since alpha = 1 and beta = 0 it should not perform all of those extra multiplies and adds? (It seems like using C++ templates could help out here…)

(2) Do I need to allocate space on the GPU for a “dummy” variable y, or can I simply pass NULL for y without causing major issues?

(Or perhaps there is a different function I should be using instead?)

avidday September 15, 2010, 3:21pm 2

I think it is, but for reasonably sized problems it probably doesn’t make a great deal of difference. The operation count of gemv is notionally 2MN, and the additional constants only add another 2M. You would expect 2MN >> 2M for anything other than trivially small cases, so the overall effect on computation time is probably not all that large.

Passing NULL probably won’t work, but passing the vector twice with beta = 0 should be safe.
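
For comparison, the Netlib reference DGEMV special-cases beta = 0 by overwriting y instead of multiplying it, so y is never read in that case. Whether cuBLAS takes the same shortcut internally is not established in this thread. A minimal CPU sketch of that behavior in C, assuming column-major storage as in BLAS; `ref_dgemv` is a hypothetical helper name, not a cuBLAS function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for y = alpha*A*x + beta*y (no transpose).
   A is m x n, stored column-major with leading dimension lda >= m. */
static void ref_dgemv(size_t m, size_t n, double alpha, const double *A,
                      size_t lda, const double *x, double beta, double *y)
{
    assert(lda >= m);

    /* Scale step. When beta == 0, y is overwritten and never read,
       so whatever it held before (even NaN) cannot reach the result. */
    if (beta == 0.0) {
        for (size_t i = 0; i < m; ++i)
            y[i] = 0.0;
    } else if (beta != 1.0) {
        for (size_t i = 0; i < m; ++i)
            y[i] *= beta;
    }

    /* Nothing left to add when alpha == 0. */
    if (alpha == 0.0)
        return;

    /* Accumulate one column of A at a time: roughly 2*m*n flops. */
    for (size_t j = 0; j < n; ++j) {
        double t = alpha * x[j];
        for (size_t i = 0; i < m; ++i)
            y[i] += t * A[i + j * lda];
    }
}
```

Calling this with alpha = 1 and beta = 0 on a 3 x 2 matrix with columns (1, 2, 3) and (4, 5, 6) and x = (1, 1) gives y = (5, 7, 9), regardless of what y held beforehand. Note that y must still point at valid storage for the result, so this does not make a NULL pointer acceptable.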

sbutler September 15, 2010, 3:40pm 4

It turns out I misread the documentation the first time, oops. I hadn’t noticed that the operation is actually y = alpha * A * x + beta * y (not r = alpha * A * x + beta * y).

In my particular application, M >> N, so the O(MN) operation is approximately O(M), with M large (several thousand) but N small (less than 10, generally 2 or 3). Adding more O(M) work would significantly impact runtime, so if someone could answer #1 definitively I would greatly appreciate it.
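
To put a number on that concern, the notional counts quoted earlier in the thread (2MN for the product, 2M for the alpha and beta scaling) give a relative overhead of 2M / 2MN = 1/N. A small sketch in C under that flop-count model only; the function names are hypothetical, and since gemv on a GPU is typically limited by memory bandwidth rather than arithmetic, the real cost may differ:

```c
/* Notional flop counts taken from the discussion above (an assumption,
   not a measurement): the A*x core costs 2*M*N, scaling adds 2*M. */
static double gemv_core_flops(double m, double n)
{
    return 2.0 * m * n;
}

static double gemv_scale_flops(double m)
{
    return 2.0 * m;
}

/* Extra scaling work relative to the core product; simplifies to 1/N. */
static double scale_overhead(double m, double n)
{
    return gemv_scale_flops(m) / gemv_core_flops(m, n);
}
```

With M = 4000, `scale_overhead` returns 0.5 for N = 2 and about 0.33 for N = 3, so under this model the extra multiplies and adds would not be negligible for such tall, narrow matrices.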
