I have a 10x10 matrix A and a 10x1 matrix D, and I want to compute AD. I compute AD in MATLAB and in an existing CPU version of the code in the software I'm writing, and they give identical results. I'm not writing any kernels yet, so I was trying to just use CUBLAS for everything, and I have everything working except this function.
I have this call in my source:
if(!GPUPLA.gemv(D, A, TEMP, 1)) { algo_->error("GEMV error."); return 0; }
This is just to abstract the actual CUDA calls into another file. The function called is here:
bool
GPULinearAlgebra::gemv(GPUVector& b, GPUMatrix& a, GPUVector& r, double n)
{
    cublasDgemv('N', a.rows_, a.columns_, n, a.data_, a.rows_, b.data_, 1, 1, r.data_, 1);
    if(!check_error(cublasGetError())) { printf("GEMV Failed\n"); return false; }
    return true;
}
The TEMP vector is 10x1 and already allocated and filled with zeros. From my reading of the CUBLAS library PDF, I thought this should work.
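As I read the PDF, Dgemv computes y = alpha*op(A)*x + beta*y, so my call should give r = n*A*b + 1*r, and since TEMP starts out as all zeros the beta = 1 term shouldn't matter.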
As it stands, I'm getting the wrong answers. I've verified that D and A have the correct values before the call by pulling them off the GPU and printing them.
On the basis of what you posted it is hard to say. Presuming you have the memory management and copying side of things correct (you haven't shown any of that code, so it is impossible to say one way or the other), the obvious place people often go wrong with CUBLAS is passing row-major ordered arrays. CUBLAS is a FORTRAN-ordered BLAS, not a C-ordered one, so input matrices need to be stored in column-major order.
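To make the difference concrete: a 2x3 matrix M = [1 2 3; 4 5 6] is stored as {1, 2, 3, 4, 5, 6} in row major but as {1, 4, 2, 5, 3, 6} in column major. A host-side repacking loop like the following, run before you upload, is all the conversion amounts to (pack_col_major is just an illustrative name, not part of CUBLAS):

/* Repack a row-major matrix into a column-major buffer. */
void pack_col_major(const double *row_maj, double *col_maj, int rows, int cols)
{
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            col_maj[j * rows + i] = row_maj[i * cols + j];  /* element (i,j) */
}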
I had read about that and thought I was OK, but now that I think about it, you are probably right. I assumed I was safe since my dot products were working fine. Now that I look at what I print out, it is in row-major order. I probably didn't notice before because the other data sets I was using had symmetric A matrices.
This is unfortunate, because these matrices are imported in a different area of the program that I am not touching, so I would lose some speed by having to convert the matrices.
If I use the transpose option in gemv to make it operate on A transpose, that would essentially be the same as going from row major to column major for this simple example, correct? Unless transposing before I store the matrices would be faster; I don't know what the performance cost of gemv's transpose path is.
Unfortunately, not necessarily. Depending on how you are allocating the GPU memory, padding/alignment rows can be added to the storage for memory-access performance reasons. So in theory you are correct: you could just use the transpose and it should work. But be very sure of the memory layout of your matrices on the GPU before you do so.
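If the device buffer really is the unpadded row-major data, then viewed column-major it is A transpose (a columns_ x rows_ matrix), and asking gemv for the transpose of that gives back A. So inside your wrapper, something along these lines should give r = n*A*b; treat it as a sketch against your field names, not tested code:

cublasDgemv('T',        /* transpose the stored (column-major) matrix      */
            a.columns_, /* m: rows of the stored matrix = columns of A     */
            a.rows_,    /* n: cols of the stored matrix = rows of A        */
            n,          /* alpha                                           */
            a.data_,
            a.columns_, /* lda: must match the real allocated row count,
                           i.e. no padding rows                            */
            b.data_, 1,
            0.0,        /* beta = 0 so stale values in r don't leak in     */
            r.data_, 1);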
I'm just using the basic CUBLAS add-vector and add-matrix functions. I'll try to verify, though. I'm pretty positive the values are in row-major order in host memory before I send them to the device, and when I pull them out. I'm guessing that on the device they aren't actually laid out the way I thought they were.
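For the verification, something along these lines should settle it (check_layout and dev_a are placeholder names, not part of my wrappers):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Pull the raw device buffer back and look at one off-diagonal element.
 * For a 10x10 A, buffer[1] is A(0,1) if the storage is row major, but
 * A(1,0) if it is column major. Compare against the MATLAB values. */
void check_layout(const double *dev_a, int rows, int cols)
{
    double *h = (double *)malloc(rows * cols * sizeof(double));
    cudaMemcpy(h, dev_a, rows * cols * sizeof(double), cudaMemcpyDeviceToHost);
    printf("buffer[1] = %g (row major -> A(0,1), column major -> A(1,0))\n", h[1]);
    free(h);
}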