Many matrix-vector multiplications at one time

afai97202013 · October 20, 2013, 6:10am

I want to do many matrix-vector multiplications.

I have 32 matrices S (512*512) and 32 vectors A (length 512), and I want to do all 32 multiplications at the same time.

My first attempt uses a for loop:

for(k=0;k<32;k++){
     cublasZgemv(handle,CUBLAS_OP_N,512,512,&alpha,d_S+k*512*512,NRM,d_A+512*k,1,&belta,d_B+512*k,1);
}

This takes 8.99 ms. Calling cublasZgemv 32 times one after another is a poor approach.

Another way is to write my own kernel:

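__global__ void mv(cuDoubleComplex *S,cuDoubleComplex *A ,cuDoubleComplex *B, int n, int l)
{
     int i = blockIdx.x*blockDim.x + threadIdx.x;   // row index within one matrix
     int j = blockIdx.y*blockDim.y + threadIdx.y;   // index of the matrix/vector pair
     // The accumulator has to start at zero before the summation
     // (the parameter l is not used in this kernel)
     cuDoubleComplex Z = make_cuDoubleComplex(0.0, 0.0);
     cuDoubleComplex Temp;
     for(int k = 0;k<n;k++){
          Temp = cuCmul(S[j*n*n+i*n+k],A[j*n+k]);   // S(i,k) * A(k) for pair j
          Z = cuCadd(Z,Temp);
     }
     B[j*n+i] = Z;
}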

This takes 3.8 ms. The summation is very slow, but that's the best I could do.
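
One likely reason the summation is slow: each thread sums its whole row serially, and neighbouring threads read elements of S that are n entries apart, so the loads are not coalesced. A minimal sketch of an alternative, assuming the same layout S[j*n*n+i*n+k], gives each row its own thread block and combines per-thread partial sums in shared memory. The names mv_rowblock and THREADS are hypothetical, the all-ones test data is made up for illustration, and no timing is claimed for this version.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>

#define THREADS 128  // hypothetical block size; must be a power of two

// One block per (row i, matrix j). Threads stride over k, so neighbouring
// threads read neighbouring elements of the row.
__global__ void mv_rowblock(const cuDoubleComplex *S, const cuDoubleComplex *A,
                            cuDoubleComplex *B, int n)
{
    __shared__ cuDoubleComplex partial[THREADS];
    int i = blockIdx.x;  // row within the matrix
    int j = blockIdx.y;  // which matrix/vector pair
    const cuDoubleComplex *row = S + (size_t)j * n * n + (size_t)i * n;
    const cuDoubleComplex *vec = A + (size_t)j * n;

    // Each thread accumulates a partial dot product.
    cuDoubleComplex z = make_cuDoubleComplex(0.0, 0.0);
    for (int k = threadIdx.x; k < n; k += blockDim.x)
        z = cuCadd(z, cuCmul(row[k], vec[k]));
    partial[threadIdx.x] = z;
    __syncthreads();

    // Tree reduction of the partial sums in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] = cuCadd(partial[threadIdx.x], partial[threadIdx.x + s]);
        __syncthreads();
    }
    if (threadIdx.x == 0)
        B[(size_t)j * n + i] = partial[0];
}

int main()
{
    const int n = 512, batch = 32;  // sizes from the post
    // Made-up test data: every entry is 1+0i, so every result entry is n+0i.
    std::vector<cuDoubleComplex> hS((size_t)batch * n * n, make_cuDoubleComplex(1.0, 0.0));
    std::vector<cuDoubleComplex> hA((size_t)batch * n, make_cuDoubleComplex(1.0, 0.0));
    std::vector<cuDoubleComplex> hB((size_t)batch * n);

    cuDoubleComplex *dS, *dA, *dB;
    cudaMalloc(&dS, hS.size() * sizeof(cuDoubleComplex));
    cudaMalloc(&dA, hA.size() * sizeof(cuDoubleComplex));
    cudaMalloc(&dB, hB.size() * sizeof(cuDoubleComplex));
    cudaMemcpy(dS, hS.data(), hS.size() * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);

    dim3 grid(n, batch);  // one block per row, per matrix
    mv_rowblock<<<grid, THREADS>>>(dS, dA, dB, n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hB.data(), dB, hB.size() * sizeof(cuDoubleComplex), cudaMemcpyDeviceToHost);

    printf("B[0] = %f + %fi\n", cuCreal(hB[0]), cuCimag(hB[0]));
    cudaFree(dS); cudaFree(dA); cudaFree(dB);
    return 0;
}
```

Error checking is left out to keep the sketch short. Whether this layout of work beats the original kernel on a given GPU would need to be measured.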

Both versions give me the correct result, but I am not satisfied with the speed.
Even MKL + OpenMP is faster than that.

A single cublasZgemv call is very fast, but I want to run 32 of them at the same time (or parallelize them in some other way).

pasoleatis · October 20, 2013, 8:40am

Would streams work?

afai97202013 · October 21, 2013, 4:38am

Thanks!
Streams work!

It only takes 1.58 ms now.
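
The thread does not show the stream version, so the following is only a minimal sketch of how it could look. It assumes a single cuBLAS handle whose stream is switched with cublasSetStream before each cublasZgemv call. The number of streams (nStreams = 8), the leading dimension (n), and the all-ones test data are assumptions made for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>
#include <cublas_v2.h>

int main()
{
    const int n = 512;       // matrix dimension from the thread
    const int batch = 32;    // number of independent products
    const int nStreams = 8;  // assumed value, not stated in the thread

    // Made-up test data: every entry is 1+0i, so every result entry is n+0i.
    std::vector<cuDoubleComplex> hS((size_t)batch * n * n, make_cuDoubleComplex(1.0, 0.0));
    std::vector<cuDoubleComplex> hA((size_t)batch * n, make_cuDoubleComplex(1.0, 0.0));
    std::vector<cuDoubleComplex> hB((size_t)batch * n);

    cuDoubleComplex *d_S, *d_A, *d_B;
    cudaMalloc(&d_S, hS.size() * sizeof(cuDoubleComplex));
    cudaMalloc(&d_A, hA.size() * sizeof(cuDoubleComplex));
    cudaMalloc(&d_B, hB.size() * sizeof(cuDoubleComplex));
    cudaMemcpy(d_S, hS.data(), hS.size() * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(d_A, hA.data(), hA.size() * sizeof(cuDoubleComplex), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; s++)
        cudaStreamCreate(&streams[s]);

    const cuDoubleComplex alpha = make_cuDoubleComplex(1.0, 0.0);
    const cuDoubleComplex beta  = make_cuDoubleComplex(0.0, 0.0);

    // Issue each product into a stream, round-robin, so that independent
    // cublasZgemv calls are allowed to overlap on the device.
    for (int k = 0; k < batch; k++) {
        cublasSetStream(handle, streams[k % nStreams]);
        cublasZgemv(handle, CUBLAS_OP_N, n, n, &alpha,
                    d_S + (size_t)k * n * n, n,
                    d_A + (size_t)k * n, 1, &beta,
                    d_B + (size_t)k * n, 1);
    }
    cudaDeviceSynchronize();  // wait for all streams to finish

    cudaMemcpy(hB.data(), d_B, hB.size() * sizeof(cuDoubleComplex), cudaMemcpyDeviceToHost);
    printf("B[0] = %f + %fi\n", cuCreal(hB[0]), cuCimag(hB[0]));

    for (int s = 0; s < nStreams; s++)
        cudaStreamDestroy(streams[s]);
    cublasDestroy(handle);
    cudaFree(d_S); cudaFree(d_A); cudaFree(d_B);
    return 0;
}
```

Whether the calls actually overlap depends on the GPU and on how much of the device each cublasZgemv launch occupies, so the stream count is worth tuning by measurement.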
