cublas large matrix multiplication: large matrices won't compute

asteyc · January 16, 2008, 10:00pm

Greetings,

This question is for anyone familiar with cublas. I am currently running in emulation mode and I’m trying to perform the following matrix computation.

A is a 1 x 15,000,000 vector (yeah yeah, 15 million…)

B is a 15,000,000 by 16 matrix

I'm trying to compute their product, which should be a 1 x 16 vector, using sgemm.

I have the following call:

cublasSgemm( 't','n', m, n, k, alpha, vec, k, mat, k, beta, c, m);

with m = 1, n = 16, k = 15,000,000.
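For checking results on a small test case, here is a minimal CPU sketch of what that ('t','n') call is asked to compute, assuming column-major storage with lda = ldb = k and ldc = m as in the call above. The name ref_sgemm_tn is hypothetical and is not part of CUBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for C = alpha * A^T * B + beta * C, column-major.
 * A is stored as k x m (lda = k), B as k x n (ldb = k), C as m x n (ldc = m).
 * ref_sgemm_tn is a hypothetical helper, not a CUBLAS function. */
static void ref_sgemm_tn(size_t m, size_t n, size_t k, float alpha,
                         const float *a, const float *b,
                         float beta, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* Accumulate in double to limit rounding error for large k. */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += (double)a[p + i * k] * (double)b[p + j * k];
            c[i + j * m] = (float)(alpha * acc + beta * c[i + j * m]);
        }
    }
}
```

With m = 1 this is just 16 dot products of length k, one per column of B, so it is cheap to compare against the sgemm output for a small k.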

The program goes into the matrix multiply with no problem (no errors reported) but then just sits there (I’m guessing forever… it’ll easily sit and “compute” for 30 minutes).

Now I think the problem could be the fact that I’m trying to allocate well over a gig of memory on the cpu (remember, running in emulation mode) and since I don’t have nearly that much ram, the paging is causing the computation to run very very slowly.
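As a rough check on that memory estimate, the sketch below adds up the bytes for the three single-precision arrays. The helper names are hypothetical, and it assumes a 4-byte float. Since emulation mode runs everything on the host, the "device" buffers presumably come out of host RAM too, so a program that also keeps its own host copies would need about twice this.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes needed for a dense rows x cols array of single-precision floats. */
static unsigned long long float_array_bytes(unsigned long long rows,
                                            unsigned long long cols)
{
    /* Guard against overflow of the byte count. */
    assert(cols == 0 || rows <= ULLONG_MAX / sizeof(float) / cols);
    return rows * cols * (unsigned long long)sizeof(float);
}

/* Total for the vector A (1 x k), the matrix B (k x n) and the result C (1 x n). */
static unsigned long long problem_bytes(unsigned long long k,
                                        unsigned long long n)
{
    return float_array_bytes(1, k) + float_array_bytes(k, n)
         + float_array_bytes(1, n);
}
```

For k = 15,000,000 and n = 16 this gives 960,000,000 bytes for B alone and 1,020,000,064 bytes in total, so one copy of the data is already about a gigabyte.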

If that’s not the problem does anyone have any suggestions?

pleventi · January 16, 2008, 10:42pm

Try sweeping the size of your vector. Plot performance vs. vector length. Shape of curve and location of kinks/drop-offs tells you a lot about what’s going on.
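One way to run such a sweep is sketched below. It times a plain CPU loop standing in for the multiply, so the function name time_vec_mat and the fill values are assumptions for illustration; to measure the real thing, the timed loop would be swapped for the cublasSgemm call.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Time one (1 x k) by (k x n) product on the CPU.
 * Returns elapsed CPU seconds, or -1.0 if allocation fails.
 * The sum of the n outputs is stored in *checksum so the work
 * cannot be optimized away. Hypothetical helper for illustration. */
static double time_vec_mat(size_t k, size_t n, double *checksum)
{
    assert(checksum != NULL);
    float *vec = malloc(k * sizeof *vec);
    float *mat = malloc(k * n * sizeof *mat);
    if (vec == NULL || mat == NULL) {
        free(vec);
        free(mat);
        return -1.0;
    }
    for (size_t p = 0; p < k; ++p)
        vec[p] = 1.0f;
    for (size_t p = 0; p < k * n; ++p)
        mat[p] = 0.5f;

    clock_t start = clock();
    double sum = 0.0;
    for (size_t j = 0; j < n; ++j) {
        double acc = 0.0;
        for (size_t p = 0; p < k; ++p)
            acc += (double)vec[p] * (double)mat[p + j * k];
        sum += acc;
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    *checksum = sum;
    free(vec);
    free(mat);
    return seconds;
}
```

Calling it with n = 16 for k = 1,000, 10,000, 100,000 and so on, and printing k against the returned time, gives the curve. A sudden jump in time per element around the point where the arrays stop fitting in RAM would point at paging.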

vvolkov · January 17, 2008, 12:52am

Also take into account that cublasSgemm may pad input matrices to sizes that are multiples of 32. That increases the total flop count substantially in your case.

Another factor is that cublasSgemm is multithreaded code that calls a barrier frequently (2 × 15,000,000 / 32 ≈ 1 million times in your case).
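To put numbers on both points, here is a small sketch of the arithmetic. It assumes every dimension is rounded up to a multiple of 32, as described above (this is not a documented guarantee), and it takes the barrier count straight from the 2 × k / 32 estimate. The helper names are hypothetical.

```c
#include <assert.h>

/* Round x up to the next multiple of `multiple`. */
static unsigned long long round_up(unsigned long long x,
                                   unsigned long long multiple)
{
    assert(multiple > 0);
    return (x + multiple - 1) / multiple * multiple;
}

/* Flop count of an m x k by k x n product: one multiply and one add per term. */
static unsigned long long gemm_flops(unsigned long long m,
                                     unsigned long long n,
                                     unsigned long long k)
{
    return 2ULL * m * n * k;
}

/* Barrier estimate from the post above: two barriers per tile-wide step along k. */
static unsigned long long barrier_estimate(unsigned long long k,
                                           unsigned long long tile)
{
    assert(tile > 0);
    return 2ULL * k / tile;
}
```

For m = 1, n = 16, k = 15,000,000, the unpadded product is 480,000,000 flops. Padding m and n to 32 (k is already a multiple of 32) gives 30,720,000,000 flops, 64 times as many, and the barrier estimate comes to 937,500.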

asteyc · January 17, 2008, 2:07pm

Thanks for the replies.

I shrank the data set by a few orders of magnitude and things are definitely moving faster. It is interesting to note that the multiplication with the 15,000,000-row matrix did complete, though it took about an hour.

I assume this will all go much faster when I stop running in emulation mode and actually run on the card.

DenisR · January 17, 2008, 2:27pm

Well, running in emulation mode is of course MUCH slower…
