How to time CUBLAS functions? cublasSgemv vs. nested loops

CUDA_Novice · June 26, 2009, 4:43pm

Hello,

I have a simple block of code that does a matrix-vector multiplication in two ways: with the cublasSgemv function, and with a function I wrote that does the same calculation using two nested for loops. I am timing both operations, and according to my timing method the nested for loops execute faster. This doesn’t seem right to me. I am using CUDA events to time both operations. Is there something else I should be doing to get a more accurate reading of the execution time?

Thanks in Advance!!!

Nico · June 26, 2009, 7:51pm

Did you warm up the device? The first kernel launch typically takes longer to complete and should not be included in the timing.

N.
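
A minimal sketch of the warm-up advice above, assuming the cuBLAS v2 API (`cublas_v2.h`) rather than the legacy interface that was current when this thread was written. The function names `sgemvHost` and `timeSgemvMs`, the `warmup` and `iters` parameters, and the all-ones test data are illustrative choices, not taken from the original poster's code. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Host reference with two nested loops: y = A * x.
// A is m x n in column-major order (the layout cuBLAS expects), lda = m.
void sgemvHost(int m, int n, const float* A, const float* x, float* y) {
    for (int i = 0; i < m; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < n; ++j) sum += A[i + j * m] * x[j];
        y[i] = sum;
    }
}

// Returns the average time in milliseconds of one cublasSgemv call,
// measured with CUDA events after `warmup` untimed calls.
float timeSgemvMs(int m, int n, int warmup, int iters) {
    assert(m > 0 && n > 0 && warmup >= 0 && iters > 0);
    std::vector<float> hA((size_t)m * n, 1.0f), hx(n, 1.0f);

    float *dA = nullptr, *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, m * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up: these calls absorb one-time initialization cost and are not timed.
    for (int i = 0; i < warmup; ++i)
        cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha, dA, m, dx, 1, &beta, dy, 1);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha, dA, m, dx, 1, &beta, dy, 1);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until the GPU has reached the stop event

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dx);
    cudaFree(dy);
    return ms / iters;
}
```

Calling, for example, `timeSgemvMs(4096, 4096, 3, 100)` from `main` and printing the result gives a per-call average that excludes the first-launch overhead. Averaging over several iterations also helps when a single call is too short to measure reliably.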

CUDA_Novice · June 26, 2009, 8:09pm

No, I haven’t done that… I will give it a try and see if it helps. Also, are there optimal sizes for the matrix and vector that get the best performance out of the GPU?

Nico · June 26, 2009, 10:26pm

I guess that depends on whether your matrices are stored in row-major or column-major format, but I believe using multiples of 32 (warp size) in both dimensions is optimal.

N.

avidday · June 26, 2009, 10:30pm

For non-trivially sized problems, I have found CUBLAS SGEMV to be a lot faster than the best single-threaded host CPU SGEMV I have access to (I usually use GotoBLAS). Sizes that are even multiples of 32 are also a lot faster than odd sizes, so it does pay to pad storage out to multiples of 32.
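
One way the padding suggestion could look in practice is sketched below. This is host-side code intended for a `.cu` file. It assumes column-major storage (as cuBLAS uses) and pads both dimensions with zeros. The helper names `roundUpTo32`, `padColumnMajor`, and `padVector` are hypothetical, and whether padding still pays off on current hardware and library versions is something to measure rather than assume.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of 32 (the warp size mentioned in the thread).
int roundUpTo32(int n) {
    assert(n >= 0);
    return (n + 31) / 32 * 32;
}

// Copy a column-major m x n matrix into zero-filled storage whose row and
// column counts are both rounded up to multiples of 32.
// On return, *ldaOut holds the padded leading dimension.
std::vector<float> padColumnMajor(const std::vector<float>& A, int m, int n,
                                  int* ldaOut) {
    assert(A.size() == (size_t)m * n);
    const int lda = roundUpTo32(m);
    const int nPad = roundUpTo32(n);
    std::vector<float> P((size_t)lda * nPad, 0.0f);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            P[i + (size_t)j * lda] = A[i + (size_t)j * m];
    *ldaOut = lda;
    return P;
}

// Zero-pad a vector to a multiple of 32 entries so it matches the padded matrix.
std::vector<float> padVector(const std::vector<float>& v) {
    std::vector<float> p(roundUpTo32((int)v.size()), 0.0f);
    for (size_t i = 0; i < v.size(); ++i) p[i] = v[i];
    return p;
}
```

Because the padding is all zeros, running SGEMV on the padded sizes leaves the first `m` entries of the result unchanged. The extra entries of the output are zero and can be ignored.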
