Performance query: odd results profiling GPU speed of matrix multiplication using cublas

I’ve just been profiling cublas multiplying two matrices of random floats of increasing dimension, and got some curious results.
See attached for graph of GPU performance.
I’m curious about the step-cycle observed on both a GTX280 and a Tesla card.
I was wondering if others have seen this? Do you have any suggestions as to why?
Regards, David
speed_upGPUvsCPU.png

There are two different versions of sgemm() in cublas, one considerably faster than the other. The faster one is used only when the matrix dimensions are nice round multiples of its execution parameters; the slower version is used otherwise. If you profile your benchmark application, you can see the different kernels in the output. I don’t remember what I measured the difference in performance for single precision to be, but for double precision it is something like a 15% difference between the two.