cublas sgemm benchmarks

ribbldibbl · July 7, 2008, 12:48pm

Hi,

I’m trying to optimize a program in which sgemm and strsv calls are the bottleneck by tuning the block sizes. I’m particularly interested in:

comparison between cublas 1.1 and cublas 2.0 versions
performance for different matrix sizes, including non-square ones
comparison of different devices

I’d appreciate it if you could point me to some well-founded benchmarks or tell me under which conditions sgemm performs optimally.

Regards,
M

vvolkov · July 9, 2008, 3:09pm

I believe sgemm in 2.0 runs best in the AB and AB^T cases when the height of A is a multiple of 64 and the other dimensions are multiples of 16. Check out the sgemm source code posted at http://forums.nvidia.com/index.php?showtop…14&#entry314014; it includes timing.
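
To make the size rule above concrete, here is a minimal C sketch of how one might round block sizes up to those multiples and turn a measured time into GFLOP/s for comparison. It assumes "height of A" means the row count m of C (m x n) = A (m x k) * B (k x n) and that "other dimensions" means n and k. The names round_up, gemm_dims, pad_gemm_dims and sgemm_gflops are hypothetical helpers for illustration and are not part of cuBLAS. Whether padding actually pays off on a given device and cuBLAS version would still need to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of `multiple` (which must be > 0). */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Dimensions of C (m x n) = A (m x k) * B (k x n). */
typedef struct {
    size_t m, n, k;
} gemm_dims;

/* Assumed reading of the rule above: pad m (height of A) to a multiple
 * of 64 and the other two dimensions to multiples of 16. */
static gemm_dims pad_gemm_dims(size_t m, size_t n, size_t k)
{
    gemm_dims p = { round_up(m, 64), round_up(n, 16), round_up(k, 16) };
    return p;
}

/* A matrix product performs about 2*m*n*k floating-point operations,
 * so this converts a measured wall time in seconds into GFLOP/s. */
static double sgemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1e9;
}
```

For example, a 1000 x 1000 x 1000 problem would be padded to 1024 x 1008 x 1008. If the extra rows and columns are filled with zeros, the top-left m x n part of the result is unchanged, so the padded call can stand in for the original one at the cost of some wasted work.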
