cublasDgemm performance on TK1

xia · July 20, 2015, 7:06am

Hi,

I ran a matrix-multiplication test using cublasSgemm and cublasDgemm on a TK1 (Unified Memory).
C = A^T * A, where matrix A is 3200 x 320 (column-major) and matrix C is 320 x 320 (column-major).

The Unified Memory is allocated before the function is called.

1) cublasSgemm: 8.1489 ms
2) cublasDgemm: 56.7583 ms

First question: is the elapsed time correct?
Second question: why is double precision ~7x slower than single precision?
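To put the two timings in context, they can be converted to achieved throughput. The sketch below assumes the usual GEMM operation count of 2*N*N*K (one multiply and one add per inner-product term) with N = 320 and K = 3200 as in the post; the helper names `gemm_flops` and `gflops` are made up for this illustration and are not part of cuBLAS.

```cpp
// Floating-point operations for C (n x n) = A^T (n x k) * A (k x n):
// each of the n*n outputs needs k multiplies and k adds.
double gemm_flops(double n, double k) {
    return 2.0 * n * n * k;
}

// Achieved throughput in GFLOP/s for a call that took `ms` milliseconds.
double gflops(double flops, double ms) {
    const double seconds = ms * 1.0e-3;
    return flops / seconds / 1.0e9;
}
```

With the posted numbers, `gflops(gemm_flops(320, 3200), 8.1489)` comes out at roughly 80 GFLOP/s for cublasSgemm and `gflops(gemm_flops(320, 3200), 56.7583)` at roughly 11.5 GFLOP/s for cublasDgemm, a ratio of about 6.97. Comparing these figures against the single- and double-precision peak rates of the GPU is one way to judge whether the measured times are plausible.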

cublasDgemm function

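// monoA, monoC (Unified Memory buffers) and handle (cuBLAS handle) are
// created before this function is called.
void cublas_Dgemm_unified(MatrixXd& matA, MatrixXd& matC, int K, int N)
{
	// copy the K x N input matrix into the Unified Memory buffer
	memcpy(monoA, matA.data(), K*N*sizeof(double));
	
	// gemm: C (N x N) = A^T (N x K) * A (K x N), column-major, lda = ldb = K
	double alpha = 1.0;
	double beta = 0.0;
	cublas_safe_call( cublasDgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, N, N, K, 
		&alpha, monoA, K, monoA, K, &beta, monoC, N) );
	// wait for the GPU to finish before the CPU reads monoC
	cudaDeviceSynchronize();
	
	// transfer: copy the N x N result back into the Eigen matrix
	memcpy(matC.data(), monoC, N*N*sizeof(double));
}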
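On the question of whether the elapsed time is correct: the post does not show how the timings were taken, so the following is only a sketch of one common approach, not the method used above. It runs a few untimed warm-up calls first, then averages over several timed calls. The helper name `average_ms` and its parameters are hypothetical.

```cpp
#include <chrono>
#include <functional>

// Average wall-clock time of `work` in milliseconds.
// `warmup` untimed calls run first so one-time setup cost is excluded;
// the following `iters` calls are timed together and averaged.
double average_ms(const std::function<void()>& work, int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) work();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

In this thread's setting, `work` could be a lambda such as `[&]{ cublas_Dgemm_unified(matA, matC, K, N); }`. Because that function ends with `cudaDeviceSynchronize()` before returning, a host-side clock sees the full GPU execution time. Note that timing the whole function also includes the two `memcpy` calls, so the GEMM itself would have to be timed separately to isolate it.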

Nicholas_762 · July 21, 2015, 2:52am

…

xia · July 21, 2015, 10:11am

Thanks!

Why is the double-precision version ~7x slower than the single-precision one?
That’s too slow!

Nicholas_762 · July 21, 2015, 2:36pm

…
