Generic DGEMM implementation

gemini0x4d · February 11, 2009, 10:34am

Hi,

I just started with CUDA and have been playing around with CUBLAS. For my work, I need a generic, well-performing DGEMM implementation that I can modify, so I need the source code, which is not possible with CUBLAS…

Do you know of any “open” CUDA DGEMM implementation I could look into? If not, do you have any tips on how to implement an efficient DGEMM?

Best regards,

gemini

zhenyu · February 11, 2009, 10:41am

Source code of CUBLAS is available:
The Official NVIDIA Forums | NVIDIA

E.D_Riedijk · February 11, 2009, 1:09pm

Search for the posts by vvolkov; he wrote the implementation used in the later CUBLAS version. It is for SGEMM, but you can rewrite it for DGEMM.
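
To illustrate the "write it for SGEMM, then rewrite it for DGEMM" idea, here is a minimal sketch of a shared-memory tiled GEMM kernel templated on the scalar type. This is not vvolkov's kernel and not the CUBLAS implementation; it is only the textbook tiling scheme. The names `gemm_tiled`, `gemm_host` and `TILE` are made up for this example, the tile size of 16 is an untuned guess, and the matrices are assumed to be densely packed in row-major order (BLAS itself uses column-major with leading dimensions).

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumed tile edge, not tuned for any GPU

// Computes C = alpha * A * B + beta * C for row-major matrices:
// A is M x K, B is K x N, C is M x N.
// T = float gives an SGEMM-like kernel, T = double a DGEMM-like one.
template <typename T>
__global__ void gemm_tiled(int M, int N, int K, T alpha, const T* A,
                           const T* B, T beta, T* C) {
    __shared__ T As[TILE][TILE];
    __shared__ T Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C for this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C for this thread
    T acc = T(0);

    // Walk along the K dimension one tile at a time.
    for (int k0 = 0; k0 < K; k0 += TILE) {
        int a_col = k0 + threadIdx.x;
        int b_row = k0 + threadIdx.y;
        // Stage one tile of A and one of B in shared memory,
        // padding with zeros outside the matrix bounds.
        As[threadIdx.y][threadIdx.x] =
            (row < M && a_col < K) ? A[row * K + a_col] : T(0);
        Bs[threadIdx.y][threadIdx.x] =
            (b_row < K && col < N) ? B[b_row * N + col] : T(0);
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the tiles are overwritten
    }

    if (row < M && col < N)
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
}

// Hypothetical host helper: copies the operands to the GPU, launches the
// kernel and copies C back. Error handling is kept minimal on purpose.
template <typename T>
cudaError_t gemm_host(int M, int N, int K, T alpha, const std::vector<T>& A,
                      const std::vector<T>& B, T beta, std::vector<T>& C) {
    T *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, A.size() * sizeof(T));
    cudaMalloc((void**)&dB, B.size() * sizeof(T));
    cudaMalloc((void**)&dC, C.size() * sizeof(T));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(T), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(T), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, C.data(), C.size() * sizeof(T), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    gemm_tiled<T><<<grid, block>>>(M, N, K, alpha, dA, dB, beta, dC);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(C.data(), dC, C.size() * sizeof(T),
                         cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling `gemm_host<double>(M, N, K, alpha, A, B, beta, C)` runs the double-precision variant and `gemm_host<float>(...)` the single-precision one. The double version needs a GPU with double-precision support (compute capability 1.3 or later). A kernel like this is only a starting point: a tuned implementation usually computes several elements of C per thread and keeps more data in registers, so do not expect CUBLAS-level performance from it.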
