How to perform GEMM using cuDNN?

I know cuDNN implements convolution using GEMM (with the required data rearrangement), but is there any way to perform GEMM directly using cuDNN?
Of course, there are cuBLAS and CUTLASS for GEMM, but I still want to perform GEMM using the cuDNN API.

Hi @MangoDalDalRaccoon,
Do you mean performing matrix multiplication using cuDNN?
cuDNN will call GEMM kernels, but that is to perform convolution, as you already mentioned.
Could you please elaborate on what you are asking for?

Thanks!

Yes, thanks for replying.
I mean just GEMM. For example, multiplying two matrices with shapes
a = 2x3
b = 4x3
with matrix b transposed, so that c = a × bᵀ has shape 2x4.
I understand that cudnnConvolutionForward is a wrapper around optimized GEMM kernels, and I wonder how to call that GEMM functionality directly for my matrix multiplication.
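
To pin down the semantics, this is the computation I am after, written as a plain CPU reference (the values are illustrative only):

```cpp
#include <cstdio>

int main() {
    // c = a * b^T: a is 2x3, b is 4x3, c is 2x4 (all row-major).
    const int M = 2, K = 3, N = 4;
    float a[M][K] = {{1, 2, 3}, {4, 5, 6}};
    float b[N][K] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {1, 1, 1}};
    float c[M][N] = {};

    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                c[i][j] += a[i][k] * b[j][k];  // b indexed row-wise, i.e. b transposed

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) printf("%6.1f", c[i][j]);
        printf("\n");
    }
    return 0;
}
```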

Hi @MangoDalDalRaccoon,
There are a few options you may try here:

  • Use a matmul op via the backend API (see the backend API docs for more details).
  • Use a matmul op via the cuDNN C++ frontend (see run_matmul_bias_gelu in fusion_sample.cpp for an example).
  • Transform the GEMM problem into a 1x1 convolution and call cuDNN convolution, through either the legacy cuDNN API or the backend/frontend API (a sketch of this approach is shown below).
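
For illustration, here is a minimal sketch of the third option through the legacy API, assuming float data and the 2x3 / 4x3 shapes from the question above (values are illustrative and error handling is mostly omitted):

```cpp
#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>

// c = a * b^T expressed as a 1x1 convolution:
//   a (2x3, row-major) -> input tensor  NCHW = [2, 3, 1, 1] (rows -> batch, cols -> channels)
//   b (4x3, row-major) -> filter        KCHW = [4, 3, 1, 1] (rows -> output channels)
//   c (2x4, row-major) -> output tensor NCHW = [2, 4, 1, 1]
int main() {
    const int M = 2, K = 3, N = 4;
    const float ha[M * K] = {1, 2, 3, 4, 5, 6};
    const float hb[N * K] = {1, 0, 0,  0, 1, 0,  0, 0, 1,  1, 1, 1};
    float hc[M * N];

    float *da, *db, *dc;
    cudaMalloc(&da, sizeof(ha));
    cudaMalloc(&db, sizeof(hb));
    cudaMalloc(&dc, sizeof(hc));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);

    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateTensorDescriptor(&yDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, M, K, 1, 1);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, N, K, 1, 1);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, M, N, 1, 1);
    // 1x1 "kernel", no padding, unit stride/dilation, float accumulation.
    cudnnSetConvolution2dDescriptor(convDesc, 0, 0, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    const float alpha = 1.0f, beta = 0.0f;
    // IMPLICIT_GEMM needs no workspace; other algorithms may be faster but require one.
    cudnnStatus_t st = cudnnConvolutionForward(
        handle, &alpha, xDesc, da, wDesc, db, convDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM, nullptr, 0, &beta, yDesc, dc);
    printf("status: %s\n", cudnnGetErrorString(st));

    cudaMemcpy(hc, dc, sizeof(hc), cudaMemcpyDeviceToHost);
    for (int i = 0; i < M; ++i) {              // print c row by row
        for (int j = 0; j < N; ++j) printf("%6.1f", hc[i * N + j]);
        printf("\n");
    }

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    cudaFree(dc); cudaFree(db); cudaFree(da);
    return 0;
}
```

This works because a 1x1 cross-correlation computes y[n][k] = Σ_c x[n][c] · w[k][c], which is exactly c = a × bᵀ once a is mapped to the input tensor and b to the filter.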

Thanks!

Hi @AakankshaS,

I am also confused about how to write a matmul using cuDNN.

  1. I cannot find any code samples showing how to use the backend API to write a matmul, and I find it difficult to put the descriptors together from the documentation alone. Could you please provide some material on using the backend API to write a matmul-like operator?

  2. I wrote a matmul with the frontend API, but I got the message: “Fusion with float inputs is only supported on Ampere or later”. I want to run this code on a V100 GPU. Is there a way to solve this problem?

In short, all I want is to run a matmul with cuDNN on a V100, but I am having problems with both the frontend and backend APIs. So I am looking forward to some help.

Thanks!!!