Calling cgemm functions

I would like to know if it is possible to write a sample code that calls cgemm functions directly. For example, I want to analyze cgemm_64_32_tn with a sample input.

I haven’t seen a guide for that. Any help?

Hi,

In this case you need to use cuBLAS.
Please refer to the link below for more details:
https://docs.nvidia.com/cuda/cublas/index.html
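
As a rough illustration, a minimal host-side sketch of a single-precision GEMM through the cuBLAS v2 API could look like the following (the matrix size, fill values, and omitted error checking are placeholders, not taken from the docs):

#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1000;
    const size_t bytes = (size_t)n * n * sizeof(float);

    // Host inputs with arbitrary values (cuBLAS expects column-major storage).
    float *hA = (float*)malloc(bytes);
    float *hB = (float*)malloc(bytes);
    for (int i = 0; i < n * n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device buffers for A, B and the result C.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha*A*B + beta*C with no transposes; the internal kernel
    // (e.g. an sgemm *_nn variant) is selected by the library itself.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB);
    return 0;
}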

Thanks

Thank you. You are right. I tried cublasSgemm and played with some inputs. For example, profiling the multiplication of two single-precision 1000x1000 matrices, with the result written into a third 1000x1000 matrix, gives the following statistics:

Kernel: volta_sgemm_128x32_nn
Invocations   Metric Name               Metric Description                             Min          Max          Avg
          1   dram_read_transactions    Device Memory Read Transactions              586589       586589       586589
          1   dram_write_transactions   Device Memory Write Transactions             483562       483562       483562
          1   flop_count_sp             Floating Point Operations(Single Precision)   2157969408   2157969408   2157969408
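
For reference, numbers of this kind can be gathered with an nvprof metrics run along these lines (the executable name here is just a placeholder):

nvprof --metrics dram_read_transactions,dram_write_transactions,flop_count_sp ./sgemm_test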

In total, (586589+483562)*32 bytes, or 34,244,832 bytes, are read from or written to DRAM (each DRAM transaction is 32 bytes).

With pencil and paper, we know each matrix contains 1000 x 1000 x 4 = 4,000,000 bytes. Two DRAM reads (A and B) and one DRAM write (C) would therefore give 12,000,000 bytes.
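
Putting the two figures side by side (using the 32-byte DRAM transaction size from above), the measured traffic is almost three times the ideal:

(586589 + 483562) * 32 / (3 * 1000 * 1000 * 4) = 34,244,832 / 12,000,000 ≈ 2.85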

Some difference is acceptable since the exact implementation of volta_sgemm_128x32_nn is unknown.
However, one could say that this amount of data movement points to an inefficient algorithm in volta_sgemm_128x32_nn.

Any comment on that?