CUBLAS SGEMM Flops measurement using nvprof on Volta

sriramandare · June 6, 2018, 6:33am

Measurement of flops using nvprof for SGEMM 8Kx8Kx8K

Steps:

Run a cuBLAS-based SGEMM (i.e. multiply two random matrices of size 8Kx8K and 8Kx8K) on Volta.
Set the operating frequency of the Volta GPU to 1200 MHz.
Measure the kernel time using the nvprof --print-gpu-trace option (87 ms).
Measure the number of single-precision floating-point operations executed using the nvprof flop_count_sp metric (--metrics flop_count_sp).
Compute actual FLOPS = operations / time ==> 12561 GFLOPS
Theoretical FLOPS of Volta = 5120 (number of CUDA cores) * 1200 MHz (frequency) * 2 (one FMA counts as two operations) ==> 12288 GFLOPS
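
The arithmetic in the steps above can be sketched roughly as follows. This is only an illustration: the function names are made up, 8K is assumed to mean 8192, and the operation count is assumed to be the nominal 2*N^3 for an NxNxN SGEMM rather than the value nvprof actually reported (which is not given here). With the rounded 87 ms this comes out slightly different from the 12561 GFLOPS quoted above, presumably because that figure used the profiler's count and the unrounded time.

```python
# Hypothetical helpers that mirror the steps listed above.

def theoretical_peak_gflops(cuda_cores, clock_mhz, flops_per_core_per_cycle=2):
    """Peak FP32 rate in GFLOPS; the factor 2 counts one FMA as two operations."""
    # cores * MHz * 2 gives MFLOPS, so divide by 1e3 to get GFLOPS.
    return cuda_cores * clock_mhz * flops_per_core_per_cycle / 1e3


def achieved_gflops(flop_count, time_ms):
    """Achieved rate in GFLOPS from an operation count and a kernel time in ms."""
    return flop_count / (time_ms * 1e-3) / 1e9


peak = theoretical_peak_gflops(5120, 1200)

n = 8192                      # assumption: "8K" means 8192
nominal_flops = 2 * n ** 3    # assumption: nominal SGEMM count, not the nvprof value
achieved = achieved_gflops(nominal_flops, 87.0)

print(f"theoretical peak: {peak:.0f} GFLOPS")
print(f"achieved (nominal count, 87 ms): {achieved:.0f} GFLOPS")
print(f"ratio: {achieved / peak:.3f}")
```

To reproduce the exact figure, replace nominal_flops with the flop_count_sp value from nvprof and 87.0 with the unrounded kernel time from the GPU trace.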

Problem:
The measured FLOPS exceed the theoretical FLOPS (102% of the theoretical peak).
This is seen only for larger matrix sizes; for smaller matrix sizes the measured value stays below 100%.
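
One way to read the 102% figure is to ask what clock the same formula would need in order to reach the measured rate. The sketch below just inverts the theoretical-peak formula using the two numbers quoted above; the function name is hypothetical, and the result says nothing by itself about what the GPU clock actually was during the run.

```python
# Hypothetical helper: invert peak = cores * clock * 2 to solve for the clock.

def implied_clock_mhz(measured_gflops, cuda_cores, flops_per_core_per_cycle=2):
    """Clock (MHz) at which the peak formula would equal the measured rate."""
    return measured_gflops * 1e3 / (cuda_cores * flops_per_core_per_cycle)


measured = 12561      # GFLOPS, from the measurement above
theoretical = 12288   # GFLOPS, from the formula above

ratio = measured / theoretical
clock = implied_clock_mhz(measured, 5120)

print(f"measured / theoretical: {ratio * 100:.1f}%")
print(f"implied clock: {clock:.1f} MHz (nominal setting: 1200 MHz)")
```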

Any idea what could explain this behaviour?


