Hello, I’m trying to use ncu to benchmark some applications for their performance regarding the usage of Tensor Cores (the devices I’m using are a 3080 and a A100). For simple scenarios where I’m performing matrix multiplication with known values for M,N and K, I can calculate the # of FLOPs from …

Lots of further links: I had once a similar issue, but without solution: [image] Wrong pipe utilization for Tensor (FP)? Nsight Compute On my RTX 2060. When I call “mma.sync.aligned.m16n8k8.row.col.f16.f16.f16.f16” 3.200.000x(32 Threads/Warp) in a loop on one SM, I get a…

How to measure Tensor FLOPs?

Accelerated Computing CUDA CUDA Programming and Performance

Curefab May 15, 2024, 2:44pm 13

Lots of further links:

I had once a similar issue, but without solution:

And this question is also related:

There was an issue (once?) with stored roofline of Tensor Cores for different GPU architectures (basically rooflines have been only correct for GV100 Volta GPUs):

Those detailed counters could help you calculate exact FLOPs (but only if you know some details of your instructions; it is not enough for fully unknown code to deduce FLOPs):

Topic		Replies	Views
Why Low Tensor Pipe Utilization CUDA Programming and Performance cuda , kernel	4	1257	May 20, 2022
NSight : How to calculate FLOP/s that's close to achieved FLOP/s CUDA Programming and Performance	3	3101	October 4, 2017
How to measure FLOPs of a cuda kernel function by using Nsight-Compute on A100 GPU? Nsight Compute kernel	2	905	August 16, 2024
Why the number of flops is different between FP32 and FP16 mode with YOLOv3 TensorRT implementation? Jetson AGX Xavier tensorrt , kernel , profiling	8	4003	October 18, 2021
Nsight Compute-Roofline chart Nsight Compute	12	1528	September 20, 2024
Roofline Tensor Core should be half but not float? Nsight Compute	3	1364	May 29, 2024
Question about Roofline of TensorCore GEMM Nsight Compute	3	1505	August 7, 2024
I need help understanding how concurrency of CUDA Cores and Tensor Cores works between Turing and Ampere/Ada? CUDA Programming and Performance cuda , tensorflow , rtx , ampere	10	1839	September 27, 2024
Question regarding Tensor Cores/GV100 CUDA Programming and Performance	8	2546	August 12, 2017
Differences in Precision Between Tensor Cores and CUDA Cores CUDA Programming and Performance cuda	1	113	January 10, 2025

How to measure Tensor FLOPs?

Related topics