I am profiling a single A100 GPU.
Nsight Compute GUI version 2023.2.2.0 shows a Floating Point Roofline for Double, Single, and Half Precision (I assume these are for the CUDA cores).
It also shows a 'Floating Point Operations Roofline (Tensor Core)'.
Since its peak compute throughput is reported as 116 TFLOPS, this roofline appears to measure TF32 performance (the spec sheet lists 156 TFLOPS).
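For context on the 116 vs. 156 TFLOPS gap: if the roofline ceiling were derived from the GPU's clock at profiling time rather than the boost clock used on the spec sheet, the numbers would roughly line up. A quick sanity check (this is my own sketch from the A100 whitepaper figures of 156 TF32 TFLOPS dense at a 1410 MHz boost clock with 108 SMs; the per-SM-per-clock throughput is derived from those, not a value confirmed by Nsight Compute):

```python
# Sketch: check whether a lower clock could explain a 116 TFLOPS TF32 ceiling.
# Assumes the A100 whitepaper figures: 156 TF32 / 312 FP16 dense TFLOPS
# at a 1410 MHz boost clock, 108 SMs. Per-clock throughputs below are
# back-calculated from those numbers (an assumption, not an NVIDIA metric).

SMS = 108                    # SM count on a full A100
BOOST_CLOCK_HZ = 1.41e9      # boost clock behind the spec-sheet peaks

TF32_FLOP_PER_SM_CLK = 1024  # 156e12 / (108 * 1.41e9), dense
FP16_FLOP_PER_SM_CLK = 2048  # 312e12 / (108 * 1.41e9), dense

def tensor_peak_tflops(flop_per_sm_clk: int, clock_hz: float) -> float:
    """Theoretical Tensor Core peak in TFLOPS at a given clock."""
    return SMS * flop_per_sm_clk * clock_hz / 1e12

# Spec-sheet TF32 peak reproduced at boost clock:
print(tensor_peak_tflops(TF32_FLOP_PER_SM_CLK, BOOST_CLOCK_HZ))  # ~155.9

# Clock that would yield the 116 TFLOPS ceiling Nsight reports:
implied_clock_mhz = 116e12 / (SMS * TF32_FLOP_PER_SM_CLK) / 1e6
print(implied_clock_mhz)  # ~1049 MHz
```

So a ceiling of 116 TFLOPS is consistent with a TF32 roofline evaluated at roughly a 1.05 GHz clock, which also shows the FP16 dense ceiling would sit at exactly twice that value at the same clock.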
Looking around the forum, I found two related threads:
- "Roofline Tensor Core should be half but not float?" (Developer Tools / Nsight Compute - NVIDIA Developer Forums) suggests that the Tensor Core roofline metric supports only FP16 on GV100.
- "Question about Roofline of TensorCore GEMM" (Developer Tools / Nsight Compute - NVIDIA Developer Forums) suggests that the Tensor Core roofline supports only the GV100 architecture.
Then my questions are:
- Is Nsight Compute showing the accurate roofline for A100 Tensor Cores?
- If so, is it displaying the roofline for TF32? In that case, should I double the FLOPS ceiling when my kernels use the FP16 dtype?