Why is tensor pipe utilization low?

Hi, I am using CUDA 11.6 and NCU 2022.1.1, and I am looking at the metric “sm__inst_executed_pipe_tensor_op_hmma.avg.pct_of_peak_sustained_active”.

The code is compiled with “nvcc wmma.cu --expt-relaxed-constexpr -gencode=arch=compute_75,code="sm_75,compute_75" -o wmma” and run on an RTX 2080 Ti. The kernel is launched with <<<10000, 256>>>.
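
For reference, I collect the metric with an invocation along these lines (a sketch; the actual command may carry additional options):

  ncu --metrics sm__inst_executed_pipe_tensor_op_hmma.avg.pct_of_peak_sustained_active ./wmma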

I got 24.6% when the b_frag load is inside the loop:

#pragma unroll
  for (int i = 0; i < 200; i++) {
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> b_frag;
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
  }

I got 50% when the load is hoisted out of the loop:

  wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> b_frag;
  wmma::load_matrix_sync(b_frag, B, 16);
#pragma unroll
  for (int i = 0; i < 200; i++) {
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
  }