Measuring L1/SMEM throughput on V100 using nvprof

keichi.t · October 21, 2020, 12:16pm

Hi, I’m trying to measure the aggregate throughput between the SMs and the L1 cache/shared memory (SMEM) when running my code. Initially, I thought gld_throughput was the metric I was looking for, but it doesn’t seem to cover local, texture, and shared memory loads.

So I’m now using the sum of gld_throughput, local_load_throughput, tex_cache_throughput and shared_load_throughput. Is this method sound?

Thank you.
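For reference, the summation described in the question can be sketched as below. This is only an illustration of the arithmetic, not a statement that the method is sound: whether these four nvprof metrics overlap or miss traffic on Volta is not settled in this thread. The function name and the numbers in the usage note are hypothetical placeholders, not measurements.

```python
# The four nvprof metric names come from the question above.
NVPROF_LOAD_METRICS = (
    "gld_throughput",
    "local_load_throughput",
    "tex_cache_throughput",
    "shared_load_throughput",
)


def aggregate_load_throughput(metrics):
    """Sum the four load-throughput metrics for one kernel.

    `metrics` maps nvprof metric names to values in a single, consistent
    unit (for example GB/s). This helper is hypothetical and assumes the
    caller has already parsed the values out of the nvprof output.
    """
    missing = [name for name in NVPROF_LOAD_METRICS if name not in metrics]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return sum(metrics[name] for name in NVPROF_LOAD_METRICS)
```

For example, with placeholder readings of 100.0, 20.0, 5.0 and 300.0 GB/s for the four metrics, the function returns their sum, 425.0 GB/s.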

mnicely · October 21, 2020, 2:42pm

You should really be using Nsight Compute to profile Volta and newer architectures. I’ve provided links below to help you get up and running.

What exactly are you trying to measure?

https://developer.nvidia.com/blog/using-nsight-compute-to-inspect-your-kernels/

https://docs.nvidia.com/cupti/Cupti/r_main.html#r_host_derived_metrics_api

keichi.t · October 22, 2020, 2:53am

Thanks, I will use Nsight Compute instead.

I am trying to make a hierarchical roofline plot to analyze my kernel and need to measure L1, L2 and HBM throughput. The paper below does the same analysis I’d like to achieve. Hope this is clear enough.
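To make the goal concrete, here is a minimal sketch of the arithmetic behind a hierarchical roofline, assuming the usual roofline model in which attainable performance is the smaller of the compute peak and arithmetic intensity times bandwidth, evaluated once per memory level. The function names are hypothetical, and the peak and bandwidth figures in the usage note are placeholders rather than V100 specifications.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved at one memory level."""
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: limited by compute peak or by memory bandwidth."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def hierarchical_roofline(flops, bytes_per_level, bandwidth_per_level, peak_gflops):
    """Return {level: (arithmetic intensity, attainable GFLOP/s)}.

    `bytes_per_level` and `bandwidth_per_level` are dicts keyed by level
    name, e.g. "L1", "L2", "HBM". Bytes are the traffic measured at that
    level for the kernel; bandwidths are in GB/s.
    """
    result = {}
    for level, bytes_moved in bytes_per_level.items():
        ai = arithmetic_intensity(flops, bytes_moved)
        bound = attainable_gflops(ai, peak_gflops, bandwidth_per_level[level])
        result[level] = (ai, bound)
    return result
```

Usage would look like `hierarchical_roofline(1e9, {"L1": 4e9, "L2": 2e9, "HBM": 1e9}, {"L1": 1000.0, "L2": 500.0, "HBM": 100.0}, 400.0)`, where every number is a placeholder. This is why per-level byte counts (and hence L1, L2 and HBM throughput) are needed: each level gets its own arithmetic intensity and its own ceiling.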

Robert_Crovella · October 22, 2020, 4:28am

Nsight Compute has a roofline capability built in.

keichi.t · October 22, 2020, 4:33am

I tried it, but it only shows the HBM roofline. Is there a way to plot L1 and L2 rooflines?
