Hi, I’m trying to measure the aggregate throughput between SMs and L1 cache/SMEM when running my code. Initially, I thought
gld_throughput was the metric I was looking for, but
it doesn’t seem to cover local, texture, and shared memory loads.
So I’m now using the sum of
shared_load_throughput. Is this method sound?
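
To make the question concrete, here is a minimal sketch of the summing I have in mind. It assumes the per-kernel metric values have already been collected from the profiler and are all in the same unit (GB/s). The helper name `aggregate_throughput` and the numbers are hypothetical placeholders, not real measurements, and the two metric names are simply the ones mentioned in this post. Which metrics belong in the list is exactly what I am unsure about.

```python
def aggregate_throughput(metrics, names):
    """Sum the selected throughput metrics (all assumed to be in GB/s)."""
    # Fail loudly if a requested metric was never collected, instead of
    # silently treating it as zero and under-reporting the total.
    missing = [n for n in names if n not in metrics]
    if missing:
        raise KeyError(f"metrics not collected: {missing}")
    return sum(metrics[n] for n in names)


# Hypothetical per-kernel values in GB/s (placeholders, not measurements).
sample = {"gld_throughput": 120.0, "shared_load_throughput": 340.5}

total = aggregate_throughput(
    sample, ["gld_throughput", "shared_load_throughput"]
)
print(f"aggregate load throughput: {total:.1f} GB/s")
```

Note that a plain sum like this is only meaningful if the metrics count disjoint traffic; if two of them overlap, the total would double count.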