I'm trying to analyze the performance counters collected by ncu for a GEMM task, and I ran into this counter:
lts__t_sectors_srcunit_ltcfabric_evict_normal_lookup_miss
I looked up “ltcfabric” and “srcunit” in the documentation (Kernel Profiling Guide :: Nsight Compute Documentation), but neither word appears there at all. What does this counter mean?
The ltcfabric is the communication fabric for the L2 partitions that were introduced in Ampere A100. For more information, see the “A100 L2 Cache” section of the white paper (https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf). That metric would be related to L2 accesses that go to the other partition.
lts → L2 cache slice
t → T stage of the L2 cache
sectors → sector count (the quantity being collected)
srcunit → the source hardware unit that sends data to the current unit
ltcfabric → L2 cache communication fabric
So srcunit_ltcfabric means that another L2 partition sent data to the current L2 partition, and this metric counts the number of sectors sent that way.
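To make the breakdown above easier to reuse, here is a rough Python sketch that splits a metric name into tokens and annotates them with the meanings given in this thread. The `GLOSSARY` dictionary and the `describe_metric` helper are hypothetical names made up for illustration, not part of ncu. Splitting on every underscore is a simplification, and tokens that were not explained here (for example the ones after ltcfabric) are deliberately left unexplained rather than guessed.

```python
# Meanings taken only from the explanation in this thread.
GLOSSARY = {
    "lts": "L2 cache slice",
    "t": "T stage of the L2 cache",
    "sectors": "sector count (the quantity being collected)",
    "srcunit": "source hardware unit that sends data to the current unit",
    "ltcfabric": "L2 cache communication fabric",
}


def describe_metric(name):
    """Split an ncu-style metric name and annotate each known token."""
    # The part before the double underscore is the hardware unit.
    unit, _, rest = name.partition("__")
    # Naive split: multi-word qualifiers end up as separate tokens.
    tokens = [unit] + [tok for tok in rest.split("_") if tok]
    return [(tok, GLOSSARY.get(tok, "(not covered in this thread)")) for tok in tokens]


if __name__ == "__main__":
    metric = "lts__t_sectors_srcunit_ltcfabric_evict_normal_lookup_miss"
    for token, meaning in describe_metric(metric):
        print(f"{token:<10} {meaning}")
```

Running it on the counter from the question lists each token next to its meaning, which makes it easy to see which parts of the name are still unexplained.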
Thanks, that helps me a lot.