Meanings of L2 --> L2 copy

harlan.zhang · January 17, 2022, 3:52am

Hi,

Every block processes its own portion of global data, and no data is loaded more than once.
But the profiler reports 16.93M L2-to-L2 copies. What’s the reason for that?

felix_dt · January 17, 2022, 7:33am

You can refer to the A100 L2 Cache section in the A100 whitepaper for some more info on the L2 cache for this GPU:

The A100 L2 cache is a shared resource for the GPCs and SMs and lies outside of the GPCs.
The L2 cache is divided into two partitions to enable higher bandwidth and lower latency
memory access. Each L2 partition localizes and caches data for memory accesses from SMs in
the GPCs directly connected to the partition.

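If transfers between the two partitions are needed, it means that data was accessed from an SM that wasn’t local to the cache partition the data resided on. This can happen even when no data is loaded twice, because what matters is where each piece of data lives relative to the SM that reads it, not how often it is read.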
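
To make that concrete, here is a rough toy model in Python. It is only a sketch: the functions home_partition and sm_partition are made up for illustration, and the real mapping of addresses and SMs to L2 partitions on the A100 is not described in this thread. The point is just that if data placement is independent of which SM reads it, a large share of first-time reads will land on the "other" partition.

```python
# Toy model only. The mappings below are hypothetical stand-ins, not the
# actual A100 hardware behavior.

NUM_PARTITIONS = 2


def home_partition(line_index):
    """Hypothetical: the L2 partition a cache line is homed on."""
    return line_index % NUM_PARTITIONS


def sm_partition(sm_id, num_sms):
    """Hypothetical: first half of the SMs attach to partition 0, the rest to 1."""
    return 0 if sm_id < num_sms // 2 else 1


def count_cross_partition(num_sms, lines_per_sm):
    """Each SM reads its own disjoint, contiguous range of lines exactly once."""
    total = 0
    crossing = 0
    for sm_id in range(num_sms):
        first = sm_id * lines_per_sm
        for line in range(first, first + lines_per_sm):
            total += 1
            # A read crosses partitions when the line's home partition is
            # not the one the reading SM is attached to.
            if home_partition(line) != sm_partition(sm_id, num_sms):
                crossing += 1
    return total, crossing


total, crossing = count_cross_partition(num_sms=108, lines_per_sm=1000)
print(f"{crossing} of {total} reads cross partitions")
```

In this model every line is read exactly once, yet half of the reads still cross partitions, which is the kind of traffic an L2-to-L2 counter would pick up.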
