Can the data in different sectors (32B) of an L1 cache line (128B) in the Ampere architecture come from non-consecutive memory addresses?

In the Ampere architecture, can the data in different sectors (32B) of an L1 cache line (128B) come from non-consecutive memory addresses? If so, is this also true for L2?

Does this comment mean that the four sectors of an L1 cache line can store data from non-consecutive memory addresses? If so, can a warp fetch the data of four sectors with different starting addresses from one cache line at a time?

The answer to the titular question is “no”. The tag represents the information that indicates that data from a particular system memory address is cached here. This information is shared between the sectors comprising the cache line.
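To make the "one shared tag" point concrete, here is a small Python model of the address arithmetic. It assumes a naturally aligned 128-byte line split into four 32-byte sectors and deliberately ignores set indexing, associativity and any hardware detail NVIDIA has not documented; `decompose` and `sector_range` are hypothetical helper names for illustration only. Because the line is identified by a single base address, sector `s` can only ever hold bytes `base + 32*s` through `base + 32*s + 31`, so the four sectors always cover one contiguous 128-byte block.

```python
LINE_BYTES = 128                                # assumed L1 line size
SECTOR_BYTES = 32                               # assumed sector size
SECTORS_PER_LINE = LINE_BYTES // SECTOR_BYTES   # 4


def decompose(addr):
    """Split a byte address into (line base, sector index, offset in sector)."""
    line_base = addr & ~(LINE_BYTES - 1)        # what the shared tag identifies
    sector = (addr % LINE_BYTES) // SECTOR_BYTES
    offset = addr % SECTOR_BYTES
    return line_base, sector, offset


def sector_range(line_base, sector):
    """Byte addresses that a given sector of a given line can hold."""
    start = line_base + sector * SECTOR_BYTES
    return range(start, start + SECTOR_BYTES)


if __name__ == "__main__":
    base = 0x1000
    for s in range(SECTORS_PER_LINE):
        r = sector_range(base, s)
        # sector 0: 0x1000-0x101f, sector 1: 0x1020-0x103f, ...
        print(f"sector {s}: 0x{r.start:x}-0x{r.stop - 1:x}")
    # Same offset within a 128-byte block, but different blocks: different lines.
    print(decompose(0x1040)[0] == decompose(0x2040)[0])  # False
```

Under these assumptions, a warp that touches four 32-byte chunks at unrelated addresses needs up to four different lines (four tags), not four sectors of one line.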

The point of sectored caches is minimizing tag storage without causing excessive memory traffic due to the use of long cache lines. By sectoring a cache line, with per-sector status flags (e.g. valid, dirty), data can be replaced one sector at a time, reducing memory bandwidth requirements for updating the cache.
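The trade-off described above can be sketched as a toy single-line model: one tag for the whole line, plus per-sector valid and dirty flags. This is a simplified illustration, not a description of NVIDIA's actual L1 or L2 policy; `SectoredLine` and `access` are hypothetical names, and write-allocate with write-back is assumed purely to show how the dirty flags limit traffic. A miss fetches only the missing 32-byte sector, while a change of tag evicts the whole line and writes back only the dirty sectors.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LINE_BYTES = 128                                # assumed line size
SECTOR_BYTES = 32                               # assumed sector size
SECTORS_PER_LINE = LINE_BYTES // SECTOR_BYTES   # 4


@dataclass
class SectoredLine:
    tag: Optional[int] = None                   # ONE tag shared by all sectors
    valid: List[bool] = field(default_factory=lambda: [False] * SECTORS_PER_LINE)
    dirty: List[bool] = field(default_factory=lambda: [False] * SECTORS_PER_LINE)

    def access(self, addr, write=False):
        """Return (hit, bytes read from memory, bytes written back)."""
        tag = addr // LINE_BYTES
        sector = (addr % LINE_BYTES) // SECTOR_BYTES
        read_bytes = 0
        writeback_bytes = 0
        if self.tag != tag:
            # A different 128-byte block: the shared tag must change, so the
            # whole line is evicted. Only dirty sectors cost write-back traffic.
            writeback_bytes = SECTOR_BYTES * sum(self.dirty)
            self.tag = tag
            self.valid = [False] * SECTORS_PER_LINE
            self.dirty = [False] * SECTORS_PER_LINE
        hit = self.valid[sector]
        if not hit:
            # Fetch just this 32-byte sector, not the full 128-byte line.
            read_bytes = SECTOR_BYTES
            self.valid[sector] = True
        if write:
            self.dirty[sector] = True
        return hit, read_bytes, writeback_bytes


if __name__ == "__main__":
    line = SectoredLine()
    print(line.access(0x1000))              # (False, 32, 0): sector 0 filled
    print(line.access(0x1010))              # (True, 0, 0): same sector
    print(line.access(0x1020, write=True))  # (False, 32, 0): sector 1 filled, dirty
    print(line.access(0x2000))              # (False, 32, 32): new tag, 1 dirty sector written back
```

With an unsectored 128-byte line, each of those misses would have moved 128 bytes instead of 32, which is the bandwidth cost that sectoring avoids while keeping one tag per line.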

I am not sure what drove the adoption of 4-sector cache lines in GPUs. Coming from CPU design, it strikes me as a somewhat unusual design. But I think it is safe to assume that NVIDIA performed a cost-benefit analysis and this configuration offered the best performance within a given silicon budget.