What is the L2 access granularity on Ampere and later architectures?

The A100 L2 bandwidth slides indicate that L2 delivers 64B/clk/slice on A100, versus 32B/clk/slice on V100. I am wondering how A100 achieves 64B/clk/slice:

  1. How many tag checks does each L2 slice perform on A100?
  2. Does it always fetch 2 sectors (64B) of data from L2 to L1? Or
  3. Does L2 support two independent 32B data fetch requests from L1?
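
To make the arithmetic behind the question concrete, here is a minimal sketch. It assumes a 32B sector, as implied by "2 sectors (64B)" above; the function name `sectors_per_clock` is only for illustration. It converts the per-slice bandwidth figures from the slides into sectors per clock.

```python
# Assumption: one L2 sector is 32 bytes, as implied by "2 sectors (64B)".
SECTOR_BYTES = 32


def sectors_per_clock(bytes_per_clk_per_slice: int,
                      sector_bytes: int = SECTOR_BYTES) -> int:
    """Whole sectors one L2 slice can deliver per clock at the quoted bandwidth."""
    return bytes_per_clk_per_slice // sector_bytes


# Per-slice figures quoted from the slides.
v100 = sectors_per_clock(32)
a100 = sectors_per_clock(64)
print(f"V100: {v100} sector(s)/clk/slice, A100: {a100} sector(s)/clk/slice")
```

The peak figure alone cannot distinguish option 2 from option 3: one 64B two-sector fetch and two independent 32B fetches both amount to 2 sectors per clock per slice, which is why I am asking about the tag checks and request handling.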