A question about the swizzle atom layout

Hello everyone, I could use some help. I'm running into problems while learning GPU programming for the Hopper architecture.
The diagram shows a 64b swizzle, but each square is 128b in size. The diagram comes from: 1. Introduction — PTX ISA 8.7 documentation
I can see that it packs two rows into one and then applies the swizzle operation. But in the diagram I linked, why does the layout have an 8x4 shape? A 4x8 shape seems more straightforward to me.
Maybe I don't understand the concept of the swizzle atom layout or what it is for. I hope someone can help me. Thank you.

In other words: how is the size of the swizzle atom layout determined?
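
To make the question concrete, here is a minimal sketch of how the pattern might be computed. It rests on assumptions that are not confirmed by the linked page: that "64b swizzle" means the 64-byte swizzle mode, that it works on 16-byte chunks, and that it behaves like an XOR of byte-offset bits 7-8 into bits 4-5 (the form CUTLASS/CuTe writes as Swizzle<2,4,3>). The names `swizzle` and `chunk_table` are made up for illustration.

```python
def swizzle(offset, b_bits=2, m_base=4, s_shift=3):
    """XOR-swizzle a byte offset (assumed model of the 64-byte mode).

    Takes b_bits bits starting at bit (m_base + s_shift) and XORs them
    into the b_bits bits starting at bit m_base.
    """
    mask = ((1 << b_bits) - 1) << (m_base + s_shift)
    return offset ^ ((offset & mask) >> s_shift)


def chunk_table(rows=8, row_bytes=64, chunk_bytes=16):
    """For each (row, chunk), return the chunk slot it lands in after swizzling."""
    table = []
    for r in range(rows):
        row = []
        for c in range(row_bytes // chunk_bytes):
            off = swizzle(r * row_bytes + c * chunk_bytes)
            # Only bits 4-5 change, so the chunk stays inside its own 64-byte row.
            row.append((off % row_bytes) // chunk_bytes)
        table.append(row)
    return table


if __name__ == "__main__":
    for r, row in enumerate(chunk_table()):
        print(r, row)
```

Under these assumptions, the chunk order changes once every two 64-byte rows (that is, once per 128 bytes) and the whole pattern repeats after 8 rows of 4 chunks, i.e. 512 bytes. That would match the 8x4 shape in the diagram, but I am not sure this is the actual reason.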