Understanding the mapa Instruction and Its Use in Cluster Address Mapping

I’m curious about the mapa instruction and how it facilitates address mapping within clusters. Specifically, if I have a cluster size of 2 (e.g., cluster=(2,1,1)) and a grid size of (2,2,1), how should I determine the appropriate block_id for mapa?

There is no obvious method that I can see to convert a block ID to a rank. The principal quantity for organizing a cluster is the block rank, and you can find it represented both at the CUDA C++ level (for example, cluster_group::block_rank() in cooperative groups) and at the PTX level (the %cluster_ctarank special register).
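If you do want a mental model of how grid coordinates relate to rank, the arithmetic is simple under one assumption: clusters tile the grid as contiguous boxes of clusterDim blocks, and the rank is linearized with x varying fastest. The sketch below is hypothetical (`cluster_coords` and `ClusterCoords` are illustrative names, not CUDA APIs). On the device, prefer the rank the hardware reports rather than recomputing it. It is plain host code and compiles with nvcc.

```cuda
#include <cassert>
#include <cstdio>

// Hypothetical helper type (not a CUDA API): where a block sits among clusters.
struct ClusterCoords {
    unsigned cluster_x, cluster_y, cluster_z;  // cluster index, in units of clusters
    unsigned rank;                             // linear rank within that cluster
};

// Assumes clusters tile the grid as contiguous boxes of cluster_dim blocks
// and that the rank is linearized with x varying fastest.
__host__ __device__ ClusterCoords cluster_coords(dim3 block, dim3 cluster_dim) {
    ClusterCoords c;
    c.cluster_x = block.x / cluster_dim.x;
    c.cluster_y = block.y / cluster_dim.y;
    c.cluster_z = block.z / cluster_dim.z;
    unsigned rx = block.x % cluster_dim.x;  // block index relative to its cluster
    unsigned ry = block.y % cluster_dim.y;
    unsigned rz = block.z % cluster_dim.z;
    c.rank = rx + ry * cluster_dim.x + rz * cluster_dim.x * cluster_dim.y;
    return c;
}

int main() {
    dim3 grid(2, 2, 1), cluster(2, 1, 1);  // the configuration from the question
    for (unsigned y = 0; y < grid.y; ++y) {
        for (unsigned x = 0; x < grid.x; ++x) {
            ClusterCoords c = cluster_coords(dim3(x, y, 0), cluster);
            assert(c.rank < cluster.x * cluster.y * cluster.z);  // rank stays in range
            printf("block (%u,%u) -> cluster (%u,%u), rank %u\n",
                   x, y, c.cluster_x, c.cluster_y, c.rank);
        }
    }
    return 0;
}
```

For the question's configuration this prints ranks 0 and 1 for blocks (0,0) and (1,0) in cluster (0,0), and ranks 0 and 1 for blocks (0,1) and (1,1) in cluster (0,1).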

A block can easily query its own rank, and it can also obtain the address of a location in the shared memory of the block that has another rank. I believe cooperative use of the cluster is intended to follow that pattern; you can find examples in the CUDA C++ Programming Guide and, to a lesser extent, in the PTX ISA manual.

In your example, a (2,2,1) grid with (2,1,1) clusters consists of two clusters of two blocks each. Each block can find out which cluster it belongs to and how many clusters the grid contains, and likewise its rank within its cluster. I believe this should be sufficient to build collective algorithms.
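A minimal sketch of those queries with cooperative groups, assuming CUDA 12 or later and a GPU of compute capability 9.0 or higher (compile with `-arch=sm_90`). `this_cluster()`, `block_rank()`, `num_blocks()` and `dim_blocks()` are documented `cluster_group` members. The cluster index and cluster count are derived here by division, assuming clusters tile the grid contiguously, and `BlockInfo`, `query_cluster` and `collect` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

struct BlockInfo {
    unsigned cluster_id;    // linear cluster index within the grid
    unsigned num_clusters;  // clusters in the grid
    unsigned rank;          // block rank within its cluster
    unsigned cluster_size;  // blocks per cluster
};

// Cluster shape fixed at compile time to (2,1,1), as in the question.
__global__ void __cluster_dims__(2, 1, 1) query_cluster(BlockInfo* out) {
    cg::cluster_group cluster = cg::this_cluster();
    dim3 cdim = cluster.dim_blocks();  // cluster shape in blocks
    // Assumption: clusters tile the grid contiguously, so a block's cluster
    // index is blockIdx / cdim and the grid holds gridDim / cdim clusters.
    unsigned ncx = gridDim.x / cdim.x, ncy = gridDim.y / cdim.y, ncz = gridDim.z / cdim.z;
    unsigned cx = blockIdx.x / cdim.x, cy = blockIdx.y / cdim.y, cz = blockIdx.z / cdim.z;
    unsigned block_id = blockIdx.x + blockIdx.y * gridDim.x + blockIdx.z * gridDim.x * gridDim.y;
    if (threadIdx.x == 0) {
        out[block_id].cluster_id = cx + cy * ncx + cz * ncx * ncy;
        out[block_id].num_clusters = ncx * ncy * ncz;
        out[block_id].rank = cluster.block_rank();
        out[block_id].cluster_size = cluster.num_blocks();
    }
}

// Launches a (2,2,1) grid and copies back one BlockInfo per block (4 entries).
bool collect(BlockInfo* h_info) {
    const int nblocks = 4;
    BlockInfo* d_info = nullptr;
    if (cudaMalloc(&d_info, nblocks * sizeof(BlockInfo)) != cudaSuccess) return false;
    query_cluster<<<dim3(2, 2, 1), 32>>>(d_info);
    cudaError_t err = cudaGetLastError();  // catches launch/configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_info, d_info, nblocks * sizeof(BlockInfo), cudaMemcpyDeviceToHost);
    cudaFree(d_info);
    return err == cudaSuccess;
}

int main() {
    BlockInfo h[4];
    if (!collect(h)) {
        printf("launch failed (thread block clusters need compute capability 9.0+)\n");
        return 1;
    }
    for (int b = 0; b < 4; ++b) {
        assert(h[b].rank < h[b].cluster_size);
        printf("block %d: cluster %u of %u, rank %u of %u\n", b, h[b].cluster_id,
               h[b].num_clusters, h[b].rank, h[b].cluster_size);
    }
    return 0;
}
```

If the tiling assumption holds, blocks 0 and 1 (blockIdx.y == 0) should report cluster 0 with ranks 0 and 1, and blocks 2 and 3 should report cluster 1 with ranks 0 and 1.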

If you want to see how this works at the CUDA C++ level, an example is here.

The mapa instruction does not take a block ID. Instead, it takes a “target rank” and converts the address of a shared variable into the corresponding address in the target rank, that is, in the block of the cluster that has that rank. If you compile the CUDA C++ example I linked above in godbolt, you can see the mapa instruction in the generated PTX.
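To illustrate the rank-based addressing, here is a sketch in which each block of a (2,1,1) cluster reads a shared variable owned by its peer. It uses cooperative groups' `cluster_group::map_shared_rank`, which, like mapa, takes a rank rather than a block ID; I believe this is the CUDA C++ route that nvcc lowers to mapa. It assumes a compute capability 9.0+ GPU (compile with `-arch=sm_90`). The payload value (100 + rank) and the names `swap_with_peer` and `run_swap` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void __cluster_dims__(2, 1, 1) swap_with_peer(unsigned* out) {
    __shared__ unsigned my_value;
    cg::cluster_group cluster = cg::this_cluster();
    unsigned rank = cluster.block_rank();
    if (threadIdx.x == 0) my_value = 100 + rank;  // hypothetical payload
    cluster.sync();  // make every block's shared write visible cluster-wide

    unsigned peer = (rank + 1) % cluster.num_blocks();
    // map_shared_rank takes a target rank (not blockIdx) and returns the address
    // of my_value in that block's shared memory.
    unsigned* peer_value = cluster.map_shared_rank(&my_value, peer);
    if (threadIdx.x == 0) {
        unsigned block_id = blockIdx.x + blockIdx.y * gridDim.x;
        out[block_id] = *peer_value;
    }
    cluster.sync();  // keep each block's shared memory alive until remote reads finish
}

// Launches a (2,2,1) grid; h_out must hold 4 entries, one per block.
bool run_swap(unsigned* h_out) {
    unsigned* d_out = nullptr;
    if (cudaMalloc(&d_out, 4 * sizeof(unsigned)) != cudaSuccess) return false;
    swap_with_peer<<<dim3(2, 2, 1), 32>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, 4 * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main() {
    unsigned h[4];
    if (!run_swap(h)) {
        printf("launch failed (thread block clusters need compute capability 9.0+)\n");
        return 1;
    }
    for (int b = 0; b < 4; ++b) {
        assert(h[b] == 100 || h[b] == 101);
        printf("block %d read %u from its peer\n", b, h[b]);
    }
    return 0;
}
```

Blocks with rank 0 should read 101 and blocks with rank 1 should read 100. The second `cluster.sync()` matters: without it, a block could exit while its peer is still reading its shared memory.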
