The mapping is intentionally unspecified, and it should not be relied upon even if you discover it experimentally.
The only supported use of direct access to fragment elements is when the operation is uniform across all threads in the warp and across all fragment elements. From the documentation:
"Because the map of matrix elements into each thread’s fragment is unspecified, individual matrix elements must be accessed from memory (shared or global) after calling store_matrix_sync. In the special case where all threads in the warp will apply an element-wise operation uniformly to all fragment elements, direct element access can be implemented using the following fragment class members."
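To make the "uniform element-wise operation" case concrete, here is a minimal sketch that scales every accumulator element by the same factor through the fragment members num_elements and x. The kernel name scaled_wmma, the helper run_scaled_wmma, the 16x16x16 half/float tile, and the test data are my own choices for illustration, not anything from the documentation. It assumes a device with tensor cores and a build along the lines of nvcc -arch=sm_70.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes D = alpha * (A * B) for a single 16x16x16 tile.
__global__ void scaled_wmma(const half *a, const half *b, float *d, float alpha)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    // Uniform element-wise operation: every thread applies the same scaling
    // to every element it holds, so it never needs to know which matrix
    // element a given x[i] corresponds to.
    for (int i = 0; i < acc_frag.num_elements; i++)
        acc_frag.x[i] *= alpha;

    // Anything position-dependent has to happen after this store, on memory.
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: runs the kernel on small integer data and
// returns the maximum absolute error against a CPU-side expectation.
float run_scaled_wmma(float alpha)
{
    const int N = 16;
    half *a, *b;
    float *d;
    cudaMallocManaged(&a, N * N * sizeof(half));
    cudaMallocManaged(&b, N * N * sizeof(half));
    cudaMallocManaged(&d, N * N * sizeof(float));

    for (int r = 0; r < N; r++)
        for (int k = 0; k < N; k++) {
            a[r * N + k] = __float2half((float)(r % 3)); // A(r,k), row-major
            b[r * N + k] = __float2half((float)(r % 5)); // B(k,r), column-major
        }

    scaled_wmma<<<1, 32>>>(a, b, d, alpha);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float max_err = (err == cudaSuccess) ? 0.0f : INFINITY;
    if (err == cudaSuccess)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                // Each product term is (i % 3) * (j % 5), summed over 16 values of k.
                float expected = alpha * 16.0f * (float)((i % 3) * (j % 5));
                max_err = fmaxf(max_err, fabsf(d[i * N + j] - expected));
            }

    cudaFree(a);
    cudaFree(b);
    cudaFree(d);
    return max_err;
}

int main()
{
    printf("max abs error: %g\n", run_scaled_wmma(0.5f));
    return 0;
}
```

Note that the loop body never branches on i or on the lane index. As soon as the operation depends on the row or column of an element, this approach no longer applies and the data has to go through store_matrix_sync first.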
For people who want direct control of the matrix-multiply operands, I usually suggest using PTX mma instructions instead. Here is an example.
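Separately from the example referenced above, the following is a minimal sketch of what operand-level control with a PTX mma instruction can look like. I picked the double-precision m8n8k4 shape because each thread holds just one element of A, one of B, and two of C/D. The kernel name, the helper run_mma_m8n8k4_f64, and the test data are made up for illustration. The thread-to-element mapping in the comments is my reading of the fragment layout in the PTX ISA documentation for this shape, so check it against that document before relying on it. It assumes an sm_80 or newer device and a build such as nvcc -arch=sm_80.

```cuda
#include <cstdio>
#include <cmath>

// One warp computes D = A * B + C for a single 8x8x4 double-precision tile.
// A is 8x4, B is 4x8, C and D are 8x8; all are stored row-major in memory here.
__global__ void mma_m8n8k4_f64(const double *A, const double *B,
                               const double *C, double *D)
{
    int lane  = threadIdx.x % 32;
    int group = lane >> 2;  // "groupID" in the PTX documentation
    int tig   = lane & 3;   // "threadID_in_group" in the PTX documentation

    // Assumed operand layout for m8n8k4 .f64 (see the lead-in):
    double a  = A[group * 4 + tig];          // A(row = group, col = tig)
    double b  = B[tig * 8 + group];          // B(row = tig,   col = group)
    double c0 = C[group * 8 + 2 * tig];      // C(row = group, col = 2*tig)
    double c1 = C[group * 8 + 2 * tig + 1];  // C(row = group, col = 2*tig + 1)
    double d0, d1;

    asm volatile(
        "mma.sync.aligned.m8n8k4.row.col.f64.f64.f64.f64 {%0,%1}, {%2}, {%3}, {%4,%5};\n"
        : "=d"(d0), "=d"(d1)
        : "d"(a), "d"(b), "d"(c0), "d"(c1));

    // D uses the same layout as C.
    D[group * 8 + 2 * tig]     = d0;
    D[group * 8 + 2 * tig + 1] = d1;
}

// Hypothetical host helper: compares the result with a plain CPU matmul
// and returns the maximum absolute error.
double run_mma_m8n8k4_f64()
{
    double *A, *B, *C, *D;
    cudaMallocManaged(&A, 8 * 4 * sizeof(double));
    cudaMallocManaged(&B, 4 * 8 * sizeof(double));
    cudaMallocManaged(&C, 8 * 8 * sizeof(double));
    cudaMallocManaged(&D, 8 * 8 * sizeof(double));

    // Asymmetric test data, so a wrong layout would show up as an error.
    for (int i = 0; i < 8; i++)
        for (int k = 0; k < 4; k++) A[i * 4 + k] = i + 0.5 * k;
    for (int k = 0; k < 4; k++)
        for (int j = 0; j < 8; j++) B[k * 8 + j] = k - j;
    for (int i = 0; i < 64; i++) C[i] = 1.0;

    mma_m8n8k4_f64<<<1, 32>>>(A, B, C, D);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    double max_err = (err == cudaSuccess) ? 0.0 : INFINITY;
    if (err == cudaSuccess)
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++) {
                double ref = C[i * 8 + j];
                for (int k = 0; k < 4; k++) ref += A[i * 4 + k] * B[k * 8 + j];
                max_err = fmax(max_err, fabs(D[i * 8 + j] - ref));
            }

    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    cudaFree(D);
    return max_err;
}

int main()
{
    printf("max abs error: %g\n", run_mma_m8n8k4_f64());
    return 0;
}
```

The difference from the WMMA fragment case is that here the per-thread register layout is part of the documented instruction interface, so position-dependent work on the operands is legitimate.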