Question about the load_matrix_sync API

When doing sparse convolution, I may need to operate on points that are not adjacent in shared memory (smem). What is the best way to load them into the matrices (fragments) that the wmma operations will compute on? For example, suppose I have the features of 128 points in smem, each point with 64 in_channels, and I need to load points 1, 3, 5, 7, 9… for one calculation. Should I first copy them to a contiguous buffer and then call load_matrix_sync on it, or should I implement the load myself at the mma level?
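To make the first option concrete, here is a rough sketch of what I mean by "copy, then load_matrix_sync". Everything beyond the 128 points and 64 in_channels is an assumption for illustration: the kernel name `gather_then_wmma`, the `indices` array, the 16x16x16 half-precision tile shape, the 64x16 `weights` matrix, and running a single warp per block. It needs a GPU with tensor cores (compiled for sm_70 or newer).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

constexpr int kPoints   = 128;  // points resident in shared memory
constexpr int kChannels = 64;   // in_channels per point
constexpr int kTile     = 16;   // assumed WMMA tile edge (m = n = k = 16)

// Hypothetical layout: features [kPoints][kChannels], weights [kChannels][kTile],
// out [kTile][kTile], all row-major. indices holds the kTile point ids to gather.
// Launch with exactly one warp (32 threads) per block.
__global__ void gather_then_wmma(const half* features, const half* weights,
                                 const int* indices, float* out) {
    __shared__ __align__(32) half feat[kPoints * kChannels];
    __shared__ __align__(32) half staged[kTile * kChannels];

    // Stand-in for "the features are already in smem".
    for (int i = threadIdx.x; i < kPoints * kChannels; i += blockDim.x)
        feat[i] = features[i];
    __syncthreads();

    // Gather the selected (non-adjacent) points into a contiguous staging tile.
    for (int i = threadIdx.x; i < kTile * kChannels; i += blockDim.x) {
        int row = i / kChannels, col = i % kChannels;
        staged[i] = feat[indices[row] * kChannels + col];
    }
    __syncthreads();

    wmma::fragment<wmma::matrix_a, kTile, kTile, kTile, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, kTile, kTile, kTile, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, kTile, kTile, kTile, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    // Walk the channel dimension in steps of 16; ldm is the row stride in elements.
    for (int k = 0; k < kChannels; k += kTile) {
        wmma::load_matrix_sync(a, staged + k, kChannels);
        wmma::load_matrix_sync(b, weights + k * kTile, kTile);
        wmma::mma_sync(acc, a, b, acc);
    }
    wmma::store_matrix_sync(out, acc, kTile, wmma::mem_row_major);
}

int main() {
    std::vector<half> h_feat(kPoints * kChannels, __float2half(1.0f));
    std::vector<half> h_w(kChannels * kTile, __float2half(0.5f));
    std::vector<int> h_idx(kTile);
    for (int i = 0; i < kTile; ++i) h_idx[i] = 2 * i + 1;  // points 1, 3, 5, ...
    std::vector<float> h_out(kTile * kTile, 0.0f);

    half *d_feat, *d_w; int* d_idx; float* d_out;
    cudaMalloc(&d_feat, h_feat.size() * sizeof(half));
    cudaMalloc(&d_w, h_w.size() * sizeof(half));
    cudaMalloc(&d_idx, h_idx.size() * sizeof(int));
    cudaMalloc(&d_out, h_out.size() * sizeof(float));
    cudaMemcpy(d_feat, h_feat.data(), h_feat.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(d_w, h_w.data(), h_w.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), h_idx.size() * sizeof(int), cudaMemcpyHostToDevice);

    gather_then_wmma<<<1, 32>>>(d_feat, d_w, d_idx, d_out);
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0][0] = %f\n", h_out[0]);

    cudaFree(d_feat); cudaFree(d_w); cudaFree(d_idx); cudaFree(d_out);
    return 0;
}
```

As far as I understand, load_matrix_sync only needs a base pointer and a fixed leading dimension, so if the points happen to be evenly spaced (like 1, 3, 5, …) the copy might be avoidable by pointing at the first point and passing twice the row length as ldm. For truly irregular indices I don't see how to avoid either the staging copy above or a hand-written load.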