Perforating the NVIDIA CUTLASS K loop for approximate matrix multiply

I am trying to implement approximate matrix multiplication by perforating the K loop in CUTLASS. The idea is that increasing the stride of the K loop (the loop over the shared inner dimension) makes each output element a sum over only a subset of the products. With a stride of 2, only half of the elements along K are used, which may save time at the cost of accuracy.
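To make the idea concrete, here is a minimal sketch of K-loop perforation in a plain reference GEMM. This is not CUTLASS code: the function name perforated_gemm, the row-major layout, and the k_stride parameter are all my own assumptions for illustration, and a real CUTLASS kernel tiles and unrolls the K dimension very differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM (not part of CUTLASS): C = A * B, row-major.
// A is M x K, B is K x N, and the returned C is M x N.
// k_stride == 1 gives the exact product. k_stride == 2 visits only
// k = 0, 2, 4, ... so each dot product uses about half of its terms.
std::vector<float> perforated_gemm(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   std::size_t k_stride) {
    assert(k_stride > 0);           // a zero stride would never terminate
    assert(A.size() == M * K);
    assert(B.size() == K * N);

    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            // The perforated loop: step k by k_stride instead of 1.
            for (std::size_t k = 0; k < K; k += k_stride) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
    return C;
}
```

For example, with A = [1 2 3 4] (1 x 4) and B a 4 x 1 column of ones, a stride of 1 gives 10, while a stride of 2 keeps only k = 0 and k = 2 and gives 1 + 3 = 4. Whether to rescale the partial sum (for instance by k_stride) to compensate for the skipped terms is a separate design choice that this sketch leaves out.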

However, I am having trouble locating the right file in which to apply this perforation. I believe the correct place is the innermost (c) loop, but the indices in this file look like those of a convolution rather than a matrix multiplication. Is this the correct loop in which to change the stride?