In Hopper, what are the RS and SS strategies?

In case anyone is wondering, these are part of CUTLASS.

Register Storage vs. Shared Storage (for operand A) maybe?

Hey, that sounds very interesting! Regardless of the CUTLASS details, how do you keep operand A in registers or in shared memory? I guess… maybe this happens in the epilogue: the tensor core output is in registers, and we either store it directly to global memory, or first stage it through shared memory so that the writes to global memory are coalesced?
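To make the two epilogue options in this question concrete, here is a rough host-side C++ model (not CUTLASS code, and not a GPU kernel). It counts how many 128-byte lines each warp-wide store touches. The assumptions are mine: a 16x64 fp32 accumulator tile per warp, a row-major output with a hypothetical leading dimension of 1024 floats, and a register layout modeled on the `mma` m16n8 accumulator fragment, where each lane holds two adjacent columns of rows `lane/4` and `lane/4 + 8`. Real hardware also tracks 32-byte sectors, so treat the counts as illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

constexpr int kWarpSize  = 32;
constexpr int kTileRows  = 16;    // rows of the warp's accumulator tile (assumed)
constexpr int kTileCols  = 64;    // columns of the tile (assumed)
constexpr int kLd        = 1024;  // leading dimension of the output in floats (hypothetical)
constexpr int kLineBytes = 128;   // granularity used by this simplified model

// Distinct 128-byte lines touched when each lane stores `bytes` contiguous
// bytes starting at addr[lane]. Valid for bytes <= kLineBytes.
int lines_touched(const std::size_t (&addr)[kWarpSize], std::size_t bytes) {
    assert(bytes > 0 && bytes <= kLineBytes);
    std::set<std::size_t> lines;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        lines.insert(addr[lane] / kLineBytes);
        lines.insert((addr[lane] + bytes - 1) / kLineBytes);
    }
    return static_cast<int>(lines.size());
}

// Option 1: store straight from the register fragment layout.
// Each lane writes 2 floats; one warp-wide store covers 8 rows x 8 columns.
int direct_store_line_touches() {
    int total = 0;
    for (int chunk = 0; chunk < kTileCols / 8; ++chunk) {
        for (int half = 0; half < 2; ++half) {
            std::size_t addr[kWarpSize];
            for (int lane = 0; lane < kWarpSize; ++lane) {
                int row = lane / 4 + 8 * half;
                int col = 8 * chunk + 2 * (lane % 4);
                addr[lane] = (static_cast<std::size_t>(row) * kLd + col) * sizeof(float);
            }
            total += lines_touched(addr, 2 * sizeof(float));
        }
    }
    return total;
}

// Option 2: after staging the tile in shared memory, lanes re-read it in
// row-major order and each lane writes 4 consecutive floats (16 bytes).
int staged_store_line_touches() {
    int total = 0;
    const int floats_per_store = kWarpSize * 4;
    for (int step = 0; step < kTileRows * kTileCols / floats_per_store; ++step) {
        std::size_t addr[kWarpSize];
        for (int lane = 0; lane < kWarpSize; ++lane) {
            int idx = step * floats_per_store + lane * 4;
            int row = idx / kTileCols;
            int col = idx % kTileCols;
            addr[lane] = (static_cast<std::size_t>(row) * kLd + col) * sizeof(float);
        }
        total += lines_touched(addr, 4 * sizeof(float));
    }
    return total;
}
```

Under these assumptions `direct_store_line_touches()` comes out at 128 and `staged_store_line_touches()` at 32, which equals the number of distinct lines in the tile. That gap is the usual motivation for staging through shared memory; whether it pays off in practice depends on the tile shape, the output layout, and the cost of the extra shared-memory round trip.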

Yes, you can customize how elements are stored in the epilogue. See

with links to
