".reuse" in SASS instructions

What does it mean to have a “.reuse” clause in a SASS assembly instruction?

For example,

FFMA.FTZ R23, R26.reuse, 4.5, R37.reuse;

From Scott Gray’s posts, it appears this has something to do with register banks.

For the same kernel, if one version has more “.reuse” flags than another, would that result in a lower L2 miss rate (we are seeing evidence of that)? If so, why?

Thanks.

NVIDIA does not publicly document the microarchitecture to that level of detail. Whatever Scott Gray reverse engineered is the most detailed information available that I am aware of. I can’t think of a reason why reuse in the register file would lower the L2 miss rate. Have you tried getting in touch with Scott Gray to ask him?

Assuming you had a detailed description of .reuse, how would that benefit your use case? What are you trying to accomplish?

Thanks.

We haven’t been able to get in touch with Scott Gray directly yet.

We are working on a tool that tries to improve register allocation with runtime feedback. In many cases the performance improvement comes from the typical sources (reduced spills, better data reuse, higher achieved occupancy). In a few cases, though, the improvement appears to come from a higher number of .reuse flags in the SASS together with fewer L2 misses.
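For reference, counting the flags in a disassembly listing can be sketched roughly as below. This is only an illustration: the helper name count_reuse and the second line of the sample listing are made up for the example, and the regular expression assumes the flag always appears as a ".reuse" suffix on a plain register operand, as in the FFMA line quoted earlier.

```python
import re

# Matches a register operand that carries the reuse flag, e.g. "R26.reuse".
# Assumption: the flag is always written as a ".reuse" suffix on an R<n> operand.
REUSE_OPERAND = re.compile(r"\bR\d+\.reuse\b")

def count_reuse(sass_text):
    """Return (instructions_with_reuse, total_reuse_flags) for a SASS listing."""
    instructions = 0
    flags = 0
    for line in sass_text.splitlines():
        hits = REUSE_OPERAND.findall(line)
        if hits:
            instructions += 1
            flags += len(hits)
    return instructions, flags

# First line is the instruction from this thread; the second is a made-up
# instruction without reuse flags, included only for contrast.
sample = """
FFMA.FTZ R23, R26.reuse, 4.5, R37.reuse;
FFMA.FTZ R24, R26, 2.0, R37;
"""

print(count_reuse(sample))  # (1, 2)
```

Comparing these counts between two builds of the same kernel only shows a correlation with the L2 numbers; it says nothing about cause.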

Best of luck to you. If it were me, I wouldn’t spend time on compiler backends for GPUs given the dearth of information NVIDIA provides about the microarchitecture (which also changes with every GPU generation).

The reuse flags are described here:

https://github.com/NervanaSystems/maxas/wiki/SGEMM#calculating-c-register-banks-and-reuse

If you’re compiling your code through ptxas, then you have very little control over them. You’d have to use maxas to set them manually, but even there it’s easier to let maxas set them for you and just fine-tune the ordering of your instructions to maximize the potential for register reuse.

You can think of the reuse buffers as a little cache that sits over the register bank. They have nothing to do with the other caches except in serving some common purpose.
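A toy model of that idea, as a rough sketch only. It assumes the Maxwell layout described on the maxas wiki page linked above (four register banks, bank = register number modulo 4) and simplifies the reuse cache to one remembered register per source operand slot that is valid only for the next instruction. The function names and the register numbers are made up for illustration, and real hardware behavior may differ.

```python
NUM_BANKS = 4  # assumption: Maxwell, bank = register number mod 4 (per the maxas wiki)

def bank(reg):
    """Bank index of a register under the assumed modulo-4 mapping."""
    return reg % NUM_BANKS

def count_bank_conflicts(instructions):
    """Count extra register-file reads caused by bank conflicts.

    instructions: list of instructions, each a list of source operands
    given as (register_number, reuse_flag) in operand-slot order.
    Simplification: the reuse cache holds one register per operand slot
    and only survives until the next instruction.
    """
    cache = {}  # operand slot -> register held in the reuse cache
    conflicts = 0
    for srcs in instructions:
        fetched_banks = []
        fetched_regs = set()
        new_cache = {}
        for slot, (reg, reuse) in enumerate(srcs):
            # A read is needed unless this slot already holds the register
            # or the same register is already being read by this instruction.
            if cache.get(slot) != reg and reg not in fetched_regs:
                fetched_banks.append(bank(reg))
                fetched_regs.add(reg)
            if reuse:
                new_cache[slot] = reg
        # Every read beyond the first from the same bank counts as a conflict.
        conflicts += len(fetched_banks) - len(set(fetched_banks))
        cache = new_cache
    return conflicts

# Two instructions with sources (R4, R9, R2) then (R4, R8, R3).
# R4 and R8 both map to bank 0, so the second instruction conflicts
# unless R4 is served from the reuse cache.
without_reuse = [[(4, False), (9, False), (2, False)],
                 [(4, False), (8, False), (3, False)]]
with_reuse = [[(4, True), (9, False), (2, False)],
              [(4, False), (8, False), (3, False)]]

print(count_bank_conflicts(without_reuse))  # 1
print(count_bank_conflicts(with_reuse))     # 0
```

In this model the flag only removes reads from the register banks; nothing in it touches L1 or L2.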

@scottgray, thanks very much for the info. This is very helpful.