Bank conflicts and reuse flags in Pascal

I’ve read what Scott Gray found out about register banks in Maxwell for MaxAs, but I haven’t been able to replicate his findings on Pascal. Whenever I add or remove reuse flags, or assign registers so that they fall in the same bank, I can’t provoke a conflict. The instruction behaves the same and doesn’t take longer in the pipeline, so it looks like reuse flags and register alignment have no effect. Could the behavior of register fetching have changed significantly for Pascal? If so, does anyone know the new behavior? I still feel like the flags must do something, because NVCC still adds reuse flags wherever it can, but I just can’t find a difference.


NVIDIA is notoriously secretive about the details of their microarchitectures, and from historical observation it is clear that they do tend to make substantial implementation changes between major architecture generations.

So the answer to your first question is “yes”. The answer to the second question is “I don’t know, and I am not aware of anybody who has successfully reverse-engineered Pascal”.

Unless you have a lot of experience in reverse-engineering, I consider it possible that your experiments are not yet sophisticated enough to reveal the salient differences (if any) between Maxwell and Pascal implementations. This is not to discourage you in your efforts. But from personal experience reverse engineering the details of x86 FPUs in the 1990s I know how much painstaking work and experience successful reverse engineering requires.

This is where I think I’m stuck, assuming that reuse flags have an effect and bank conflicts exist. I have run cubins that are identical except for the presence of reuse flags (confirmed with cuobjdump) and can’t find any difference in execution. I also shuffled registers around to cause what would have been a bank conflict on Maxwell, but still saw no difference. Is there anything else that it could affect?
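For reference, here is a minimal sketch of the conflict check I have in mind when picking registers. It assumes the mapping reported for Maxwell in the MaxAs notes (four banks, bank = register index mod 4) and that a conflict only arises when two different source registers of one instruction share a bank. The function names are made up for illustration, and whether Pascal follows the same mapping is exactly the open question.

```python
from collections import defaultdict

# Assumption: 4 banks, as reported for Maxwell in the MaxAs notes.
# Nothing here is confirmed for Pascal.
NUM_BANKS = 4


def register_bank(reg_index: int) -> int:
    """Bank of a register under the assumed Maxwell mapping (index mod 4)."""
    return reg_index % NUM_BANKS


def bank_conflicts(source_regs):
    """Return {bank: [registers]} for banks hit by more than one distinct
    source register of a single instruction.

    Assumption: reading the same register twice does not count as a conflict.
    """
    by_bank = defaultdict(set)
    for reg in source_regs:
        by_bank[register_bank(reg)].add(reg)
    return {bank: sorted(regs) for bank, regs in by_bank.items() if len(regs) > 1}


if __name__ == "__main__":
    # Source operands R1, R5, R2: R1 and R5 both map to bank 1.
    print(bank_conflicts([1, 5, 2]))
    # Source operands R0, R1, R2: three different banks, no conflict expected.
    print(bank_conflicts([0, 1, 2]))
```

Under these assumptions, an instruction with sources R1, R5, R2 is one I would expect to conflict on Maxwell, while R0, R1, R2 should not.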

At the moment I’m considering that Pascal reuse flags do nothing but were left in as an artifact from Maxwell…

That’s possible, of course. On the other hand, the trend in NVIDIA GPU architectures has been to reduce hardware complexity, and move that into software, essentially arriving at VLIW-like instruction bundles, then increasing the amount of op-steering information per bundle (first one control word per seven instructions, then one control word per three instructions). That makes it a tad harder to believe that Pascal would move some of that complexity back into the hardware.

The size of the register file and Pascal’s operating frequencies would suggest to me that the register file is still banked. But my processor building days are long behind me (AMD Athlon being the last one), so I don’t know what tricks circuit designers have up their sleeves these days.