Register Bank trace


I would like to develop a scheme to decrease energy consumption at register granularity.

To do that, I need to access the data stored inside the register banks, but I don’t know how.
Can someone help me?


By the way, I’m using an NVIDIA GeForce GTX 950M on Windows.

What is a register bank? The term is not defined in official CUDA terminology.

That’s hardly surprising since register banks are a hardware implementation detail that is abstracted away above the level of SASS. NVIDIA does not make such details of the machine architecture publicly available, but Scott Gray did quite a bit of reverse engineering on this for Maxwell:

One should expect the details of register banking to differ between the major GPU architectures. In general register banks are employed to keep access times for large register files reasonable.

Register banks are used to store registers and send their contents to the functional units when needed. They are the storage closest to the functional units.

It seems to me that you are referring to the register file, which comprises a number of registers. A register file does not have to utilize banks. Small register files often use a monolithic multi-ported design, where the number of ports is sufficient to feed all source operands to a functional unit in the same cycle. For large register files (as are used in GPUs) banked register files can offer the advantages of reduced access times and reduced energy usage.
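To make the trade-off concrete, here is a minimal Python sketch of a banked register file. Everything in it is an assumption for illustration: the bank count (`NUM_BANKS = 4`), the modulo mapping in `bank_of`, and the one-read-per-bank-per-cycle rule in `read_cycles` are hypothetical and do not describe any actual NVIDIA hardware.

```python
from collections import Counter

NUM_BANKS = 4  # assumed bank count, for illustration only


def bank_of(reg_index, num_banks=NUM_BANKS):
    """Map a register index to a bank using simple modulo interleaving (assumed)."""
    return reg_index % num_banks


def read_cycles(source_regs, num_banks=NUM_BANKS):
    """Cycles needed to fetch all distinct source operands of one instruction,
    assuming each bank can deliver one register per cycle."""
    per_bank = Counter(bank_of(r, num_banks) for r in set(source_regs))
    return max(per_bank.values(), default=0)


if __name__ == "__main__":
    print(read_cycles([0, 1, 2]))  # operands spread over three banks
    print(read_cycles([0, 4, 8]))  # all three operands fall in bank 0
```

In this toy model an instruction whose operands sit in different banks is served in one cycle, while operands that share a bank are serialized. A monolithic multi-ported file would avoid the serialization at the cost of more ports.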

I was speaking of register banks as in the article “Warped-Compression: Enabling Power Efficient GPUs through Register Compression”. It might be the same as your banked register files.

Do you know how I could access these files and see their content?

Why not mention that paper right off the bat? The use of register file and register bank in the paper is entirely consistent with my own usage.

Access how? Any CUDA program can access the registers; the programmer just doesn’t get to choose which particular register is accessed in a particular operation. If you mean access by a thread-specific index, you can write your own SASS assembler after reverse engineering the machine-language encoding (or use Maxwell GPUs, in which case Scott Gray has already done most of the work for you).

If you mean access registers across the entire register file by some global index, I know of no way of doing that on actual hardware; you would have to create a simulator for that. Which is what the authors of the cited paper seem to have done (see section 6.1).
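As a rough idea of what a simulator-side register trace could look like, here is a toy Python sketch. The class `TracedRegisterFile`, its flat global index (`warp * regs_per_warp + reg`), and the modulo bank mapping are all hypothetical. It also stores one value per register rather than one per thread, and it is not the simulator used in the paper.

```python
class TracedRegisterFile:
    """Toy register file that logs every access (hypothetical model)."""

    def __init__(self, num_warps, regs_per_warp, num_banks=4):
        self.num_warps = num_warps
        self.regs_per_warp = regs_per_warp
        self.num_banks = num_banks  # assumed bank count
        self.values = [0] * (num_warps * regs_per_warp)
        self.trace = []  # entries: (op, global_index, bank, value)

    def _global_index(self, warp, reg):
        # Flat index across the whole file; reject out-of-range accesses.
        if not (0 <= warp < self.num_warps and 0 <= reg < self.regs_per_warp):
            raise IndexError("warp or register index out of range")
        return warp * self.regs_per_warp + reg

    def write(self, warp, reg, value):
        idx = self._global_index(warp, reg)
        self.values[idx] = value
        self.trace.append(("write", idx, reg % self.num_banks, value))

    def read(self, warp, reg):
        idx = self._global_index(warp, reg)
        value = self.values[idx]
        self.trace.append(("read", idx, reg % self.num_banks, value))
        return value


if __name__ == "__main__":
    rf = TracedRegisterFile(num_warps=2, regs_per_warp=8)
    rf.write(1, 3, 42)
    rf.read(1, 3)
    for entry in rf.trace:
        print(entry)
```

A real simulator would drive `read` and `write` from decoded instructions; the point here is only that the global view of the register file exists in the model, not on the hardware.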

For what it is worth (not much, presumably :-) I am skeptical about the scheme in the paper. My experience is that energy-efficient computing is greatly aided by simplicity of design, not complexity of design. This is borne out by current GPUs in comparison with CPUs, although there is some degree of convergence. I am also aware that it is difficult to make accurate predictions of power consumption through modelling, although it is possible that things have improved since I last was involved in processor design.