Why not mention that paper right off the bat? The use of register file and register bank in the paper is entirely consistent with my own usage.
Access how? Any CUDA program can access the registers; the programmer just doesn’t get to choose which particular register is accessed in a particular operation. If you mean access by a thread-specific index, you can write your own SASS assembler after reverse engineering the machine-language encoding (or use Maxwell GPUs, in which case Scott Gray has already done most of the work for you).
If you mean accessing registers across the entire register file by some global index, I know of no way of doing that on actual hardware; you would have to create a simulator for that, which is what the authors of the cited paper seem to have done (see section 6.1).
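To make the distinction between a thread-specific index and a global index concrete, here is a toy model in Python. It is only a sketch and does not describe any real GPU or the simulator from the paper: the class name `ToyRegisterFile`, the sizes, the contiguous per-thread layout, and the modulo bank mapping are all assumptions I made up for illustration.

```python
class ToyRegisterFile:
    """Toy register file. Layout and bank mapping are assumptions, not real hardware."""

    def __init__(self, num_threads=4, regs_per_thread=8, num_banks=4):
        # All three sizes are hypothetical and chosen only to keep the example small.
        self.num_threads = num_threads
        self.regs_per_thread = regs_per_thread
        self.num_banks = num_banks
        self.storage = [0] * (num_threads * regs_per_thread)

    def global_index(self, thread, reg):
        """Map a (thread, thread-specific register index) pair to a global index."""
        if not 0 <= thread < self.num_threads:
            raise IndexError("thread out of range")
        if not 0 <= reg < self.regs_per_thread:
            raise IndexError("register out of range")
        # Assumed layout: each thread owns one contiguous window of registers.
        return thread * self.regs_per_thread + reg

    def bank_of(self, global_idx):
        """Assumed bank mapping: consecutive global indices rotate through the banks."""
        return global_idx % self.num_banks

    def write(self, thread, reg, value):
        """Write through the thread-specific view, as an instruction operand would."""
        self.storage[self.global_index(thread, reg)] = value

    def read_global(self, global_idx):
        """Read through the global view, which only a simulator can offer."""
        return self.storage[global_idx]


rf = ToyRegisterFile()
rf.write(thread=2, reg=3, value=42)   # thread-specific access
g = rf.global_index(2, 3)             # same location, seen through the global view
print(g, rf.bank_of(g), rf.read_global(g))
```

The `write` method corresponds to what an instruction can do (name a register relative to its own thread), while `read_global` is the kind of access that, as far as I know, exists only inside a simulator.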
For what it is worth (not much, presumably :-), I am skeptical about the scheme in the paper. My experience is that energy-efficient computing is greatly aided by simplicity of design, not complexity of design. This is borne out by current GPUs in comparison with CPUs, although there is some degree of convergence. I am also aware that it is difficult to make accurate predictions of power consumption through modelling, although it is possible that things have improved since I was last involved in processor design.