Is there an Nvidia tool that can suggest how to change register use?

Register use seems quite important to maximise occupancy. Has Nvidia produced a tool to advise on how a particular kernel can be altered to change register use during execution?

Nothing that elegant. There is the CUDA occupancy calculator, which takes register usage as an input, and there is an occupancy API that code can query at runtime. There are also various controls (e.g. -maxrregcount, __launch_bounds__) to limit register usage, which you might use in an attempt to increase occupancy. Finally, there is a “register liveness view” in the disassembly tools, which gives you some sense of where registers are used/required for a particular compiler output, i.e. for a particular SASS code sequence.
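As a rough illustration of the launch-bounds control and the occupancy API mentioned above, here is a minimal sketch. The kernel scaleKernel, the helper activeBlocksPerSM, the file name and the bounds (256 threads per block, 4 blocks per SM) are made up for the example; it also assumes device 0 is the GPU of interest. Treat it as a starting point, not a recommended configuration.

```cuda
// Build (file name is hypothetical); -Xptxas -v prints registers used per kernel:
//   nvcc -Xptxas -v occupancy_sketch.cu -o occupancy_sketch
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only. __launch_bounds__(256, 4) tells
// the compiler that blocks have at most 256 threads and asks it to keep
// register use low enough for 4 resident blocks per SM (possibly by spilling).
__global__ void __launch_bounds__(256, 4)
scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Resident blocks per SM for scaleKernel at this block size, or -1 on error.
int activeBlocksPerSM(int blockSize)
{
    int numBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, scaleKernel, blockSize, 0 /* dynamic shared memory */);
    return err == cudaSuccess ? numBlocks : -1;
}

int main()
{
    const int blockSize = 256;

    cudaFuncAttributes attr;
    cudaDeviceProp prop;
    if (cudaFuncGetAttributes(&attr, scaleKernel) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "CUDA query failed\n");
        return 1;
    }

    int blocks = activeBlocksPerSM(blockSize);
    if (blocks < 0) {
        fprintf(stderr, "occupancy query failed\n");
        return 1;
    }

    // Occupancy here means resident threads over the SM's maximum thread count.
    double occupancy = double(blocks) * blockSize / prop.maxThreadsPerMultiProcessor;
    printf("registers/thread: %d\n", attr.numRegs);
    printf("resident blocks/SM at %d threads: %d\n", blockSize, blocks);
    printf("theoretical occupancy: %.0f%%\n", occupancy * 100.0);
    return 0;
}
```

Changing the second __launch_bounds__ argument (or compiling with -maxrregcount) and re-running shows how the compiler's register allocation and the reported occupancy move together. Whether higher occupancy actually makes the kernel faster still has to be measured.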

The register spill warning option in the PTX compiler (ptxas -warn-spills; passing -Xptxas -v to nvcc also prints per-kernel register and spill counts) will identify the problematic functions. Looking at the SASS (raw assembly) output for these functions with the cuobjdump utility (cuobjdump -sass) can also be helpful. But on modern architectures register spilling is becoming less and less of an issue due to faster and larger caches.

The binary utility nvdisasm with the -plr option is useful for visualizing the register footprint of small kernels.