I am having serious problems with register usage explosion. I have read
all the messages on the subject, and it is clear that the optimization of register
usage is complex and takes many things into account (bank conflicts, etc.).
However, a change that decreases bank conflicts in one place might increase
bandwidth demands in other parts of the code.
I would like to propose a simple solution to give the users “some” control.
Perhaps there could be compiler directives inserted into the CUDA code in pairs, one marking the start of a region and one marking its end.
The compiler would only optimize register usage between such pairs of statements.
In the absence of such statements, the compiler would optimize the entire
code, as it does now. This should be fairly easy to do, since the optimizer more than likely
already optimizes code between some initial statement and some end statement. The point is
that optimization would not be allowed to cross these “barrier” statements.
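
To make the idea concrete, here is a rough sketch of how such a pair of markers might look in a kernel. The names REGOPT_BEGIN and REGOPT_END are made up for illustration; as far as I know nvcc has no such directive, so in this sketch they are macros that expand to nothing and the file compiles and runs unchanged. The kernel body and the host code are only placeholders to give the markers some context.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// HYPOTHETICAL markers for the proposed feature. They do not exist in nvcc.
// Here they expand to nothing, so they have no effect on the generated code.
#define REGOPT_BEGIN
#define REGOPT_END

__global__ void step(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = in[i];

    REGOPT_BEGIN  // proposed: register optimization is confined to this region
    float a = x * x + 1.0f;
    float b = a * x - 2.0f;
    REGOPT_END    // proposed: the optimizer may not move code across this point

    REGOPT_BEGIN  // a second region, optimized independently of the first
    float c = b * b + a;
    REGOPT_END

    out[i] = c;
}

int main()
{
    const int n = 256;
    std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);

    // Allocate device buffers and copy the input over.
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch with 128 threads per block, enough blocks to cover n elements.
    step<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    // Copy the result back and print one element as a sanity check.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

For comparison, the controls I know of today, the nvcc option -maxrregcount and the __launch_bounds__ qualifier, apply to a whole compilation or a whole kernel rather than to a region inside it.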
Could somebody please comment on this idea’s feasibility? This would help many people
working with complex simulations. Thanks.