implications of the default setting (0) for Max Used Register (maxrregcount)

CudaaduC · March 27, 2013, 10:43pm

The CUDA C/C++ Device properties have Max Used Register set to 0.

What exactly does that mean in this context, and in general is this something I should be thinking about?

I have noticed small differences in running time when I changed that value, but nothing significant.

Visual Studio 2010 x64 with a K20c.

Is there a method for determining the optimal value for this property?

tera · March 28, 2013, 2:05am

-maxrregcount is mostly an obsolete compatibility setting. Nowadays you would use __launch_bounds__() qualifiers on individual kernels, right in the source code.
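A minimal sketch of the per-kernel approach, assuming a hypothetical kernel `scale_kernel` that is never launched with more than 256 threads per block. `__launch_bounds__(256, 4)` tells the compiler that limit and asks it to keep register usage low enough for at least 4 such blocks to be resident per multiprocessor; with the 65,536 registers per SM of a K20c (compute capability 3.5), that works out to at most 64 registers per thread. Unlike `-maxrregcount`, which applies to every kernel in the compilation unit, the qualifier affects only the kernel it decorates. (The Visual Studio value of 0 presumably just means that no `-maxrregcount` option is passed, leaving the choice entirely to the compiler.) The `run_scale` helper is also hypothetical and only shows a launch that respects the bound.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel. __launch_bounds__(256, 4) promises that blocks never
// exceed 256 threads and asks the compiler to limit register usage so that
// at least 4 blocks can be resident on one multiprocessor.
__global__ void __launch_bounds__(256, 4)
scale_kernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host helper: copies v to the GPU, scales it in place and
// copies it back. Returns false if any CUDA call fails.
bool run_scale(std::vector<float>& v, float factor)
{
    if (v.empty()) return true;
    const int n = static_cast<int>(v.size());
    const int threads = 256;  // must not exceed the bound given to __launch_bounds__
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = v.size() * sizeof(float);

    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale_kernel<<<blocks, threads>>>(d, n, factor);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // The copy back to the host also waits for the kernel to finish.
    ok = ok && cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok;
}
```

Launching with more than 256 threads per block would fail, so the block size in the host code has to stay consistent with the first argument of the qualifier.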

Limiting the number of registers used can be useful to increase occupancy, which may or may not make your kernel run faster. You can use the Occupancy Calculator spreadsheet to see whether reducing the register count could improve occupancy. If a small reduction in register count improves occupancy and does not lead to a significant number of register spills (nvcc reports these when you compile with the --ptxas-options=-v flag), it's just a matter of trying it out and checking whether performance improves.
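To make the Occupancy Calculator reasoning concrete, here is a simplified, register-only sketch of how the per-thread register count limits occupancy. It assumes compute capability 3.5 figures (as on a K20c): 65,536 registers per SM, registers allocated per warp in units of 256, and at most 64 resident warps and 16 resident blocks per SM. Shared memory and other limits are ignored, so the result is only an upper bound; check it against the Occupancy Calculator for your own device.

```cuda
#include <algorithm>

// Assumed compute capability 3.5 limits (simplified model, not a device query).
constexpr int kRegsPerSM      = 65536;
constexpr int kRegAllocUnit   = 256;  // registers are allocated per warp in these units
constexpr int kWarpSize       = 32;
constexpr int kMaxWarpsPerSM  = 64;
constexpr int kMaxBlocksPerSM = 16;

// Fraction of the SM's maximum resident warps that can be active, considering
// only registers, warps and blocks per SM. Expects regsPerThread in 1..255
// and threadsPerBlock in 1..1024.
double register_limited_occupancy(int regsPerThread, int threadsPerBlock)
{
    const int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    // Registers needed per warp, rounded up to the allocation unit.
    const int regsPerWarp =
        ((regsPerThread * kWarpSize + kRegAllocUnit - 1) / kRegAllocUnit) * kRegAllocUnit;
    const int warpsByRegs   = kRegsPerSM / regsPerWarp;
    const int blocksByRegs  = warpsByRegs / warpsPerBlock;
    const int blocksByWarps = kMaxWarpsPerSM / warpsPerBlock;
    const int blocks = std::min({blocksByRegs, blocksByWarps, kMaxBlocksPerSM});
    return static_cast<double>(blocks * warpsPerBlock) / kMaxWarpsPerSM;
}
```

In this model, with 256-thread blocks, going from 40 to 32 registers per thread raises occupancy from 75% to 100%, while 37 registers gives the same 75% as 40 because of the allocation granularity. Whether the higher occupancy translates into a faster kernel still has to be measured.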

njuffa · March 28, 2013, 6:49am

In addition to what tera said: my usual recommendation is to rely on the compiler defaults unless there is a very good reason not to. In my experience, in most situations the heuristics used by the compiler produce close to optimal bounds on register usage (this was not always the case in the early days of CUDA). Tweaking register bounds to optimize performance is what I would classify as a “heroic” optimization.

That said, the constraints of a particular project may require heroic optimizations. If you decide to tweak performance by manually manipulating the register limits, either via the __launch_bounds__() attribute or the -maxrregcount compiler flag, please be aware that the compiler evolves continuously and the generated code for non-trivial kernels tends to change from one CUDA version to the next. You may therefore have to occasionally check (and re-tweak) your settings if you want to maintain optimal performance.
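One lightweight way to notice when a new CUDA version changes a kernel's resource usage is to query it at run time and compare it with the values recorded when the limits were tuned. The sketch below uses the runtime API's `cudaFuncGetAttributes` on a hypothetical kernel `axpy_kernel`; the `KernelResources` struct and helper names are likewise made up for illustration. `numRegs` is the register count per thread, and a non-zero `localSizeBytes` often (though not only) indicates spilling, since local arrays also live in local memory.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel whose resource usage we want to track across CUDA releases.
__global__ void axpy_kernel(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

struct KernelResources {
    bool ok;            // true if the query succeeded
    int numRegs;        // registers per thread
    size_t localBytes;  // local memory per thread (spills, local arrays)
    int maxThreads;     // largest block size this kernel can be launched with
};

KernelResources query_axpy_resources()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, axpy_kernel) != cudaSuccess) {
        return {false, 0, 0, 0};
    }
    return {true, attr.numRegs, attr.localSizeBytes, attr.maxThreadsPerBlock};
}

// Prints a one-line summary to compare against previously recorded numbers.
void report_axpy_resources()
{
    KernelResources r = query_axpy_resources();
    if (!r.ok) {
        std::printf("cudaFuncGetAttributes failed\n");
        return;
    }
    std::printf("axpy_kernel: %d regs/thread, %zu bytes local/thread, max %d threads/block\n",
                r.numRegs, r.localBytes, r.maxThreads);
}
```

The same numbers are also printed at build time by --ptxas-options=-v; the run-time query is simply easier to wire into an automated check.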

CudaaduC · March 28, 2013, 7:34pm

Thanks, both of you, for the information. At this point it seems best to focus on other optimizations.

In general I have been very impressed with the performance of the K20 with ints, floats, and doubles.

njuffa · March 28, 2013, 7:53pm

I don’t know what your application looks like, but as a general observation, Kepler provides a massive increase in FLOPS compared to Fermi, while memory bandwidth grew more modestly. As a consequence, it becomes increasingly important to use memory efficiently at all levels of the memory hierarchy. For applications that are compute-bound, I usually focus on simply minimizing the dynamic instruction count (plus minimizing synchronization). For floating-point computations in particular, many transformations are not value-preserving and thus cannot be applied automatically by the compiler.
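As an illustration of such a transformation: replacing division by a loop-invariant divisor with multiplication by its precomputed reciprocal. A correctly rounded single-precision division is implemented as a sequence of several instructions on the GPU, while the multiply is a single instruction, but `x * (1.0f / d)` is not always bitwise equal to `x / d` (with x = 5.0f and d = 3.0f the two results differ in the last bit). Under default settings the compiler therefore keeps the division; only flags such as -prec-div=false or -use_fast_math let it use a faster, approximate division. The function and kernel names below are hypothetical.

```cuda
#include <cuda_runtime.h>

// Division as written: with default flags the compiler must produce the
// correctly rounded quotient.
__host__ __device__ inline float divide_exact(float x, float d)
{
    return x / d;
}

// Manual, non-value-preserving rewrite: multiply by a reciprocal computed
// once by the caller. Cheaper, but may differ from x / d in the last bit.
__host__ __device__ inline float divide_by_reciprocal(float x, float d_inv)
{
    return x * d_inv;
}

// Hypothetical kernel using the rewritten form. The reciprocal is computed
// once per thread, outside the grid-stride loop, so each element costs one
// multiply instead of one division.
__global__ void normalize_kernel(const float* in, float* out, int n, float d)
{
    const float d_inv = 1.0f / d;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        out[i] = divide_by_reciprocal(in[i], d_inv);
    }
}
```

Whether a last-bit difference is acceptable depends on the application, which is exactly why this decision is left to the programmer.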
