Maximum number of registers per thread reduced for SM 2.0

It seems the nvcc --help text was never updated to reflect the lower maximum number of registers per thread on SM 2.0. The help text still refers to a GPU-specific maximum of 128 registers, but for SM 2.0 the limit is 63. I couldn’t find this documented anywhere (programming guide, nvcc docs, etc.).

Where did you learn that this limit had changed for SM 2.0?

You can observe this by compiling a kernel that needs more than 128 registers with the ‘-maxrregcount 128’ flag. With ‘-arch sm_10’ the kernel ends up using 128 registers, but with ‘-arch sm_20’ it uses only 63.
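For anyone who wants to reproduce this, here is a minimal sketch of such a test. The kernel, the file name regpressure.cu, and the constant N_ACC are made up for illustration and are not from the original report. The idea is just to keep many values live at once so the kernel wants more registers than the cap allows. Whether it actually exceeds 128 depends on the compiler version, so treat the count as something to check rather than a given. The ptxas verbose flag makes the compiler print the per-kernel register count.

```cuda
// regpressure.cu: hypothetical test kernel with deliberately high register pressure.
//
// Compile once per architecture and compare the "registers" figure that ptxas prints:
//   nvcc -arch sm_10 -maxrregcount 128 --ptxas-options=-v -c regpressure.cu
//   nvcc -arch sm_20 -maxrregcount 128 --ptxas-options=-v -c regpressure.cu
// (sm_10 and sm_20 are only accepted by older CUDA toolkits.)

#define N_ACC 160  // assumed value; chosen to ask for more than 128 live floats

__global__ void reg_pressure(const float *in, float *out, int iters)
{
    float acc[N_ACC];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Fully unrolled loops use compile-time indices, which lets the
    // compiler keep acc[] in registers instead of local memory.
    #pragma unroll
    for (int i = 0; i < N_ACC; ++i)
        acc[i] = in[tid * N_ACC + i];

    // Every accumulator is read and written on each pass, so all of
    // them stay live across the loop.
    for (int k = 0; k < iters; ++k) {
        #pragma unroll
        for (int i = 0; i < N_ACC; ++i)
            acc[i] = acc[i] * acc[(i + 1) % N_ACC] + 1.0f;
    }

    // Reduce to one value so the work cannot be optimized away.
    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < N_ACC; ++i)
        sum += acc[i];
    out[tid] = sum;
}
```

If the register count is capped below what the kernel needs, the ptxas output should also report spill or local memory usage, which is another way to see that a lower limit was applied.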