After compiling my code, I find that each kernel uses 77 registers, and the code takes a long time to compile. I want to know how to reduce the number of registers used by each kernel.
Thank you!
a) Split it up into several smaller kernels.
b) Use the --maxrregcount option of nvcc (or specify maxregisters on the make command line when using the SDK makefile), but expect lower performance once the compiler has to spill registers to local memory.
c) Analyze the PTX to see which data takes the most registers and where you can put it into shared memory instead.
d) Apply the volatile trick at carefully selected places (search the forums for related postings).
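
As a rough illustration of (b) and (c), here is a minimal sketch, not taken from the original code. The kernel name weighted_sum, the helper run_weighted_sum, and the sizes BLOCK and TAPS are all made up for the example. It keeps a small per-thread scratch buffer in shared memory rather than in a per-thread array, and the comments show the nvcc flags for printing and capping register use. Whether this actually lowers the register count depends on the kernel and the compiler version, so check the ptxas output before and after.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Print per-kernel register use at compile time:
//   nvcc --ptxas-options=-v demo.cu
// Cap registers per thread for the whole file (may spill to local memory):
//   nvcc --maxrregcount=32 demo.cu

constexpr int BLOCK = 128;  // threads per block (assumed for this sketch)
constexpr int TAPS  = 8;    // scratch values per thread (hypothetical)

__global__ void weighted_sum(const float* in, float* out, int n)
{
    // Scratch lives in shared memory instead of a per-thread array.
    // Layout is [tap][thread], so each thread only touches its own column
    // and no __syncthreads() is needed.
    __shared__ float scratch[TAPS][BLOCK];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    for (int k = 0; k < TAPS; ++k)
        scratch[k][threadIdx.x] = in[i] * (k + 1);

    float acc = 0.0f;
    for (int k = 0; k < TAPS; ++k)
        acc += scratch[k][threadIdx.x];

    out[i] = acc;  // equals in[i] * (1 + 2 + ... + TAPS)
}

// Host helper (hypothetical): copies data to the GPU, runs the kernel,
// and copies the result back. Error checking is omitted for brevity,
// and the input is assumed to be non-empty.
std::vector<float> run_weighted_sum(const std::vector<float>& h_in)
{
    int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n, 0.0f);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    weighted_sum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Call run_weighted_sum from your own main() with a non-empty input vector. Shared memory is limited per block, so the trade-off in (c) only pays off while TAPS * BLOCK * sizeof(float) stays small enough not to reduce the number of blocks resident on each multiprocessor.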
Thank you very much! This is very useful to me.