I have a large kernel that uses too many registers (46), which kills occupancy. I've been hand-optimizing it for days but don't seem to be able to beat the compiler :-) It is possible to split the kernel into two disjoint kernels with 24 and 47 registers, but that introduces extra global memory reads and writes, plus the overhead of launching two kernels, which I don't really like either.
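To make the split described above concrete, here is a minimal sketch of the two variants. Everything in it is hypothetical: stageA/stageB stand in for the two halves of the real kernel, and fusedKernel, kernelA, kernelB and run are illustrative names. The fused kernel keeps the intermediate value in a register; the split version has to write it to a global buffer (tmp) in the first kernel and read it back in a second launch, which is exactly the extra traffic and launch overhead in question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-ins for the two halves of a large kernel.
__device__ float stageA(float x) { return x * x + 1.0f; }
__device__ float stageB(float y) { return y * 0.5f - 2.0f; }

// Fused: the intermediate t lives in a register, one launch.
__global__ void fusedKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = stageA(in[i]);
        out[i] = stageB(t);
    }
}

// Split, part 1: the intermediate is written to global memory.
__global__ void kernelA(const float* in, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = stageA(in[i]);
}

// Split, part 2: the intermediate is read back from global memory.
__global__ void kernelB(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = stageB(tmp[i]);
}

// Host helper: runs either variant on h_in (assumed non-empty) and returns the result.
std::vector<float> run(const std::vector<float>& h_in, bool split) {
    int n = static_cast<int>(h_in.size());
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_tmp, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (split) {
        // Both launches go to the default stream, so kernelB sees kernelA's writes.
        kernelA<<<blocks, threads>>>(d_in, d_tmp, n);
        kernelB<<<blocks, threads>>>(d_tmp, d_out, n);
    } else {
        fusedKernel<<<blocks, threads>>>(d_in, d_out, n);
    }

    std::vector<float> h_out(n);
    // cudaMemcpy waits for the preceding kernels in the default stream to finish.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return h_out;
}
```

Both paths compute the same values; the difference is one extra global write and read per element plus a second launch. Compiling with nvcc -Xptxas -v prints the per-kernel register counts, which is how you can compare the fused kernel against the two halves.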
I was wondering whether any clever general principles for reducing register usage exist, or whether it always depends on the particular kernel. The Programming Guide and the Best Practices Guide don't say much on the subject.