Virtual Register Allocation failed / function scope optimization

We have a kernel of this form:

__global__ void streamclassifier( inputs ) {
#include "header1.h"
#include "header2.h"
// ...
#include "headerN.h"

if (threadIdx.x < 32) {
    // call the device function in header1.h
} else if (threadIdx.x < 64) {
    // call the device function in header2.h
}
// ...
else if (threadIdx.x < N*32) {
    // call the device function in headerN.h
}
}

Here N can be as high as 140. Each device function that a warp calls does a lot of bit manipulation (ANDing and ORing). Because all the header files are included in the same kernel, compilation fails with a virtual register allocation error: the compiler runs out of virtual registers. We tried --opencc-options -OPT:Olimit=0, but the compiler ran out of memory and we had to reboot the machine. We have Tesla cards. At most, we can compile this with two header files (N = 2).
We are doing this so that, when we launch the kernel, each warp of threads computes a different function. Is there a better way to achieve this, such as a compiler switch that reuses virtual registers?
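Since the build reportedly succeeds with two functions per kernel, one possible workaround is to split the functions into small groups and compile one kernel per group, so the compiler never sees all N bodies at once. The sketch below is only an illustration under that assumption: fnA to fnD, classifyGroup0, classifyGroup1 and runAllGroups are hypothetical names, and in a real build each group kernel would sit in its own .cu file. The cost is one launch per group instead of a single launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for the bit-manipulation functions from the headers.
__device__ unsigned int fnA(unsigned int x) { return x & 0x55u; }
__device__ unsigned int fnB(unsigned int x) { return x | 0xAAu; }
__device__ unsigned int fnC(unsigned int x) { return (x << 1) & 0xFFu; }
__device__ unsigned int fnD(unsigned int x) { return (x >> 1) | 0x80u; }

// Group 0: two warps, two functions (assumed to live in its own .cu file).
__global__ void classifyGroup0(const unsigned int *in, unsigned int *out)
{
    const unsigned int tid = threadIdx.x;
    out[tid] = (tid < 32) ? fnA(in[tid]) : fnB(in[tid]);
}

// Group 1: the next two functions (again assumed to be a separate file).
__global__ void classifyGroup1(const unsigned int *in, unsigned int *out)
{
    const unsigned int tid = threadIdx.x;
    out[tid] = (tid < 32) ? fnC(in[tid]) : fnD(in[tid]);
}

// Hypothetical host helper: processes 128 inputs, 64 per group kernel.
// Returns true if every CUDA call succeeded.
bool runAllGroups(const unsigned int *hostIn, unsigned int *hostOut)
{
    const int perGroup = 64;
    const size_t bytes = 2 * perGroup * sizeof(unsigned int);
    unsigned int *dIn = 0, *dOut = 0;

    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);

    // Each launch handles its own slice of the input and output arrays.
    classifyGroup0<<<1, perGroup>>>(dIn, dOut);
    classifyGroup1<<<1, perGroup>>>(dIn + perGroup, dOut + perGroup);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

With 140 functions and two per group this would mean 70 small kernels, so whether it is practical depends on how much launch overhead the application can tolerate.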

Try reshuffling your code: for example, include your headers outside the function body, replace the IFs with a SWITCH, and so on.
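For reference, the suggested layout might look roughly like the sketch below: the functions are defined at file scope (where the #include lines would go) and the kernel switches on the warp index. classify0 to classify2 and runClassifier are hypothetical stand-ins, not the real header contents. The __noinline__ qualifier is an extra assumption on my part; it is only a hint, and the compiler may still inline the calls, so this is not guaranteed to reduce register pressure.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for the device functions that header1.h, header2.h, ...
// would define. With this layout the #include lines sit here, at file scope.
__noinline__ __device__ unsigned int classify0(unsigned int x) { return (x & 0x0Fu) | 0x100u; }
__noinline__ __device__ unsigned int classify1(unsigned int x) { return (x | 0xF0u) & 0x2FFu; }
__noinline__ __device__ unsigned int classify2(unsigned int x) { return (x ^ 0xFFu) | 0x300u; }

// One block; each warp of 32 threads runs a different function.
__global__ void streamclassifier(const unsigned int *in, unsigned int *out)
{
    const unsigned int tid  = threadIdx.x;
    const unsigned int warp = tid / 32;  // same value for all threads of a warp
    unsigned int r = 0;

    switch (warp) {
        case 0:  r = classify0(in[tid]); break;
        case 1:  r = classify1(in[tid]); break;
        case 2:  r = classify2(in[tid]); break;
        default: break;
    }
    out[tid] = r;
}

// Hypothetical host helper: copies nThreads inputs to the device, runs the
// kernel in a single block, and copies the results back.
bool runClassifier(const unsigned int *hostIn, unsigned int *hostOut, int nThreads)
{
    const size_t bytes = nThreads * sizeof(unsigned int);
    unsigned int *dIn = 0, *dOut = 0;

    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);

    streamclassifier<<<1, nThreads>>>(dIn, dOut);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

Because the switch value is uniform within a warp, the branches do not cause divergence inside a warp; each warp simply takes a different case.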

The header files are outside the function; I accidentally put them inside the function while describing the problem. Switch-case statements don't work either.

I am seeing the same problem using a large case statement. Have you resolved this problem?