register allocation bug in 1.0? nvcc crashes

Hello. I’ve got code of the form:

switch(foo) {

case 1:
    /* ... do1() ... */

case 2:an
    /* ... do2() ... */

case 3:
    /* ... do3() ... */

}

The actual code is too complex to post right now. The complexity of do1(), do2(), and do3() is pretty similar. If I comment out any one of the three cases, the code compiles, and the .cubin file says my overall global function uses the following resources:

The resources are the same whether I leave one or two cases in the code. But if I leave all three cases in the code, nvcc crashes with the following output:

### Assertion failure at line 2268 of ../../be/cg/NVISA/cgtarget.cxx:

### Compiler Error in file /tmp/tmpxft_00000c88_00000000-4.i during Register Allocation phase:

### ran out of registers in integer64

*** glibc detected *** /usr/local/cuda/open64/lib//be: free(): invalid pointer: 0x0000000000f97560 ***

======= Backtrace: =========















======= Memory map: ========

00400000-00802000 r-xp 00000000 fd:04 8716431                            /usr/local/cuda/open64/lib/be

00a01000-00c54000 rw-p 00401000 fd:04 8716431                            /usr/local/cuda/open64/lib/be

00c54000-03167000 rw-p 00c54000 00:00 0                                  [heap]

37ce800000-37ce81a000 r-xp 00000000 fd:04 23167001                       /lib64/

37cea19000-37cea1a000 r--p 00019000 fd:04 23167001                       /lib64/

37cea1a000-37cea1b000 rw-p 0001a000 fd:04 23167001                       /lib64/

37cec00000-37ced46000 r-xp 00000000 fd:04 23167003                       /lib64/

37ced46000-37cef46000 ---p 00146000 fd:04 23167003                       /lib64/

37cef46000-37cef4a000 r--p 00146000 fd:04 23167003                       /lib64/

37cef4a000-37cef4b000 rw-p 0014a000 fd:04 23167003                       /lib64/

37cef4b000-37cef50000 rw-p 37cef4b000 00:00 0 

37cf000000-37cf082000 r-xp 00000000 fd:04 23167210                       /lib64/

37cf082000-37cf281000 ---p 00082000 fd:04 23167210                       /lib64/

37cf281000-37cf282000 r--p 00081000 fd:04 23167210                       /lib64/

37cf282000-37cf283000 rw-p 00082000 fd:04 23167210                       /lib64/

37d1000000-37d100d000 r-xp 00000000 fd:04 23167216                       /lib64/

37d100d000-37d120d000 ---p 0000d000 fd:04 23167216                       /lib64/

37d120d000-37d120e000 rw-p 0000d000 fd:04 23167216                       /lib64/

37d3800000-37d38e6000 r-xp 00000000 fd:04 8749090                        /usr/lib64/

37d38e6000-37d3ae5000 ---p 000e6000 fd:04 8749090                        /usr/lib64/

37d3ae5000-37d3aeb000 r--p 000e5000 fd:04 8749090                        /usr/lib64/

37d3aeb000-37d3aee000 rw-p 000eb000 fd:04 8749090                        /usr/lib64/

37d3aee000-37d3b00000 rw-p 37d3aee000 00:00 0 

2aaaaaaab000-2aaaaaaac000 rw-p 2aaaaaaab000 00:00 0 

2aaaaaad4000-2aaaaaad7000 rw-p 2aaaaaad4000 00:00 0 

2aaaaacd4000-2aaaaacd5000 rw-p 2aaaaacd4000 00:00 0 

2aaaaacd5000-2aaaaad5c000 rw-p 2aaaaac55000 00:00 0 

2aaaaad5c000-2aaaaad5e000 rw-p 2aaaaad5c000 00:00 0 

2aaaac000000-2aaaac021000 rw-p 2aaaac000000 00:00 0 

2aaaac021000-2aaab0000000 ---p 2aaaac021000 00:00 0 

7fff34108000-7fff341ee000 rw-p 7fff34108000 00:00 0                      [stack]

ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vdso]

Signal: Aborted in Register Allocation phase.

Error: Signal Aborted in phase Register Allocation -- processing aborted

*** Internal stack backtrace:

    /usr/local/cuda/open64/lib//be [0x684b0e]

    /usr/local/cuda/open64/lib//be [0x685ed7]

    /usr/local/cuda/open64/lib//be [0x68576c]

    /usr/local/cuda/open64/lib//be [0x6858d1]

    /usr/local/cuda/open64/lib//be [0x686581]

    /lib64/ [0x37cec300c0]

    /lib64/ [0x37cec30065]

    /lib64/ [0x37cec31b00]

    /lib64/ [0x37cec6825b]

    /lib64/ [0x37cec6f504]

    /lib64/ [0x37cec72b2c]

    /usr/local/cuda/open64/lib//be [0x570fa0]

    /usr/local/cuda/open64/lib//be [0x594aa8]

    /lib64/ [0x37cec32eb5]

    /usr/local/cuda/open64/lib//be [0x684f4a]

    /usr/local/cuda/open64/lib//be [0x514c4e]

    /usr/local/cuda/open64/lib//be [0x514dfd]

    /usr/local/cuda/open64/lib//be [0x524b9a]

    /usr/local/cuda/open64/lib//be [0x419b5d]

    /usr/local/cuda/open64/lib//be [0x419f4d]

    /usr/local/cuda/open64/lib//be [0x41b182]

    /lib64/ [0x37cec1d8a4]

    /usr/local/cuda/open64/lib//be [0x417609]

nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be died due to signal 4

I can probably work more on turning the code into a small example. Any ideas? I haven’t tried it in 1.1 yet (1.1 gave me other link errors before, so I’m hesitant to reinstall it).

Please ignore the typographical error in case 2.

For reasons linked to the underlying architecture, NVCC sometimes seems to compile multiway branches so that the code for every branch is prepared up front, as if the branches were going to run in parallel. Only later, when the switched variable gets its value, is the data path chosen. This behaviour leads to an unfair register allocation: your three branches are compiled in the same context, so they share one register map, and with all three cases active the register file may not be large enough to handle everything.

I had the same trouble with a seismic ray-tracing algorithm, and found no way out… The only chance you have is to avoid if {} and switch…case statements in kernels as much as possible.
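
One way to act on that advice, offered only as a sketch and not something tested against the original code: if foo has the same value for every thread in a launch (an assumption, the post does not say), the switch can be moved to the host and each case given its own kernel, so each kernel goes through register allocation on its own. The do1/do2/do3 bodies, the kernel names and launch_case() below are hypothetical stand-ins, and the sketch is written against the current CUDA runtime API, not the 1.0 toolkit.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the poster's do1(), do2(), do3().
__device__ float do1(float x) { return x + 1.0f; }
__device__ float do2(float x) { return x * 2.0f; }
__device__ float do3(float x) { return x - 3.0f; }

// One kernel per case: each one is a separate function for the compiler,
// so the three bodies never share a single register allocation.
__global__ void kernel1(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = do1(data[i]);
}

__global__ void kernel2(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = do2(data[i]);
}

__global__ void kernel3(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = do3(data[i]);
}

// Host-side dispatch. Assumption: foo is uniform across the whole launch;
// if foo differs per thread, this restructuring does not apply.
cudaError_t launch_case(int foo, float* d_data, int n) {
    assert(d_data != 0 && n > 0);
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    switch (foo) {
    case 1: kernel1<<<blocks, threads>>>(d_data, n); break;
    case 2: kernel2<<<blocks, threads>>>(d_data, n); break;
    case 3: kernel3<<<blocks, threads>>>(d_data, n); break;
    default: return cudaErrorInvalidValue;
    }
    return cudaGetLastError();
}
```

The cost is one extra kernel per case, and it only sidesteps the problem; whether it avoids the crash in 1.0 would have to be checked on the real code.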