The usual workaround for nvcc bugs like this is to move the conflicting code into a separate .cpp file and leave only the stuff that actually needs nvcc (kernels, device functions, host functions that launch kernels with <<< >>>) in the .cu file.
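A rough sketch of that split, in case it helps. The file names, `scale_kernel`, `launch_scale` and `scale_on_gpu` are all made up for illustration, and error checking is left out to keep it short:

```cuda
// ---- kernels.cu (compiled by nvcc): only what needs CUDA syntax ----
#include <cuda_runtime.h>

__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Plain C++ signature: this is all the .cpp side needs to know about.
void launch_scale(float* device_data, int n, float factor) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(device_data, n, factor);
}

// ---- host_code.cpp (compiled by the host compiler, never seen by nvcc) ----
// Templates, STL-heavy code and third-party headers that upset nvcc go here.
#include <vector>
#include <cuda_runtime.h>  // runtime API only (cudaMalloc, cudaMemcpy, ...)

void launch_scale(float* device_data, int n, float factor);  // defined in kernels.cu

std::vector<float> scale_on_gpu(const std::vector<float>& input, float factor) {
    std::vector<float> output(input.size());
    if (input.empty()) return output;

    const std::size_t bytes = input.size() * sizeof(float);
    float* device_data = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&device_data), bytes);
    cudaMemcpy(device_data, input.data(), bytes, cudaMemcpyHostToDevice);

    launch_scale(device_data, static_cast<int>(input.size()), factor);

    // Blocking copy on the default stream, so it waits for the kernel to finish.
    cudaMemcpy(output.data(), device_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(device_data);
    return output;
}
```

The idea is that kernels.cu goes through nvcc, host_code.cpp goes through your normal compiler with the CUDA include directory on its include path, and the two object files get linked against the CUDA runtime. The exact build flags depend on your setup.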
What I learned from a year of experience with CUDA: never use more complex language constructs than absolutely necessary. The closer to the hardware, the better. (Also: never use uint64_t if two uint32_t's will do.) Hopefully this will change once the technology stabilizes.
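To illustrate the uint32_t point, here is a minimal sketch of a 64-bit add done on two 32-bit halves. `U32Pair`, `add_u32pair` and the `HD` macro are hypothetical names of mine, not anything from the toolkit, and whether this actually beats a native uint64_t depends on your hardware and toolkit version:

```cpp
#include <cstdint>

// Lets the same code compile under nvcc (host + device) and a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper type: a 64-bit unsigned value stored as two 32-bit halves.
struct U32Pair {
    std::uint32_t lo;  // low 32 bits
    std::uint32_t hi;  // high 32 bits
};

// Adds two pairs with carry propagation; the result wraps modulo 2^64.
HD inline U32Pair add_u32pair(U32Pair a, U32Pair b) {
    U32Pair r;
    r.lo = a.lo + b.lo;                             // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped, so carry into hi
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

For example, adding {lo = 0xFFFFFFFF, hi = 1} and {lo = 1, hi = 0} carries out of the low half and gives {lo = 0, hi = 2}.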