if(lower){
w = Conjugate(2.0*exp(-zz)-w); //complex version
//w = -Conjugate(w); //simplified version
}
I’m sure ‘lower’ will never be true. But the ‘complex version’ (0.066 us/evaluation) is slower than the ‘simplified version’ (0.060 us/evaluation). Do you have any idea why?
Unless the compiler can determine at compile time that ‘lower’ cannot be true, different code will be generated because the compiler can eliminate the if-statement only if it is (provably) dead code. Is ‘lower’ a defined constant, or a template parameter?
You can check on differences in the generated code by extracting machine code (SASS) from the binary with cuobjdump --dump-sass.
You might want to look into using two functions (for example, generated from a template), where one variant assumes ‘lower’ is false, and then call that function in all those contexts where you know that ‘lower’ is false.
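A minimal sketch of the two-function idea, assuming the snippet sits at the end of some evaluation routine. The names finish_w and finish_w_dispatch are hypothetical, and std::complex<double> stands in for whatever complex type the original code uses. It is written as plain host C++ so it can be compiled anywhere; in a kernel the functions would need __device__ and a device-capable complex type. Note that if constexpr needs C++17.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;  // stand-in for the poster's complex type

// Hypothetical helper: 'lower' becomes a template parameter, so each
// instantiation sees it as a compile-time constant.
template <bool Lower>
inline cplx finish_w(cplx w, cplx zz)
{
    if constexpr (Lower) {
        // Only present in the Lower == true instantiation.
        w = std::conj(2.0 * std::exp(-zz) - w);  // 'complex version'
    }
    // For Lower == false the function body is straight-line code.
    return w;
}

// Hypothetical run-time dispatcher for call sites where 'lower' is unknown.
inline cplx finish_w_dispatch(bool lower, cplx w, cplx zz)
{
    return lower ? finish_w<true>(w, zz) : finish_w<false>(w, zz);
}

// Small sanity check: the 'false' variant must leave w untouched.
inline void sanity_check()
{
    const cplx w(1.0, 2.0), zz(0.5, -0.25);
    assert(finish_w<false>(w, zz) == w);
    assert(finish_w_dispatch(false, w, zz) == w);
}
```

Call sites that know ‘lower’ is false would call finish_w<false>(w, zz) directly, so the compiler never has to emit the branch there; only the dispatcher keeps the run-time test.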
The most likely reason for the performance difference is that the if-statement, which ultimately compiles to a branch instruction, inhibits some compiler optimizations that only operate on straight-line code. The longer basic blocks become, the better compilers (independent of CUDA) tend to optimize code. Manually removing the entire if-statement merges the two neighboring basic blocks into one longer basic block.