Always-false if branch affects program performance

In a device function, I have an if branch like this:

if(lower){
    w = Conjugate(2.0*exp(-z*z)-w); //complex version
    //w = -Conjugate(w); //simplified version
}

I’m sure ‘lower’ will never be true, but the ‘complex version’ (0.066 us/evaluation) is slower than the ‘simplified version’ (0.060 us/evaluation). Do you have any idea why?

Unless the compiler can determine at compile time that ‘lower’ cannot be true, different code will be generated because the compiler can eliminate the if-statement only if it is (provably) dead code. Is ‘lower’ a defined constant, or a template parameter?

You can check on differences in the generated code by extracting machine code (SASS) from the binary with cuobjdump --dump-sass.

No, ‘lower’ cannot be determined at compile time. It is computed from the function argument:

__device__ function(complex z){

    bool lower = (imag(z)<0);

}

You might want to look into using two functions (for example, generated from a template), where one variant assumes ‘lower’ is false, and then call that function in all those contexts where you know that ‘lower’ is false.
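A minimal sketch of that two-variant idea, under stated assumptions: the `Complex` struct, `core`, `exp_neg_square`, `evaluate`, and `evaluate_any` are hypothetical names made up for illustration, and `core` is only a placeholder for whatever the real function computes before the branch. The `__host__ __device__` macros are defined away when the file is not compiled by nvcc, so the same source can be checked on the host.

```cpp
#include <math.h>

// Let the same source build as plain C++ when nvcc is not used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical minimal complex type (stand-in for the poster's 'complex').
struct Complex { double re, im; };

__host__ __device__ inline Complex conjugate(Complex a)
{
    return Complex{a.re, -a.im};
}

// Computes exp(-z*z) for complex z.
__host__ __device__ inline Complex exp_neg_square(Complex z)
{
    double re  = -(z.re * z.re - z.im * z.im); // real part of -z*z
    double im  = -(2.0 * z.re * z.im);         // imaginary part of -z*z
    double mag = exp(re);
    return Complex{mag * cos(im), mag * sin(im)};
}

// Hypothetical placeholder for the real computation of w; it just returns z.
__host__ __device__ inline Complex core(Complex z)
{
    return z;
}

// 'Lower' is a compile-time constant, so in evaluate<false> the branch is
// provably dead and the compiler can drop it entirely.
template <bool Lower>
__host__ __device__ Complex evaluate(Complex z)
{
    Complex w = core(z);
    if (Lower) {
        Complex e = exp_neg_square(z);
        Complex t = Complex{2.0 * e.re - w.re, 2.0 * e.im - w.im};
        w = conjugate(t); // the 'complex version' from the question
    }
    return w;
}

// Fallback for call sites where the sign of imag(z) is not known in advance.
__host__ __device__ Complex evaluate_any(Complex z)
{
    return (z.im < 0.0) ? evaluate<true>(z) : evaluate<false>(z);
}
```

Call sites that already know imag(z) is non-negative would call `evaluate<false>(z)` directly; only the remaining ones pay for the runtime test in `evaluate_any`.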

The most likely reason for the performance difference is that the if-statement, which ultimately compiles to a branch instruction, inhibits some compiler optimizations that operate only on straight-line code. The longer basic blocks become, the better compilers (independent of CUDA) tend to optimize code. Manual removal of the entire if-statement merges the two neighboring basic blocks into one longer basic block.

Thanks for your suggestion. I can always use suggestions for removing branches.

Also, thanks for your explanation. That makes sense to me.

Another question:

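Since the condition of the if-statement is always false, I removed the statement entirely. But that makes the program slightly slower than keeping the if-statement with the simple calculation.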

Time per evaluation (us):

Variant                               method1  method2  method3  method4
if(false) doing complex calculation   0.0636   0.0663   0.0375   0.0375
if(false) doing simple calculation    0.0570   0.0600   0.0339   0.0339
remove the if(false) statement        0.0570   0.0607   0.0343   0.0343

How can that be explained?