Branching in an OptiX program

How expensive is branching in an OptiX program? Specifically, I am curious about cases where all CUDA threads take the same side of the branch. For instance:

{//begin block
if (a == 1) {
statement1; //takes 1 sec
statement2; //takes 1 sec
} else { // a == 2
statement3; //takes 1 sec
statement4; //takes 1 sec
}
}//end block

“a” is a variable I set from the CPU, so for any given launch it will be either 1 or 2. Will the shader still take 4 seconds to complete, or is it smart enough to skip a branch that none of the threads take? I am asking because the code stays simpler if I can keep the branch in, but it could be split into separate shaders if not.


I’m posting the response from the mailing list:

This is a coherent branch. As long as all 32 threads in a warp take the same branch, it will be efficient. In your case the condition is the same for every thread in the launch, not just for the 32 threads of one warp.

So it will take 2 seconds, plus a few additional instructions for checking the branch condition.
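
To make the reasoning in the reply concrete, here is a toy host-side C++ model of it. This is only a sketch under simplifying assumptions: the function name `warpBranchCost`, the cost units, and the idea of summing work over warps are all hypothetical and are not part of OptiX or CUDA. The one rule it encodes is that a warp runs a side of the branch whenever at least one of its 32 threads takes that side.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy cost model (an assumption, not a measurement of real hardware):
// a warp executes one side of a branch if at least one of its threads
// takes it, so a warp whose threads disagree pays for both sides.
constexpr std::size_t kWarpSize = 32;

// branchChoice[i] is the value of `a` seen by thread i (1 or 2).
// costA and costB are the costs of the a == 1 and a == 2 sides.
// Returns the work issued, summed over all warps.
int warpBranchCost(const std::vector<int>& branchChoice, int costA, int costB) {
    int total = 0;
    for (std::size_t start = 0; start < branchChoice.size(); start += kWarpSize) {
        const std::size_t end = std::min(start + kWarpSize, branchChoice.size());
        bool anyA = false;
        bool anyB = false;
        for (std::size_t i = start; i < end; ++i) {
            if (branchChoice[i] == 1) {
                anyA = true;
            } else {
                anyB = true;
            }
        }
        if (anyA) total += costA;  // some thread in this warp took a == 1
        if (anyB) total += costB;  // some thread in this warp took a == 2
    }
    return total;
}
```

With 64 threads that all see `a == 1` and a cost of 2 per side, the model gives 2 per warp, so 4 in total. If a single thread in the first warp sees `a == 2` instead, that warp pays for both sides and the total rises to 6. When `a` is set once per launch from the CPU, every warp is in the first situation.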