A kernel I’ve been given to optimise has lots of if-then-else statements that test a common parameter. The value of that parameter is always the same for all threads, but it can take up to N different values, so all threads take the same one of N different paths. As a consequence there is no warp divergence: because every thread sees the same value, the threads of a warp stay converged, but which calculation they perform depends on that value. Can the time taken to evaluate that parameter still be significant, particularly if there are lots of if-then-else statements testing it?
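A minimal sketch of the pattern described above, assuming the common parameter is passed as a kernel argument (here a hypothetical `int mode` selecting one of three placeholder calculations; the kernel and helper names are also hypothetical). Because every thread reads the same `mode`, every warp takes the same branch; the work spent "evaluating the parameter" is the chain of integer comparisons executed before the taken branch is reached. To see exactly what the compiler generates for such a chain, the binary can be disassembled with `cuobjdump -sass`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: 'mode' is a kernel argument, so it has the same value in
// every thread and all threads of a warp follow the same branch (no divergence).
__global__ void computeRuntime(const float* in, float* out, int n, int mode)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    // Each thread evaluates this if/else chain at runtime.
    if (mode == 0) {
        out[i] = x + 1.0f;   // placeholder path A
    } else if (mode == 1) {
        out[i] = x * 2.0f;   // placeholder path B
    } else {
        out[i] = x * x;      // placeholder path C
    }
}

// Host helper: copies host_in to the device, launches the kernel and copies the
// result back into host_out. Returns false if any CUDA call fails.
bool runRuntime(const float* host_in, float* host_out, int n, int mode)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaError_t err = cudaMemcpy(d_in, host_in, bytes, cudaMemcpyHostToDevice);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (err == cudaSuccess) {
        computeRuntime<<<blocks, threads>>>(d_in, d_out, n, mode);
        err = cudaGetLastError();
    }
    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess) err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```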
I assume that every thread evaluates these conditions at runtime before the warp takes whichever path they select. How long does this evaluation take?
These runtime evaluations need not happen at all: the algorithm to use for a given value of the parameter could instead be chosen at compile time with #if/#else/#endif directives, so that only the correct code is compiled into the kernel.
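A sketch of the compile-time approach, assuming a hypothetical macro `MODE` set on the command line (e.g. `nvcc -DMODE=1`) and the same three placeholder calculations. It also shows an alternative that is not from the post: making the mode a template parameter. Since a template argument is a compile-time constant, the untaken branches can be removed by the compiler, yet all variants live in one binary and the runtime value is checked once on the host instead of in every GPU thread.

```cuda
#include <cuda_runtime.h>

// Hypothetical compile-time switch: build one variant per mode, e.g. nvcc -DMODE=1
#ifndef MODE
#define MODE 0
#endif

__global__ void computeFixed(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
#if MODE == 0
    out[i] = x + 1.0f;   // only the selected line is compiled
#elif MODE == 1
    out[i] = x * 2.0f;
#else
    out[i] = x * x;
#endif
}

// Alternative (swapped-in technique): Mode is a compile-time constant, so the
// compiler can drop the branches that can never be taken for each instantiation.
template <int Mode>
__global__ void computeTemplated(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    if (Mode == 0)      out[i] = x + 1.0f;
    else if (Mode == 1) out[i] = x * 2.0f;
    else                out[i] = x * x;
}

// Host helper: copies 'in' to the device, calls launch(...) to start a kernel,
// then copies the result back into 'out'. Returns false if any CUDA call fails.
template <typename Launch>
static bool runOnDevice(const float* in, float* out, int n, Launch launch)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaError_t err = cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (err == cudaSuccess) { launch(d_in, d_out, n, blocks, threads); err = cudaGetLastError(); }
    if (err == cudaSuccess) err = cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}

bool runFixed(const float* in, float* out, int n)
{
    return runOnDevice(in, out, n, [](const float* d_in, float* d_out, int count, int blocks, int threads) {
        computeFixed<<<blocks, threads>>>(d_in, d_out, count);
    });
}

bool runTemplated(const float* in, float* out, int n, int mode)
{
    return runOnDevice(in, out, n, [mode](const float* d_in, float* d_out, int count, int blocks, int threads) {
        // The runtime value is tested once here, on the host.
        switch (mode) {
            case 0:  computeTemplated<0><<<blocks, threads>>>(d_in, d_out, count); break;
            case 1:  computeTemplated<1><<<blocks, threads>>>(d_in, d_out, count); break;
            default: computeTemplated<2><<<blocks, threads>>>(d_in, d_out, count); break;
        }
    });
}
```

The #if version needs a separate build per value of the parameter; the template version trades that for a larger binary containing every instantiation.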