Early return vs if statement

What performs better in CUDA? Does this affect branch predication?

This question has no generally applicable answer.

GPUs have no branch prediction mechanism. Straight-line flow is assumed, and any branch that directs control flow elsewhere incurs a delay equal to the depth of the pipeline. However, this basically does not matter in a throughput architecture with zero-overhead thread switching.

Since the CUDA compiler performs extensive code transformations, several of which can eliminate branches, it is generally hard to predict the structure of the generated machine code (SASS) from the structure of the high-level C++ code. My recommendation would be to write the code in the manner you consider natural and most easily understood by programmers. The most important function of HLL code is to convey information to other humans, or to future you.

One pattern that can still be useful, when there is a frequently taken common path through the code and an infrequently taken special-case path, is to compute the common-path result first, then selectively override it with the special-case result at the end, using an if-statement.
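To make that pattern concrete, here is a minimal sketch. The function names (`safe_ratio`, `safe_ratio_early_return`, `ratio_kernel`) and the choice of "divide, but return 0 when the divisor is 0" as the special case are made up for illustration. The sketch assumes the common-path computation has no side effects and is safe to run on the special-case inputs too (a floating-point division by zero yields inf or NaN rather than trapping).

```cpp
// Let the same source build with nvcc or with a plain host C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical example of "common path first, override at the end".
HD inline float safe_ratio(float a, float b)
{
    // Common path: computed unconditionally for every thread.
    float r = a / b;
    // Infrequent special case: override the result at the end.
    if (b == 0.0f) {
        r = 0.0f;
    }
    return r;
}

// The same logic written with an early return, for comparison.
HD inline float safe_ratio_early_return(float a, float b)
{
    if (b == 0.0f) {
        return 0.0f;
    }
    return a / b;
}

#ifdef __CUDACC__
// Hypothetical kernel that applies the function element-wise.
__global__ void ratio_kernel(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = safe_ratio(a[i], b[i]);
    }
}
#endif
```

Both variants return the same values. Whether they end up as different machine code is up to the compiler, which may well turn either one into a select or predicated instructions. If it matters for your kernel, dumping the SASS with `cuobjdump -sass` and profiling both versions is more reliable than guessing from the source.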


Wow! Thank you! I’m glad I don’t have to do some awkward code transformations :) And your advice made my code significantly faster!
