The difference between global and device functions

Can the effects of warp divergence be reduced by a global function calling a device function?

For example, suppose a global kernel contains several if/switch statements. Warp divergence can seriously slow down the execution of this kernel across the warp. If the if/switch statements were instead contained in a device function called by the kernel, could the effects of warp divergence be reduced, if not eliminated?

No. Function calls, like conditional branches, are taken by all threads in a warp as an indivisible unit. Some of the threads may be suspended during the call (if they're inside a divergent branch), but threads from different warps aren't somehow "combined" to make better-packed warps.

This is true with or without function calls.

In practice, most device functions get inline-expanded by the compiler anyway, so the generated code is the same as if the branch were written directly in the kernel.
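
A minimal sketch of what that means (all names here are illustrative, not taken from your code): the branch diverges in exactly the same way whether it is written inline in the kernel or wrapped in a __device__ function the compiler will typically inline.

```
// Sketch with illustrative names: the warp behaves identically in both versions.
__device__ float heavy_path(float x) { return x * x + 1.0f; }  // placeholder "expensive" work
__device__ float light_path(float x) { return x + 1.0f; }      // placeholder "cheap" work

// Version A: branch written directly in the kernel.
__global__ void kernel_inline(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (in[i] > 0.0f) ? heavy_path(in[i]) : light_path(in[i]);
}

// Version B: the same branch moved into a __device__ function.
__device__ float process(float x)
{
    if (x > 0.0f)               // the warp still serializes both sides of this branch
        return heavy_path(x);
    else
        return light_path(x);
}

__global__ void kernel_wrapped(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = process(in[i]);  // after inlining, this is the same code as Version A
}
```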

If divergence is really a problem (it can be!), the hardware isn't going to solve it for you; you need to restructure your algorithm. The common approach is to consolidate the work yourself, often with a compaction step that groups elements taking the same path so each warp executes uniform work. But compaction has its own overhead, so it's only worthwhile if your divergence persists over a large amount of computation.
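
For what it's worth, here is one possible sketch of such a compaction step using Thrust; the predicate, kernel body, and launch parameters are all hypothetical placeholders. The idea is to first gather the indices of elements that take the expensive path, then launch a kernel over only those indices so every warp does uniform work.

```
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical predicate: marks the elements that would take the expensive branch.
struct needs_heavy_work
{
    const float* data;
    __host__ __device__ bool operator()(int i) const { return data[i] > 0.0f; }
};

// Runs only the expensive path, over a compacted list of indices,
// so every warp is packed with threads doing the same work.
__global__ void heavy_kernel(const int* indices, int count, float* data)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < count) {
        int i = indices[t];
        data[i] = data[i] * data[i] + 1.0f;  // placeholder for the "heavy" branch body
    }
}

void run_compacted(thrust::device_vector<float>& data)
{
    int n = static_cast<int>(data.size());
    thrust::device_vector<int> indices(n);

    // Compaction step: collect the indices that need the expensive path.
    needs_heavy_work pred{ thrust::raw_pointer_cast(data.data()) };
    auto end = thrust::copy_if(thrust::device,
                               thrust::counting_iterator<int>(0),
                               thrust::counting_iterator<int>(n),
                               indices.begin(), pred);
    int count = static_cast<int>(end - indices.begin());

    if (count > 0) {
        int block = 256;
        int grid  = (count + block - 1) / block;
        heavy_kernel<<<grid, block>>>(thrust::raw_pointer_cast(indices.data()),
                                      count,
                                      thrust::raw_pointer_cast(data.data()));
    }
}
```

Whether this pays off depends on how much work sits behind the branch compared to the cost of the copy_if pass.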

This is the answer I was expecting, though not the one I was hoping for.

Thanks anyway.