The “standard” C source-level idioms for computing min() and max() do not usually result in divergent code for most scalar types, since the compiler translates them into predicated execution or select instructions. You can use cuobjdump to disassemble the machine code if you need to know for sure.
The min() / max() functionality is directly supported by hardware instructions for many scalar types (e.g. int, float, double) on both sm_1x and sm_2x devices. The various min/max instructions will be readily noticeable in disassembled machine code by their names.
You are welcome. I notice belatedly that I should have been clearer in the second paragraph. By “min() / max() functionality” I was referring to the overloaded min() and max() functions that CUDA makes available in device code, as opposed to discrete source-level constructs, such as macros, that accomplish the same thing. For common scalar types, the functions are what gets translated into min/max instructions; the discrete constructs will typically map to predicated execution or select instructions.
The overall message is that programmers should not worry about divergence due to min/max computations. In general, direct use of the built-in min() and max() functions will result in somewhat higher performance than use of the discrete equivalents, as it minimizes the dynamic instruction count.
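As a sketch of the recommended form, here is a hypothetical clamp kernel (the kernel name and launch parameters are illustrative) that uses the built-in single-precision overloads fminf() and fmaxf() from the CUDA math library; for float data these are the calls that map directly to the hardware min/max instructions discussed above.

```cuda
#include <cuda_runtime.h>

/* Hypothetical example: clamp each element of 'in' to [lo, hi] using the
   built-in CUDA device functions fminf()/fmaxf(), which compile to
   hardware min/max instructions for floats -- no branches, no divergence. */
__global__ void clamp_kernel(float *out, const float *in,
                             float lo, float hi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = fminf(fmaxf(in[i], lo), hi);
    }
}
```

For integer data, the overloaded min()/max() device functions play the same role. Disassembling the resulting machine code with cuobjdump should show the corresponding min/max instructions rather than compare-and-branch sequences.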