It seems I mistook cuSingleComplex for the corresponding data type from the CUDA header file cuComplex.h. If you use your own struct, make sure you give it the appropriate alignment attribute (8 bytes for a struct of two floats) so that the compiler can use wide loads and stores.
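As a minimal sketch of what that could look like, assuming a single-precision complex type made of two floats (the names MyComplex, cmul, and cmul_kernel are made up for illustration and are not from your code):

```cuda
#include <cuda_runtime.h>

// Hypothetical user-defined complex type; the name MyComplex is illustrative.
// __align__(8) guarantees 8-byte alignment, which allows the compiler to move
// both 4-byte members with a single 64-bit load or store.
struct __align__(8) MyComplex {
    float re;
    float im;
};

static_assert(sizeof(MyComplex) == 8, "expected two packed floats");
static_assert(alignof(MyComplex) == 8, "expected 8-byte alignment");

// Complex multiplication, callable from both host and device code.
__host__ __device__ inline MyComplex cmul(MyComplex a, MyComplex b)
{
    MyComplex r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

// Element-wise product; each thread reads one element from each input array.
__global__ void cmul_kernel(const MyComplex* x, const MyComplex* y,
                            MyComplex* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = cmul(x[i], y[i]);
    }
}
```

Without the alignment attribute the struct is only guaranteed 4-byte alignment, so the compiler would generally have to access the two members separately.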
Small, local if-statements are not something CUDA programmers should worry about, and I would advise against manually replacing them with clever arithmetic expressions just to avoid a potential branch. Doing so makes code harder to understand and falls under the general heading of “premature optimization”. Unless there is an important reason to do otherwise, I would recommend writing CUDA code in a clear, natural style.
The GPU hardware offers predicated execution for almost all instructions, provides “select”-type instructions (the direct equivalent of the ternary operator in C/C++), and supports other optimizations such as uniform branches.
The compiler will most likely turn a simple assignment via the ternary operator into a select-type instruction. It usually compiles very small if-statements into an inline sequence of predicated instructions, and larger if-statements into a combination of predicated instructions and a uniform branch.
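To make this concrete, here is a small sketch of the two natural styles. The function and kernel names are invented for illustration, and what the compiler actually emits depends on the GPU architecture and toolkit version:

```cuda
#include <cuda_runtime.h>

// Ternary form: a natural candidate for a select-type instruction.
__host__ __device__ inline float clamp_low_ternary(float x, float lo)
{
    return (x < lo) ? lo : x;
}

// Equivalent small if-statement: a natural candidate for predicated execution.
__host__ __device__ inline float clamp_low_if(float x, float lo)
{
    float r = x;
    if (x < lo) {
        r = lo;
    }
    return r;
}

// Hypothetical kernel that applies the lower clamp to every array element.
__global__ void clamp_kernel(const float* in, float* out, float lo, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = clamp_low_if(in[i], lo);
    }
}
```

Both variants are written in the plain, readable style. If you want to check what the compiler did with them, you can look at the generated machine code with cuobjdump -sass.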