I am currently developing my first CUDA application.
I have finally run into a problem: my kernel code performs a lot of hit checks. Each check extracts the sign of several similar calculations and then performs one last, unavoidable if-check in every thread. The standard C way to extract a sign would be some kind of logical statement, which implicitly introduces if-statements and a LOT of branching between the threads. This is of course bad.
I did not find any kind of "sign" function in CUDA like I had initially hoped.
Currently I have two ideas for doing this:
1. Using atomicMax() and atomicMin() with respect to zero. Does anybody know if this avoids branching?
2. Writing some evil hack that reads out the sign bit of the floats.
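To make the second idea concrete, here is a minimal sketch of what reading the sign bit could look like. The names sign_of, sign_bit and hit_kernel are hypothetical, and "hit" is assumed here to mean that three test values are all negative; the real hit criterion may differ. The sketch relies on the __float_as_uint() intrinsic in device code. As far as I understand, atomicMax() and atomicMin() are read-modify-write operations on integer values in memory, so they are probably not a good fit for sign extraction.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: returns -1, 0 or +1 without an explicit if-statement.
// Comparisons like these are usually compiled to predicate/select
// instructions rather than real branches (an assumption worth checking
// in the generated code for your target).
__host__ __device__ inline int sign_of(float x)
{
    return (x > 0.0f) - (x < 0.0f);
}

// Hypothetical helper: returns 1 if the IEEE 754 sign bit is set, else 0.
// Note that -0.0f reports 1, and NaNs report whatever their sign bit is.
__host__ __device__ inline unsigned int sign_bit(float x)
{
#ifdef __CUDA_ARCH__
    // Device path: reinterpret the float's bits as an unsigned integer.
    return __float_as_uint(x) >> 31;
#else
    // Host path: same reinterpretation via memcpy.
    unsigned int bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return bits >> 31;
#endif
}

// Hypothetical kernel: a "hit" is assumed to mean that all three
// test values are negative. The sign bits are combined with bitwise AND,
// so the only conditional left is the bounds guard.
__global__ void hit_kernel(const float* a, const float* b, const float* c,
                           int* hits, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // bounds guard for the last, partially filled block

    unsigned int all_negative = sign_bit(a[i]) & sign_bit(b[i]) & sign_bit(c[i]);
    hits[i] = static_cast<int>(all_negative);
}
```

A launch along the lines of hit_kernel<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, d_hits, n) would fill d_hits with 0 or 1 per element, where the d_ pointers are assumed to be device buffers of length n. The CUDA math library also provides signbit() and copysignf(), so copysignf(1.0f, x) is another way to get -1.0f or +1.0f without writing the bit trick by hand.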
But first I wanted to post this for discussion here, as many of you seem to have great experience with CUDA.
Thanks for your time!