I was looking up some information on bitwise tricks to do faster operations, and I saw that a quick way to do the absolute value operation on a 32-bit integer is:
//version 1
i = x < 0 ? -x : x;
And that’s the version that nVidia has used for the implementation in device code (see math_functions.h in the include directory of your CUDA toolkit). However, the same page I was reading also lists this version, which apparently is somewhat faster:
//version 2
i = (x ^ (x >> 31)) - (x >> 31);
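To spell out why version 2 works, here is a rough sketch of both versions as plain C functions. The names abs_v1 and abs_v2 are just mine for illustration and are not what math_functions.h uses. It assumes that >> on a negative signed 32-bit integer is an arithmetic shift, which is implementation-defined in C but is what common compilers do. Under that assumption x >> 31 is 0 for non-negative x and -1 (all ones) for negative x, so the XOR flips every bit only when x is negative, and subtracting -1 adds the 1 that completes the two's complement negation.

```c
#include <stdint.h>

/* Version 1: conditional form. Whether this becomes an actual branch
   or a select/predicated instruction is up to the compiler. */
static int32_t abs_v1(int32_t x) {
    return x < 0 ? -x : x;
}

/* Version 2: shift-and-xor form.
   ASSUMPTION: >> on a negative int32_t is an arithmetic shift
   (implementation-defined in C), so mask is 0 when x >= 0 and
   -1 (all bits set) when x < 0. */
static int32_t abs_v2(int32_t x) {
    int32_t mask = x >> 31;
    /* x >= 0: (x ^ 0) - 0 = x
       x <  0: (x ^ -1) - (-1) = ~x + 1 = -x */
    return (x ^ mask) - mask;
}
```

Neither version can handle the most negative 32-bit value, since its absolute value does not fit in a signed 32-bit integer, so I'd leave that input out of any comparison.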
If I get a little time after work today or tomorrow I’ll replace it and try to run some tests on it. The page I read claims that it’s 20% faster in their tests, but I wonder if it could be even better for CUDA since there’s no branching. Maybe one of the nVidia compiler guys can take a look as well?
Also, here’s the page I found, if you’re interested in some other ones (some of which are fairly well-known though):