AND: Logical && vs bitwise &

Hi,

I’m looking for a good way to detect local minima/maxima.

Let’s say that each thread has five float values in shared memory. I think I should be able to use bitwise operators to check for min/max, and that it could be fast.

bool min = false;
float threadArr[5];
fillArray(threadArr, globalMem);
float center = threadArr[2];
min = (center < threadArr[0]) & (center < threadArr[1]) & (center < threadArr[3]) & (center < threadArr[4]);

My idea is that code like this would be faster than the equivalent with logical && operators, because the bitwise operators don’t cause branching:

min = (center < threadArr[0]) && (center < threadArr[1]) && (center < threadArr[3]) && (center < threadArr[4]);

  1. Am I crazy?

  2. Do I need parentheses around the bitwise &'s?

  3. Are the logical operators && and || short-circuiting in CUDA?

  1. No, you are not crazy. I have one kernel where I get a tiny performance improvement by using bitwise & instead of &&.
  2. The parentheses can’t hurt :) And they certainly make the code more readable. Strictly speaking they are optional here: in C, < has higher precedence than &, so the comparisons are evaluated before the bitwise AND either way.
  3. Yes, && and || short-circuit in CUDA, just as they do in C.
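
To make answer 3 concrete, here is a small host-side sketch in plain C++ (CUDA device code follows the same C/C++ rules for these operators). The helper lessCounted and the counter g_calls are made up for illustration; they exist only so that the number of comparisons actually executed becomes visible.

```cpp
#include <cassert>

// Hypothetical helper (not part of CUDA): counts how many comparisons run.
static int g_calls = 0;
static bool lessCounted(float a, float b) { ++g_calls; return a < b; }

// Chain the four neighbor comparisons with && and report how many executed.
// && stops at the first comparison that yields false.
int callsWithLogicalAnd(float c, const float* v) {
    g_calls = 0;
    bool r = lessCounted(c, v[0]) && lessCounted(c, v[1]) &&
             lessCounted(c, v[3]) && lessCounted(c, v[4]);
    (void)r;  // result unused; only the call count matters here
    return g_calls;
}

// Chain the same comparisons with & and report how many executed.
// & always evaluates both operands, so all four comparisons run.
int callsWithBitwiseAnd(float c, const float* v) {
    g_calls = 0;
    bool r = lessCounted(c, v[0]) & lessCounted(c, v[1]) &
             lessCounted(c, v[3]) & lessCounted(c, v[4]);
    (void)r;
    return g_calls;
}
```

When the first comparison already fails, the && version executes one comparison and the & version executes all four. Whether skipping that work is a win on the GPU depends on whether the skip costs a divergent branch, which is the trade-off the original question is about.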

Lastly, I will add that in CUDA you often have to try both and measure. I’ve had kernels where I went out of my way to avoid divergent warps, and they ended up being much slower than the simpler kernel with divergent warps.
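
One way to make "try both" cheap is to wrap each variant in a small function and swap them in the kernel when timing. The sketch below is only an assumption about how that could look: the names isLocalMinBitwise and isLocalMinLogical and the HD macro are invented here, and the five-element window layout (center at index 2) is taken from the question.

```cpp
#include <cassert>

// Expands to CUDA qualifiers under nvcc and to nothing under a host compiler,
// so the same functions can be unit-tested on the CPU.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Variant A (hypothetical name): bitwise AND, all four comparisons evaluated.
HD inline bool isLocalMinBitwise(const float* w) {
    const float c = w[2];  // center of the 5-element window
    return (c < w[0]) & (c < w[1]) & (c < w[3]) & (c < w[4]);
}

// Variant B (hypothetical name): logical AND, may stop at the first failure.
HD inline bool isLocalMinLogical(const float* w) {
    const float c = w[2];
    return (c < w[0]) && (c < w[1]) && (c < w[3]) && (c < w[4]);
}
```

Both variants return the same result for every input, since the comparisons have no side effects; only the generated code differs, so timing them on the real data is the way to pick one.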

Is the data to be analyzed laid out in 2D? If you need to detect local minima/maxima inside a 2D window, you could start from the “convolutionSeparable” example that comes with the CUDA SDK.