Simple bitwise problem

Hey guys

I have a kernel running some bitwise calculations… Essentially I am looking for a very fast way to shift a single bit to the LSB of the word. Say I have a number:

in binary: 00x00000, where x is 0 or 1
the mask: 00100000, marking where the bit is

Basically I want the result of the bit shift to equal 0000000x (i.e. move x to the LSB).

I want to somehow do:

NUM >>= (log(mask)/log(2));

but obviously much faster… any thoughts?

Apologies if it isn’t directly related to CUDA, but it’s very important to my CUDA kernel!!


if( NUM & mask )

  NUM |= 1;

But the bits between the mask-bit and LSB aren’t touched.

Or this

while( NUM && !(NUM & 1) ) NUM >>= 1;  // NUM != 0 check: a zero value would otherwise loop forever

if you intend to eliminate the trailing zeros. Otherwise try

while( !(mask & 1) )
{
  NUM >>= 1;
  mask >>= 1;
}
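
For reference, here is a minimal sketch of the loop approach wrapped in a function, written as plain C so it can be tried on the host. The name bit_to_lsb_loop is hypothetical, and the sketch assumes the mask has exactly one bit set. It adds two things the loop above does not have: a guard against a zero mask (which would never terminate) and a final "& 1" so that the bits above the selected one are dropped.

```c
#include <stdint.h>

/* Hypothetical helper: shift num right until the mask bit reaches bit 0,
   then keep only that bit. Assumes mask has exactly one bit set. */
static uint32_t bit_to_lsb_loop(uint32_t num, uint32_t mask)
{
    if (mask == 0)
        return 0;               /* guard: no bit selected, avoid an endless loop */

    while (!(mask & 1u)) {      /* runs once per bit position below the mask bit */
        num >>= 1;
        mask >>= 1;
    }
    return num & 1u;            /* discard everything above the selected bit */
}
```

The loop count depends on the mask position, so threads of a warp with different masks would diverge; that is one reason a loop-free form may be preferable in a kernel.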




I just wasn’t thinking (or I was thinking too much about it)! Thanks, that’s exactly what I needed.

To move the bit selected by mask to the LSB you can do:

NUM = ((NUM & mask) != 0);
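
A small sketch of why this works, in plain C: the comparison yields exactly 0 or 1, so no shift amount is needed at all. The function name bit_to_lsb is made up for illustration; the expression is the one from the line above.

```c
#include <stdint.h>

/* Hypothetical wrapper around the one-liner: returns 1 if the bit selected
   by mask is set in num, otherwise 0. No loop and no shift count required. */
static uint32_t bit_to_lsb(uint32_t num, uint32_t mask)
{
    /* (num & mask) isolates the bit; "!= 0" collapses it to 0 or 1 */
    return (num & mask) != 0;
}
```

Note that if the mask has several bits set, the result is 1 when any of them is set in num, which may or may not be what you want.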

The intrinsic __clz(x) can be used to calculate log_2 (for a single-bit 32-bit mask, log_2(mask) is 31 - __clz(mask)), but I don’t know how fast it is.
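
To illustrate the __clz idea, here is a rough host-side sketch in plain C. clz32 is a hypothetical portable stand-in for the CUDA intrinsic (it counts leading zero bits of a 32-bit value and returns 32 for zero), and bit_to_lsb_shift is likewise a made-up name. The sketch assumes a 32-bit word and a mask with exactly one bit set.

```c
#include <stdint.h>

/* Hypothetical stand-in for CUDA's __clz(): number of leading zero bits
   in a 32-bit value, 32 when the value is zero. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {   /* shift left until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
}

/* Shift the bit selected by mask down to bit 0, using log2(mask) as the
   shift count. Assumes mask has exactly one bit set. */
static uint32_t bit_to_lsb_shift(uint32_t num, uint32_t mask)
{
    if (mask == 0)
        return 0;                  /* no bit selected */
    int shift = 31 - clz32(mask);  /* index of the mask bit, i.e. log2(mask) */
    return (num >> shift) & 1u;
}
```

Inside a kernel the helper would presumably be replaced by __clz((int)mask); __ffs(mask) - 1 should give the same bit index for a single-bit mask. If the mask is the same for every call, computing the shift once outside the inner loop avoids the question of the intrinsic's speed altogether.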