I have a kernel running some bitwise calculations. Essentially, I am looking for a very fast way to shift a single bit down to the LSB of the word. I have a number:

in binary: 00x00000, where x is 0 or 1
the mask: 00100000, marking where the bit is

Basically, I want the result of the bit shift to equal 0000000x (i.e. move x to the LSB).

I want to somehow do:

NUM >>= (log(mask)/log(2));  // i.e. shift right by log2(mask), the position of the mask bit

but obviously much faster… any thoughts?
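
For illustration, here is a minimal sketch of the operation described above, written in plain C so it compiles outside a kernel. The helper names (bit_to_lsb, mask_shift, bit_to_lsb_shift) are hypothetical, 32-bit words are an assumption, and the mask is assumed to have exactly one bit set. It shows two possible routes, not a benchmarked answer: testing the masked bit directly (no shift count needed at all), and deriving the shift count with integer operations instead of log().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the bit selected by mask is set in num,
   otherwise 0. The comparison already yields 0 or 1, so no shift is needed. */
static inline uint32_t bit_to_lsb(uint32_t num, uint32_t mask)
{
    return (num & mask) != 0;
}

/* Hypothetical helper: integer replacement for log(mask)/log(2).
   Assumes mask has exactly one bit set. */
static inline unsigned mask_shift(uint32_t mask)
{
    assert(mask != 0 && (mask & (mask - 1)) == 0);  /* exactly one bit set */
    unsigned shift = 0;
    while ((mask >> shift) != 1u) {
        ++shift;  /* count how far the set bit sits above the LSB */
    }
    return shift;
}

/* Same result as bit_to_lsb, but via an explicit mask-and-shift. */
static inline uint32_t bit_to_lsb_shift(uint32_t num, uint32_t mask)
{
    return (num & mask) >> mask_shift(mask);
}
```

If the mask is fixed for the whole kernel, the shift count could be computed once on the host and passed in, so the per-thread work is a single AND and shift. Inside CUDA device code, the `__ffs(mask) - 1` intrinsic would likely replace the loop in mask_shift, since `__ffs` returns the 1-based position of the lowest set bit.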

Apologies if this isn’t directly related to CUDA, but it’s very important to my CUDA kernel!