I have a kernel running some bitwise calculations… Essentially, I am looking for a very fast way to shift a single bit to the LSB of the word. I have a number:
in binary: 00x00000 - where x is 0 or 1
the mask: 00100000, marking where the bit is
Basically I want the result of the bit shift to equal 0000000x (i.e. move x to the LSB).
I want to somehow do:
NUM >>= (log(mask)/log(2));
but obviously much faster… any thoughts?
Apologies if it isn’t directly related to CUDA, but it’s very important to my CUDA kernel!!
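
For reference, here is a rough sketch of two ways this could be done without any log() call, assuming the mask always has exactly one bit set. The function names bit_to_lsb and mask_to_shift are made up for illustration. The first simply tests the masked bit, which already gives 0 or 1; the second recovers the shift amount (log2 of the mask) with a plain loop. It is written as plain C so it compiles on the host; in device code the loop could presumably be replaced by the CUDA intrinsic __ffs(mask) - 1, since __ffs returns the 1-based position of the lowest set bit.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the bit selected by mask is set in num,
   otherwise 0. No shift is needed, because the comparison yields 0 or 1. */
static inline uint32_t bit_to_lsb(uint32_t num, uint32_t mask)
{
    return (num & mask) != 0u;
}

/* Hypothetical helper: index of the single set bit in mask, i.e. log2(mask).
   Assumes mask is a nonzero power of two. */
static inline uint32_t mask_to_shift(uint32_t mask)
{
    uint32_t shift = 0u;
    while (mask > 1u) {
        mask >>= 1;
        ++shift;
    }
    return shift;
}
```

If the mask is the same for every thread, the shift could be computed once on the host and passed to the kernel, so that the kernel only does (NUM & mask) >> shift.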