Hi everyone. I want to take 4 bytes, extract their MSBs (bits 31, 23, 15, 7), and pack them as 4 consecutive bits. The goal is to take a 4x4 block of uint8 samples and generate a 16-bit lookup table index, which avoids costly computation and takes advantage of the GPU’s parallel memories.
It seems the GPU doesn’t have good bit manipulation instructions like _mm_movemask_epi8(). The best way I can think of is this:
packed = ((x >> 7) & 0x3) | ((x >> 21) & 0xc)
This assumes all bytes are either 0xff or 0. I would like something better, though, since I have four 4-byte ints to process.
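To make the expression concrete, here is a minimal C sketch of it next to a plain per-byte loop that can serve as a reference. The function names `pack_msbs` and `pack_msbs_ref` are placeholders chosen for illustration, and the shortcut version is only expected to agree with the reference when every byte is 0xff or 0.

```c
#include <stdint.h>

/* Shift-and-mask version of the expression above.
   It really reads bits 7, 8, 23 and 24, so it only matches the MSBs
   (bits 7, 15, 23, 31) when every byte is either 0xff or 0. */
static uint32_t pack_msbs(uint32_t x)
{
    return ((x >> 7) & 0x3u) | ((x >> 21) & 0xcu);
}

/* Reference version: reads the MSB of each byte directly.
   Bit i of the result is the MSB of byte i; works for any byte values. */
static uint32_t pack_msbs_ref(uint32_t x)
{
    uint32_t packed = 0;
    for (int i = 0; i < 4; ++i)
        packed |= ((x >> (8 * i + 7)) & 1u) << i;
    return packed;
}
```

For example, `pack_msbs(0x00ff00ffu)` gives 0x5, with bit 0 coming from the lowest byte and bit 2 from the third byte.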
I don’t really care about the order of the packed bits, but it would be preferable to make the 12 border bits contiguous.
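As a rough illustration of what a "border bits contiguous" layout could look like, here is a C sketch that builds the 16-bit index from four row words. Everything about the layout is an assumption made for the example: one 32-bit word per row, byte 0 as the leftmost column, the 12 border bits in bits 0 to 11, and the 4 interior bits in bits 12 to 15. The name `block_index` is hypothetical, and the packing still relies on every byte being 0xff or 0.

```c
#include <stdint.h>

/* Same shift-and-mask packing as above; assumes each byte is 0xff or 0. */
static uint32_t pack_msbs(uint32_t x)
{
    return ((x >> 7) & 0x3u) | ((x >> 21) & 0xcu);
}

/* Hypothetical index layout (an assumption, not a requirement):
     bits  0..3   row 0, columns 0..3          (border)
     bits  4..7   row 3, columns 0..3          (border)
     bits  8..9   row 1, columns 0 and 3       (border)
     bits 10..11  row 2, columns 0 and 3       (border)
     bits 12..13  row 1, columns 1 and 2       (interior)
     bits 14..15  row 2, columns 1 and 2       (interior) */
static uint16_t block_index(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    uint32_t n0 = pack_msbs(r0);
    uint32_t n1 = pack_msbs(r1);
    uint32_t n2 = pack_msbs(r2);
    uint32_t n3 = pack_msbs(r3);

    /* Outer columns (bits 0 and 3 of the nibble) squeezed into 2 bits. */
    uint32_t edge1 = (n1 & 1u) | ((n1 >> 2) & 2u);
    uint32_t edge2 = (n2 & 1u) | ((n2 >> 2) & 2u);

    /* Inner columns (bits 1 and 2 of the nibble) as 2 bits. */
    uint32_t inner1 = (n1 >> 1) & 3u;
    uint32_t inner2 = (n2 >> 1) & 3u;

    return (uint16_t)(n0 | (n3 << 4) | (edge1 << 8) | (edge2 << 10) |
                      (inner1 << 12) | (inner2 << 14));
}
```

With this layout, a block whose border samples are all 0xff and whose interior is all 0 maps to 0x0fff, and the opposite case maps to 0xf000, so the border can be masked out with a single AND.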