Hi,
Is there a high-performance implementation of bitwise rotation in CUDA? Thanks
The hardware doesn’t have a rotate instruction, so you have to implement it using shifts and logic ops. I believe the code for this is pretty standard.
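For reference, here is a minimal sketch of the usual shift-and-OR idiom for 32-bit values. The names `rotl32`, `rotr32` and the `ROT_HD` macro are made up for this example, not part of any CUDA API. The macro is only there so the same code builds with nvcc (as host/device functions) or with an ordinary C++ compiler.

```cpp
#include <cstdint>

// With nvcc, make the helpers callable from both host and device code;
// with a plain C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define ROT_HD __host__ __device__
#else
#define ROT_HD
#endif

// Rotate a 32-bit value left by n bits (hypothetical helper name).
// Masking keeps both shift counts in 0..31, so n == 0 or n >= 32
// never causes a shift by the full word width, which is undefined in C++.
ROT_HD inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

// Rotate a 32-bit value right by n bits (hypothetical helper name).
ROT_HD inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Inside a kernel you would call it like any other device function, e.g. `out[i] = rotl32(in[i], 7);`. Whether the compiler turns this into anything better than two shifts and an OR depends on the target architecture, so it is worth checking the generated code if this is on a hot path.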