There are already population-count (popcnt) intrinsics available in CUDA.
NVIDIA doesn’t maintain LLVM or Numba.
If you’re asking for a compiler idiom that automatically converts a code sequence in CUDA C++ into a popcnt, it’s probably best to file that as an RFE/bug as a registered developer at developer.nvidia.com.
If you’re asking for something specific to LLVM or Numba, it’s probably best to file an issue in the appropriate place for those projects.
__device__ int __popc ( unsigned int x )
Count the number of bits that are set to 1 in a 32 bit integer.
__device__ int __popcll ( unsigned long long int x )
Count the number of bits that are set to 1 in a 64 bit integer.
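A minimal sketch of calling the two intrinsics from a kernel. It assumes a CUDA-capable GPU and nvcc. The names `popcount_kernel` and `gpu_popcount` are hypothetical helpers for illustration only, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Each thread counts the set bits of one 32-bit and one 64-bit word.
__global__ void popcount_kernel(const unsigned int* in32,
                                const unsigned long long* in64,
                                int* out32, int* out64, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out32[i] = __popc(in32[i]);    // 32-bit population count
        out64[i] = __popcll(in64[i]);  // 64-bit population count
    }
}

// Hypothetical host helper: copies n words to the GPU, runs the kernel,
// and copies the counts back. Error checking omitted for brevity.
void gpu_popcount(const unsigned int* h32, const unsigned long long* h64,
                  int* h_out32, int* h_out64, int n)
{
    unsigned int* d32;
    unsigned long long* d64;
    int* dout32;
    int* dout64;
    cudaMalloc(&d32, n * sizeof(unsigned int));
    cudaMalloc(&d64, n * sizeof(unsigned long long));
    cudaMalloc(&dout32, n * sizeof(int));
    cudaMalloc(&dout64, n * sizeof(int));

    cudaMemcpy(d32, h32, n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    cudaMemcpy(d64, h64, n * sizeof(unsigned long long), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    popcount_kernel<<<blocks, threads>>>(d32, d64, dout32, dout64, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out32, dout32, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_out64, dout64, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d32);
    cudaFree(d64);
    cudaFree(dout32);
    cudaFree(dout64);
}
```

For example, the inputs `0xF0u` and `0x8000000000000001ull` should produce counts of 4 and 2 respectively.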
Access via PTX is described in the PTX manual, here:
9.7.1.14. Integer Arithmetic Instructions: popc
popc
Population count.
Syntax
popc.type d, a;
.type = { .b32, .b64 };
Description
Count the number of one bits in a and place the resulting population count in 32-bit destination register d. Operand a has the instruction type and destination d has type .u32.
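If you want the PTX instruction directly rather than the intrinsic, a sketch using inline PTX might look like the following. It assumes nvcc's inline-assembly constraints (`"r"` for a 32-bit register, `"l"` for a 64-bit register). Per the description above, the destination is always 32-bit even for `.b64`. The helper and kernel names are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// popc.b32: 32-bit source, 32-bit (.u32) destination.
__device__ unsigned int ptx_popc32(unsigned int x)
{
    unsigned int d;
    asm("popc.b32 %0, %1;" : "=r"(d) : "r"(x));
    return d;
}

// popc.b64: 64-bit source ("l" constraint), destination still 32-bit.
__device__ unsigned int ptx_popc64(unsigned long long x)
{
    unsigned int d;
    asm("popc.b64 %0, %1;" : "=r"(d) : "l"(x));
    return d;
}

// Each thread counts bits in the low 32 bits and in the full 64-bit word.
__global__ void ptx_popc_kernel(const unsigned long long* in,
                                unsigned int* out_lo32,
                                unsigned int* out64, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out_lo32[i] = ptx_popc32((unsigned int)in[i]);  // low 32 bits only
        out64[i] = ptx_popc64(in[i]);
    }
}

// Hypothetical host helper: runs ptx_popc_kernel over n words.
void ptx_popcount(const unsigned long long* h_in,
                  unsigned int* h_lo32, unsigned int* h_64, int n)
{
    unsigned long long* d_in;
    unsigned int* d_lo32;
    unsigned int* d_64;
    cudaMalloc(&d_in, n * sizeof(unsigned long long));
    cudaMalloc(&d_lo32, n * sizeof(unsigned int));
    cudaMalloc(&d_64, n * sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned long long), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    ptx_popc_kernel<<<blocks, threads>>>(d_in, d_lo32, d_64, n);

    cudaMemcpy(h_lo32, d_lo32, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_64, d_64, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_lo32);
    cudaFree(d_64);
}
```

In practice `__popc`/`__popcll` are the simpler route; inline PTX is mainly useful when you are already writing PTX or want to see exactly which instruction is emitted.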