What are the work-efficient time complexities of the integer intrinsic functions in CUDA?

The functions are listed here: https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html. I am specifically interested in the __clz(int x) function, which returns the number of consecutive zero bits starting from the most significant bit (bit 31) of a 32-bit integer.
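
To make the semantics I am asking about concrete, here is a minimal host-side sketch in plain C++. The name clz_reference is hypothetical and is not part of CUDA. The loop only reproduces the result documented for __clz (a value between 0 and 32 inclusive); it is not how the hardware computes it, and it says nothing about the intrinsic's actual cost.

```cpp
#include <cstdint>

// Hypothetical reference for the documented result of __clz(int x).
// This is an illustration only, not NVIDIA's implementation.
inline int clz_reference(std::int32_t x) {
    // Work on the unsigned bit pattern so shifts and masks are well defined.
    const std::uint32_t bits = static_cast<std::uint32_t>(x);
    int count = 0;
    // Scan from bit 31 (most significant) downward until a set bit is found.
    for (std::uint32_t mask = 0x80000000u;
         mask != 0 && (bits & mask) == 0;
         mask >>= 1) {
        ++count;
    }
    return count;  // equals 32 when no bit is set (x == 0)
}
```

This naive version takes at most 32 iterations, so it is bounded by the fixed word size; what I would like to know is the cost of the intrinsic itself.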