Can __popc() or __popcll() count the 1 bits in 128-bit data?

I have read the CUDA 5.5 documentation, but I am still not clear on whether __popc() can be applied to 128-bit data (int4).

I know that __popc() counts the 1 bits in 32-bit data and __popcll() counts them in 64-bit data, but what about 128-bit data?

Any suggestions are appreciated.

The prototypes for these functions clearly indicate that __popc() is for 32-bit data, while __popcll() is for 64-bit data. To count the bits in an int4, simply apply __popc() to the four individual components, and add the results:

int4 foo;
int bits = __popc(foo.x) + __popc(foo.y) + __popc(foo.z) + __popc(foo.w);
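
For context, here is a minimal sketch of how that component-wise sum might be wrapped in a device function and called from a kernel. The names popc128, popc128_kernel, and count_bits_on_gpu are hypothetical helpers made up for this illustration, not part of CUDA, and error checking on the runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of 1 bits in the 128 bits held by an int4.
__device__ __forceinline__ int popc128(int4 v)
{
    // __popc() takes an unsigned int; the casts make explicit that only the
    // bit pattern of each (possibly negative) component matters.
    return __popc((unsigned int)v.x) + __popc((unsigned int)v.y)
         + __popc((unsigned int)v.z) + __popc((unsigned int)v.w);
}

// Hypothetical kernel: one thread counts the bits of one int4 element.
__global__ void popc128_kernel(const int4 *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = popc128(in[i]);
    }
}

// Hypothetical host wrapper: copies a single int4 to the device, launches the
// kernel with one thread, and returns the count.
// Error checking is omitted to keep the sketch short.
int count_bits_on_gpu(int4 value)
{
    int4 *d_in = nullptr;
    int *d_out = nullptr;
    int result = -1;

    cudaMalloc(&d_in, sizeof(int4));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, &value, sizeof(int4), cudaMemcpyHostToDevice);

    popc128_kernel<<<1, 1>>>(d_in, d_out, 1);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

If the data is already stored as two 64-bit words rather than as an int4, summing two __popcll() calls should give the same total.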

