Integer logarithm to the base 2 - how?

Manticore · May 14, 2008, 7:11pm

In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:

for(int i = 31; i >= 0; i--)

{

	if(x >> i)

 Â return i;

}

return 0;

The following code works almost perfectly, but sometimes it just returns wrong value and program crashes:

return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));

Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

e.ping · May 14, 2008, 7:22pm

In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:
for(int i = 31; i >= 0; i--)

{

	if(x >> i)

  return i;

}

return 0;
The following code works almost perfectly, but sometimes it just returns wrong value and program crashes:
return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));
Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

[snapback]377102[/snapback]

I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

Manticore · May 14, 2008, 8:56pm

Great thanks! I must have missed it :">

Anjul_Patney · May 14, 2008, 9:17pm

From /usr/local/cuda/include/device_functions.h:

(can someone explain how this works?)

__device_func__(int __clz(int a))

{

  return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}

seibert · May 14, 2008, 9:51pm

From /usr/local/cuda/include/device_functions.h:

(can someone explain how this works?)
__device_func__(int __clz(int a))

{

 Â return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}
[snapback]377167[/snapback]

Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn’t directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That’s going to be 31-log2(a).

Manticore · May 17, 2008, 10:51pm

Unfortunately, this function is not 100% precise too (

For example, I have a device function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:

for(int i = 31; i >=0; i--)

	if(a & (1 << i))

	{

  lores ^= (b << i);

  hires ^= (b >> (32 - i));

  a ^= (1 << i);  

	}

// modulo operations

All works perfectly. But when I want to speed this up and use __clz:

while(a)

{

	i = 31 - __clz(a);

	lores ^= (b << i);

	hires ^= (b >> (32 - i));

	a ^= (1 << i);  

}

All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.