In my program, I need to get the position of the highest bit of an integer value, i.e. something like an integer log2.

The following code works perfectly, but it is too slow:

```
for (int i = 31; i >= 0; i--)
{
    if (x >> i)
        return i;
}
return 0;
```

The following code works almost perfectly, but sometimes it just returns a wrong value and the program crashes:

```
return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));
```

Is there a fast way to calculate an integer logarithm in CUDA?

Thanks in advance


I think what you want is the function __clz(x).

See page 85 of the beta 2.0 programming guide:

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits starting at the most significant bit (i.e. bit 31) of integer parameter x.


Great, thanks! I must have missed it :">

From /usr/local/cuda/include/device_functions.h:

(can someone explain how this works?)

```
__device_func__(int __clz(int a))
{
    return (a) ? (158 - (__float_as_int(__uint2float_rz((unsigned int)a)) >> 23)) : 32;
}
```


Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line converts the integer to a float, then pulls the exponent bits out of the float representation and manipulates them directly. Notice the minus sign flips it around: __clz() doesn’t directly give you log base 2, but counts how many zeros sit above the most significant one bit. That’s going to be 31 - floor(log2(a)).


Unfortunately, this function is not 100% reliable either.

For example, I have a device function that multiplies two binary polynomials a and b, stored as unsigned ints:

```
for (int i = 31; i >= 0; i--)
    if (a & (1 << i))
    {
        lores ^= (b << i);
        hires ^= (b >> (32 - i));
        a ^= (1 << i);
    }
// modulo operations
```

Everything works perfectly. But when I try to speed this up with __clz:

```
while (a)
{
    i = 31 - __clz(a);
    lores ^= (b << i);
    hires ^= (b >> (32 - i));  // note: shifts by 32 when i == 0
    a ^= (1 << i);
}
```

All other code remains the same, and I get a significant performance boost - but sometimes I get a ULF (it seems to be caused by a wrong __clz() return value and, therefore, an infinite loop). This occurs on different data sets, and at different places on each run for the same data set.