Problems reported with __ffs()?

I’m working on a CUDA kernel that could really use the __ffs() intrinsic, but I don’t get correct results from it. I’d like to write

    shift = __ffs(v[0]) - 1;
    if (shift > 31)
        shift = 31;

but instead have to replace it with

    j = v[0];
    shift = 0;
    if (j == 0) {
        shift = 31;
    }
    else {
        while ((j & 1) == 0) {
            shift++;
            j >>= 1;
        }
    }

Are these two snippets supposed to have the same semantics? shift is unsigned, so the intent is to have a shift amount that tops out at 31. The first snippet does work on a CPU when using Linux’s ffs() library function.

Any help appreciated,

jasonp

Is “shift” of type “int” by any chance? If so, and v[0] is 0, then __ffs(v[0]) returns 0 and shift is therefore -1 with the first snippet, but 31 with the second snippet.

‘shift’ is an unsigned int so that -1 should become 2^32 - 1. Anyway, v[0] is almost never expected to be exactly zero in this application.

If “shift” is an “unsigned int” I see no reason the code should not work. There are no known issues with __ffs(). If you could post a short, self-contained example application that reproduces the issue you observed, I would be happy to follow up. Please mention the expected output as well as what you actually observe on the GPU. Thanks!
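
For a quick host-side check of that reasoning, here is a minimal sketch in C. It assumes POSIX ffs() from <strings.h> as a stand-in for __ffs() (both return the 1-based position of the lowest set bit, or 0 for a zero input). The function names shift_with_ffs and shift_with_loop are made up for illustration and are not from the original kernel.

```c
#include <strings.h>  /* POSIX ffs(): 1-based index of lowest set bit, 0 if none */

/* First snippet. Assumption: ffs() stands in for CUDA's __ffs(). */
unsigned int shift_with_ffs(unsigned int v)
{
    /* For v == 0, ffs() returns 0, and 0 - 1 converted to unsigned is 2^32 - 1. */
    unsigned int shift = ffs((int)v) - 1;
    if (shift > 31)
        shift = 31;
    return shift;
}

/* Second snippet: count trailing zero bits by hand. */
unsigned int shift_with_loop(unsigned int v)
{
    unsigned int shift = 0;
    if (v == 0) {
        shift = 31;
    }
    else {
        while ((v & 1) == 0) {
            shift++;
            v >>= 1;
        }
    }
    return shift;
}
```

With an unsigned shift, the two functions should agree for every 32-bit input, including v == 0, where the ffs version wraps around and is then clamped to 31. A repro case would need to show a GPU result from the first snippet that differs from this.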

Does

    shift = __ffs(v[0]) - 1;
    if (v[0] == 0)
        shift = 31;

work?
