sign() function

Hi everybody.

I’m looking for a sign() function on my device which would have the usual mathematical behaviour.

sign(x) = -1 if x < 0

sign(x) = 0 if x = 0

sign(x) = 1 if x > 0

I’d like to implement this without any branches as I’m going to use it to avoid branches in some calculation. I’ve been thinking about two versions.

__device__ int sign(float x)

{ 

	return x > 0 ? 1 : (x<0 ? -1 : 0);

}

and

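__device__ int sign(float x)

{

  // signbit() is nonzero when the sign bit is set, i.e. for negative values

  const int neg = signbit(x);

  const int pos = signbit(-x);

  // the division normalizes a nonzero signbit() result to +1 or -1

  return (pos - neg) / (pos + neg);

}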

The problem with the first one is that I’m not sure it is branch-free (is it?), and the second one is not safe if x == 0.

Do you know a better way of doing this?

Well, I just found out that there is already a sign() function available on the device. It is, however, not present in the documentation…

You can get the first one branch-free (at least with CUDA 3.1) by introducing a temporary variable:

__device__ int sign(float x)

{ 

	int t = x<0 ? -1 : 0;

	return x > 0 ? 1 : t;

}
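
Another common way to write this, shown here only as a sketch and not taken from any CUDA header, is to subtract the two comparison results. Each comparison evaluates to 0 or 1, so the source contains no conditional expression at all; whether the generated code is branch-free still depends on the compiler. The names `HD` and `sign_cmp` are made up for this example.

```cpp
// HD is a hypothetical helper macro: it expands to the CUDA qualifiers under
// nvcc and to nothing under a plain host C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Returns -1, 0 or +1 as (x > 0) - (x < 0).
// NaN compares false both ways, so it yields 0.
HD inline int sign_cmp(float x)
{
    return (x > 0.0f) - (x < 0.0f);
}
```

Unlike the signbit() version, this returns 0 for both +0.0f and -0.0f.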

I think it’s a standard C function so they don’t document that.

@tera: is that more efficient than the signf function?

I guess not.

Ooops

Sorry. sign() does not exist. There was a macro defined elsewhere in the code which was creating errors.

@Jimmy Petterson: The C standard does not define any sign() function. Only signbit() exists.

Sorry, I was thinking of the “copysignf” function.

http://www.opengroup.org/onlinepubs/009695…s/copysign.html
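
For what it is worth, copysign can be turned into a sign function with one extra step. The following is a minimal sketch under the assumption that a float result is acceptable; `HD` and `sign_copysign` are names invented for this example. copysign alone never returns zero, so the result is multiplied by the comparison `x != 0`.

```cpp
#include <cmath>

// HD is a hypothetical helper macro: CUDA qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// copysign(1.0f, x) is +1 or -1 with the sign bit of x (so -0.0f gives -1).
// Multiplying by (x != 0) forces the result to zero for +0.0f and -0.0f.
// Note: a NaN input is not zero, so it returns +1 or -1 depending on its sign bit.
HD inline float sign_copysign(float x)
{
    return std::copysign(1.0f, x) * static_cast<float>(x != 0.0f);
}
```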

http://www.psc.edu/general/software/packages/ieee/ieee.php suggests you can do a bit of shifting (= bit-shifting), which ought to work as long as the variable can be reinterpreted as an (unsigned) int without rounding it. I needed a device function which determines whether two linearly dependent vectors point in the same direction:

[codebox]
__device__ int same_colinear_direction(MVECTOR &a, MVECTOR &b)
{
    // For each component that is not too close to zero, compare the sign bits (bit 31).
    if (fabs(a.x)>MINVECSIZE && (*(unsigned*)&a.x>>31) != (*(unsigned*)&b.x>>31)) return 0;
    if (fabs(a.y)>MINVECSIZE && (*(unsigned*)&a.y>>31) != (*(unsigned*)&b.y>>31)) return 0;
    if (fabs(a.z)>MINVECSIZE && (*(unsigned*)&a.z>>31) != (*(unsigned*)&b.z>>31)) return 0;
    return 1;
}
[/codebox]

Ignoring the fabs() which I needed to discern values too close to zero, you can see the way the sign bit is isolated. Do you think this is sufficiently robust?

Jan
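
As a side note on the pointer cast: reading a float through an `unsigned*` is the kind of type punning that C++ aliasing rules do not guarantee. A hedged sketch of the same sign-bit isolation using a byte copy is given below; `HD` and `sign_bit` are hypothetical names, and the code assumes a 32-bit IEEE 754 float. In device code, the CUDA intrinsic `__float_as_uint()` performs the same reinterpretation.

```cpp
#include <cstdint>
#include <cstring>

// HD is a hypothetical helper macro: CUDA qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Returns 1 if the sign bit of f is set and 0 otherwise.
// The bits are copied into an integer instead of being read through a cast pointer.
HD inline unsigned sign_bit(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits >> 31;   // bit 31 is the sign bit of an IEEE 754 single
}
```

Note that -0.0f has its sign bit set, which is one reason the fabs() guard in the function above matters.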

Yes, as long as you are dealing with single-precision numbers, but it would be much more readable using signbit() instead…

And more portable performance-wise (signbit is a built-in intrinsic in CUDA).
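
A rough sketch of what the signbit() version could look like follows. The original MVECTOR and MINVECSIZE are not shown in this thread, so `Vec3` and the `min_size` parameter are stand-ins assumed for illustration, as are `HD`, `signs_differ` and the function name.

```cpp
#include <cmath>

// HD is a hypothetical helper macro: CUDA qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Assumed stand-in for the poster's MVECTOR (three float components).
struct Vec3 { float x, y, z; };

// True when exactly one of the two values has its sign bit set.
HD inline bool signs_differ(float a, float b)
{
    return std::signbit(a) != std::signbit(b);
}

// Returns 1 if every component of a that is larger in magnitude than min_size
// has the same sign as the matching component of b, and 0 otherwise.
// min_size plays the role of MINVECSIZE from the original post.
HD inline int same_colinear_direction_signbit(const Vec3 &a, const Vec3 &b, float min_size)
{
    if (std::fabs(a.x) > min_size && signs_differ(a.x, b.x)) return 0;
    if (std::fabs(a.y) > min_size && signs_differ(a.y, b.y)) return 0;
    if (std::fabs(a.z) > min_size && signs_differ(a.z, b.z)) return 0;
    return 1;
}
```

Because std::signbit is overloaded for double as well, the helper would not depend on the 32-bit layout if the components were ever changed to double.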

By the way, what is wrong with branches?

@Sylvain:

I guess in this case each branch contains very few instructions, which means there’s a predicate placed in front of each instruction and no actual warp divergence happens?

Thanks for pointing out signbit. Didn’t know and will use it, of course.

Branches (is this from another topic?):

I like them on trees, in data structures and in CPU code, but dislike them in CUDA, because I think you risk an execution time which is the sum of all branches (with some suitable qualifications). That is divergence. If all threads in a warp take the same branch, it is no problem, but no use either, I guess.

Please let me hear your thoughts on this?
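
To make the trade-off concrete, here is a small sketch of the same calculation written once with branches and once with arithmetic on the comparison results. The function names and the `HD` macro are hypothetical. For branches this short, the compiler may well emit predicated or select instructions for both versions, so any difference would have to be measured rather than assumed.

```cpp
// HD is a hypothetical helper macro: CUDA qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Branchy form: threads of a warp with different signs of x take different paths.
HD inline float scale_branchy(float x, float a)
{
    if (x > 0.0f) return a;
    if (x < 0.0f) return -a;
    return 0.0f;
}

// Arithmetic form: every thread executes the same instructions.
// Caveat: when x is zero this computes 0 * a, which is NaN (not 0) if a is
// infinite or NaN, so the two forms agree only for finite a.
HD inline float scale_branchless(float x, float a)
{
    const int s = (x > 0.0f) - (x < 0.0f);   // -1, 0 or +1
    return static_cast<float>(s) * a;
}
```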