32 bit Float value question Zero insignificant bits after decimal pt

Romant · July 1, 2008, 2:57pm

Hi All,

As GPU float division is not IEEE-compliant and accumulates significant error, I’d like to zero all fractional digits after the 6th one each time a division occurs (at least this should eliminate bad results that are hard to filter out).

For example: source value 0.123456111 should become 0.123456.
The fastest way is to manipulate bits directly I think.

Does anybody know how to implement this ?

Thanks in advance.

SPWorley · July 1, 2008, 4:11pm

The programming guide says that division is good to 2 ULP, so that shouldn’t cause any significant errors in bits of output. The not-IEEE part is mostly about handling divides by 0, infinities, NaNs, etc.

Can you give an example where the divide gives the wrong answer?

The specific answer to your question is just a bit mask. Look at the IEEE FP format: you can reinterpret the raw float bit representation as an int, then mask out the last bits if you like. A C union or just raw, ugly type casts can do this.

float a=1234.34454343446f;

// mask out last 8 bits of FP value

*((unsigned int *)&a) &= 0xFFFFFF00;

But I don’t think you really want to do this.
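For reference, here is a minimal sketch of the same masking idea written with memcpy instead of a pointer cast, which avoids strict-aliasing trouble in host C code. The helper name zero_low_mantissa_bits and its nbits parameter are made up for illustration, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Hypothetical helper: clear the lowest `nbits` mantissa bits of `value`.
 * `nbits` is assumed to be in the range 0..23. */
static float zero_low_mantissa_bits(float value, unsigned nbits)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);            /* float -> raw bits */
    bits &= ~((UINT32_C(1) << nbits) - 1u);        /* nbits = 8 gives 0xFFFFFF00 */
    memcpy(&value, &bits, sizeof value);           /* raw bits -> float */
    return value;
}
```

Clearing mantissa bits moves the value toward zero in steps of a power of two, so it coarsens the binary precision but does not cut the value off at a decimal digit.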

Romant · July 1, 2008, 5:46pm

Division itself is not too big a problem. The problem appears when divisions are nested (a / b * b / a * a / a is not always 1.0), and my task involves lots of such expressions. Sometimes the result is not 1.0 but, say, 1.00000012345, and then when I subtract 1.0 from it I get a very small value where I should get zero.

As my main goal is to get the same results for an expression evaluated on the CPU and on the GPU, I need a way to make division behave the same on both (when my expressions use only + - * and no division, the results match perfectly). Thus, I’m ready to truncate the result of division on both CPU and GPU in order to get the same division result.

You propose zeroing of 16 bits of fraction part. Why 16 ??

Romant · July 1, 2008, 6:29pm

I believe that zeroing won’t work at all: the fraction consists of bits, and bit n represents (1/2)^n, so clearing low bits does not cut the value off at a decimal digit. Truncating at a decimal digit is possible only by recomputing the whole fraction, and the problem is the speed of that recomputation.
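A rough sketch of what such a decimal "rearrangement" could look like in C: scale by a power of ten, drop the fractional part, and scale back. The helper name truncate_decimals is hypothetical. Note that the result is only the nearest float to the truncated decimal (0.123456 itself has no exact binary representation), and that the final step is again a division.

```c
#include <stdint.h>

/* Hypothetical helper: truncate `value` toward zero to `digits` decimal places.
 * `digits` is assumed to be small (0..10), so the power of ten is exact in a float. */
static float truncate_decimals(float value, int digits)
{
    float scale = 1.0f;
    for (int i = 0; i < digits; ++i)
        scale *= 10.0f;

    float scaled = value * scale;

    /* At magnitudes of 2^24 and above a float has no fractional bits left,
     * so there is nothing to truncate; this also rejects NaN and keeps the
     * integer conversion below in range. */
    if (!(scaled > -16777216.0f && scaled < 16777216.0f))
        return value;

    return (float)(int32_t)scaled / scale;   /* conversion truncates toward zero */
}
```

With the value from the first post, truncate_decimals(0.123456111f, 6) gives a float very close to 0.123456. The cost is a multiply, a float-to-int conversion, and a divide per call.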

SPWorley · July 1, 2008, 9:07pm

So your problem isn’t a CUDA issue, it’s just the limited precision of IEEE single precision floating point? If your computation is so dependent on the very lowest bits of a floating point value, you’re going to run into problems on CPUs, GPUs, pretty much everywhere.

If CUDA’s 2ULP accuracy really is the problem, you could improve that by splitting your divide out to do a reciprocal first, which is 1ULP. And it’d be complete overkill, but a Newton iteration would probably make the error 0ULP.

// compute x/y three ways
float a, b, c;

a = x / y;                 // direct divide: 2ULP error

b = 1.0f / y;              // reciprocal: 1ULP error
b = b * x;

c = 1.0f / y;
c = c * (2.0f - y * c);    // Newton iteration for reciprocal
c = c * x;

But I stress again, if you’re relying on those last bits of a float, that’s your real problem, not the divide accuracy! This rule is perhaps the most important practical guideline of all numeric computing.

That code snippet zeros out the last 8 bits, since you were asking how to zero out low fractional bits. You could change it to 4 by using a mask constant of 0xFFFFFFF0 or 10 by using 0xFFFFFC00, etc.

But again I don’t think this is what you want to do.
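One common way to stop relying on the last bits is to compare results with a small tolerance measured in ULPs rather than testing for exact equality. The following is only a sketch under the assumption of 32-bit IEEE 754 floats; the function names and the choice of tolerance are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto a monotonic integer line, so that
 * adjacent floats map to adjacent integers (and -0.0f maps to 0). */
static int64_t ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return (i < 0) ? (int64_t)INT32_MIN - i : (int64_t)i;
}

/* Number of representable floats between a and b (0 if they are equal). */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

/* True if a and b are within max_ulps representable values of each other.
 * NaN never compares as nearly equal. */
static bool nearly_equal_ulps(float a, float b, int64_t max_ulps)
{
    if (a != a || b != b)
        return false;
    return ulp_distance(a, b) <= max_ulps;
}
```

For example, instead of checking that a / b * b / a * a / a minus 1.0f is exactly zero, one could check nearly_equal_ulps(result, 1.0f, 8) on both the CPU and the GPU result. The tolerance of 8 is an arbitrary choice here and would need to be picked per computation.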

Romant · July 2, 2008, 9:27am

The main goal is not to achieve absolute accuracy but to get the same (or approximately the same) results on CPU and GPU. 1/x and Newton iterations are helpful, thank you!
