Floating point round-off limit (epsilon)

FangQ · August 17, 2015, 5:32pm

I am working on a ray-tracing-in-grid-space code. for every step, I need to propagate my particle to the first intersection to the enclosing voxel, and ensure that the particle is out-side of the voxel (so, when taking floorf(x/y/z) it will be a different integer).

I am having difficulty for the second goal, i.e., make sure the coordinate is across integer boundaries.

here is my code snip

#define EPS 1e-6f

// float3 p is the particle position, float3 v is the velocity vector,
// id is the index of the first intersection plane id={0,1,2}: passing grid in {x,y,z}-dir

// at this point, p is already updated and should be located ON the boundary
// the next line ensures it is across the boundary

((float*)p)[id]=__float2int_rn(((float*)p)[id])+((((float*)v)[id] > 0.f)-0.5f)*2.f*EPS;

I found that, when I set the scaling factor before EPS to 2.f as above, the code may fail, but if I set it to 4.0, it works for most particles I tested.

I am wondering how to do this rigorously? is there an FLT_EPSILON for float in CUDA?

any other more efficient ways?

thanks

FangQ · August 17, 2015, 5:36pm

by the way, the p.{x,y,z} are used later by converting to integer grid indices using

int(floorf(p.{x/y/z}))

so the purpose of the previously mentioned rounding operation is to ensure the int(floorf(.)) advances it to the next grid.

njuffa · August 17, 2015, 5:56pm

For all platforms supported by CUDA, both GPUs and CPUs use IEEE-754 floating-point formats. Therefore you should be able to simply

#include <float.h>

and use the FLT_EPSILON defined by the host compiler’s header file. That is the mechanical part. I am not exactly sure what your computation is trying to accomplish, so I cannot judge whether the above computation when used with FLT_EPSILON will provide the desired results.

Maybe you could give a more detailed specification of the desired functionality. I understand that the boundaries of each cell are on integers. Can points ever lie on a cell boundary? Looking at the one-dimensional case, what should the result of the computation be for the following three basic cases (where ‘x’ and ‘delta’ are floating-point numbers and ‘n’ is an integer representing the lower bound of the interval defining the cell in that dimension:

[1] x == n // point coincides with lower bound
[2] x == (n+1) // point coincides with upper bound
[3] x == (n+delta) // point somewhere inside the interval, that is ‘delta’ is in (0,1)

To increment a floating-point number by the smallest possible finite quantum, I would suggest using the C/C++ standard function nextafter(x,y) provided by CUDA. Starting at ‘x’, this returns the next representable floating-point number in the direction of ‘y’. Commonly used choices for ‘y’ are zero, negative infinity, positive infinity.

FangQ · August 17, 2015, 6:25pm

njuffa:

For all platforms supported by CUDA, both GPUs and CPUs use IEEE-754 floating-point formats. Therefore you should be able to simply
#include <float.h>
and use the FLT_EPSILON defined by the host compiler’s header file. That is the mechanical part. I am not exactly sure what your computation is trying to accomplish, so I cannot judge whether the above computation when used with FLT_EPSILON will provide the desired results.

this is the puzzling part. the IEEE-754 defined FLT_EPSILON is around 1.19209290e-7F, but the EPSILON found through trial-and-error in my cuda code was 2e-6. I can’t explain why they are different. is that because I multiplied it by -1.f or 1.f computed through the following term?

((((float*)v)[id] > 0.f)-0.5f)*2.f

FangQ · August 17, 2015, 6:41pm

here I have a simple example: let’s set p={20.5,20.5,0.5} with a dir vector v={-1,0,0}. To perform a ray-grid ray-tracing by comparing the distances to x/y/z planes, I found the first intersection point at pn={20,20.5,0.5}.

if I return pn directly without changes, my next ray-tracing test will be a problem, because it will hit the exactly the same location {20,20.5,0.5} again because its distance along v is the shortest, i.e. 0.0.

to avoid infinite loop, I’d like to move pn slightly over the intersection so that next time, it will find {19,20.5,0.5}, and so on. The problem I am struggling is by how much I should change pn.

njuffa · August 17, 2015, 7:08pm

The FLT_EPSILON from the header file is correct, and matches what you can measure by actually computing it on the GPU:

#include <stdio.h>  // import printf()
#include <stdlib.h> // import EXIT_SUCCESS
#include <math.h>   // import nexafter()
#include <float.h>  // import FLT_EPSILON

__global__ void eps_kernel (void)
{
    printf ("GPU computed FLT_EPSILON = %15.8e\n", nextafter (1.0f,2.0f) - 1.0f);
    printf ("header file FLT_EPSILON  = %15.8e\n", FLT_EPSILON);
}

int main (void)
{
    eps_kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

The output of this program is:

GPU computed FLT_EPSILON = 1.19209290e-007
header file FLT_EPSILON  = 1.19209290e-007

FangQ · August 17, 2015, 9:06pm

thanks for confirming the machine precision on the GPU.

here I have one update, I changed my line to

((float*)p)[id]=nextafterf(__float2int_rn(((float*)p)[id]), ((float*)p)[id]+((((float*)v)[id] > 0.f)-0.5f));

after your suggestion, it now works without failed tests.

I still don’t understand why my previous approach failed though. perhaps epsilon multiplied by something is not guaranteed valid?

Topic		Replies	Views
Floating-point precision problems CUDA Programming and Performance	14	4414	January 7, 2011
Rounding Errors CUDA Programming and Performance	7	2001	February 18, 2013
wrong floating point operation results CUDA Programming and Performance	6	6097	February 14, 2010
Floats and floats... difference between CPU and GPU? CUDA Programming and Performance	12	14156	February 2, 2010
discrepancy between CPU and GPU after a division (accuracy issue) CUDA Programming and Performance	3	1510	June 10, 2015
Possible Rounding/Precision Errors in CUDA Math APIs? GPU-Accelerated Libraries math-api	5	142	July 31, 2024
Floating Point Accuracy CUDA Programming and Performance	11	30431	April 6, 2013
round-to-double on GPUs? precision issue on the GPU CUDA Programming and Performance	10	10393	November 17, 2009
precision CUDA Programming and Performance	3	2614	December 16, 2008
Help Needed: Precision Mismatch between GPU and CPU Calculations of AAD Limiter CUDA Programming and Performance	3	42	December 23, 2024

Floating point round-off limit (epsilon)

Related topics