Floating point round-off limit (epsilon)

I am working on a ray-tracing-in-grid-space code. for every step, I need to propagate my particle to the first intersection to the enclosing voxel, and ensure that the particle is out-side of the voxel (so, when taking floorf(x/y/z) it will be a different integer).

I am having difficulty for the second goal, i.e., make sure the coordinate is across integer boundaries.

here is my code snip

#define EPS 1e-6f

// float3 p is the particle position, float3 v is the velocity vector,
// id is the index of the first intersection plane id={0,1,2}: passing grid in {x,y,z}-dir

// at this point, p is already updated and should be located ON the boundary
// the next line ensures it is across the boundary

((float*)p)[id]=__float2int_rn(((float*)p)[id])+((((float*)v)[id] > 0.f)-0.5f)*2.f*EPS;

I found that, when I set the scaling factor before EPS to 2.f as above, the code may fail, but if I set it to 4.0, it works for most particles I tested.

I am wondering how to do this rigorously? is there an FLT_EPSILON for float in CUDA?

any other more efficient ways?

thanks

by the way, the p.{x,y,z} are used later by converting to integer grid indices using

int(floorf(p.{x/y/z}))

so the purpose of the previously mentioned rounding operation is to ensure the int(floorf(.)) advances it to the next grid.

For all platforms supported by CUDA, both GPUs and CPUs use IEEE-754 floating-point formats. Therefore you should be able to simply

#include <float.h>

and use the FLT_EPSILON defined by the host compiler’s header file. That is the mechanical part. I am not exactly sure what your computation is trying to accomplish, so I cannot judge whether the above computation when used with FLT_EPSILON will provide the desired results.

Maybe you could give a more detailed specification of the desired functionality. I understand that the boundaries of each cell are on integers. Can points ever lie on a cell boundary? Looking at the one-dimensional case, what should the result of the computation be for the following three basic cases (where ‘x’ and ‘delta’ are floating-point numbers and ‘n’ is an integer representing the lower bound of the interval defining the cell in that dimension:

[1] x == n // point coincides with lower bound
[2] x == (n+1) // point coincides with upper bound
[3] x == (n+delta) // point somewhere inside the interval, that is ‘delta’ is in (0,1)

To increment a floating-point number by the smallest possible finite quantum, I would suggest using the C/C++ standard function nextafter(x,y) provided by CUDA. Starting at ‘x’, this returns the next representable floating-point number in the direction of ‘y’. Commonly used choices for ‘y’ are zero, negative infinity, positive infinity.

this is the puzzling part. the IEEE-754 defined FLT_EPSILON is around 1.19209290e-7F, but the EPSILON found through trial-and-error in my cuda code was 2e-6. I can’t explain why they are different. is that because I multiplied it by -1.f or 1.f computed through the following term?

((((float*)v)[id] > 0.f)-0.5f)*2.f

here I have a simple example: let’s set p={20.5,20.5,0.5} with a dir vector v={-1,0,0}. To perform a ray-grid ray-tracing by comparing the distances to x/y/z planes, I found the first intersection point at pn={20,20.5,0.5}.

if I return pn directly without changes, my next ray-tracing test will be a problem, because it will hit the exactly the same location {20,20.5,0.5} again because its distance along v is the shortest, i.e. 0.0.

to avoid infinite loop, I’d like to move pn slightly over the intersection so that next time, it will find {19,20.5,0.5}, and so on. The problem I am struggling is by how much I should change pn.

The FLT_EPSILON from the header file is correct, and matches what you can measure by actually computing it on the GPU:

#include <stdio.h>  // import printf()
#include <stdlib.h> // import EXIT_SUCCESS
#include <math.h>   // import nexafter()
#include <float.h>  // import FLT_EPSILON

__global__ void eps_kernel (void)
{
    printf ("GPU computed FLT_EPSILON = %15.8e\n", nextafter (1.0f,2.0f) - 1.0f);
    printf ("header file FLT_EPSILON  = %15.8e\n", FLT_EPSILON);
}

int main (void)
{
    eps_kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

The output of this program is:

GPU computed FLT_EPSILON = 1.19209290e-007
header file FLT_EPSILON  = 1.19209290e-007

thanks for confirming the machine precision on the GPU.

here I have one update, I changed my line to

((float*)p)[id]=nextafterf(__float2int_rn(((float*)p)[id]), ((float*)p)[id]+((((float*)v)[id] > 0.f)-0.5f));

after your suggestion, it now works without failed tests.

I still don’t understand why my previous approach failed though. perhaps epsilon multiplied by something is not guaranteed valid?