Any suggestions for detecting intersections within a customized region defined per ray?

Hi,

I am exploring general applications of RT cores. Given a large set of points in space, I aim to determine whether two points can be connected. The criterion is that no other points lie within the intersection region of the two circles centered at the two points, as illustrated in the figure below, where I want to determine if point A and B can be connected. The radius of both circles is equal to the distance between the two points.

A basic idea is to launch a short ray from point A and construct a sphere centered at point C to cover point A, so that point C can be detected by checking whether an intersection occurs. However, this approach makes it difficult to identify points outside the intersection region.

Another approach is to construct spheres centered at A and B with radius equal to the distance between two points and cast rays from all other points. If a ray has two intersections, it indicates that there are other points within the intersection region. However, this method requires frequently updating the sphere radius, since each pair of points has a different distance, and it also involves a large number of ray–sphere intersection tests.

Thanks!

Hi @whix, sounds like fun research!

It might currently be somewhat difficult to do the kinds of tests you’re hoping for when using RT cores. The RT cores are designed for ray-surface intersection testing, and the Blackwell hardware accelerated sphere “LSS” intersections currently only return the entry point hit and not the exit point. If your ray does not cross from outside to inside the sphere, then the ray won’t report an intersection.

For example, if the ray starts and stops inside a sphere (via tmin & tmax) then the ray will be considered a “miss”. This applies to your diagram in the sense that casting a ray from A to B will miss a sphere centered at point C that has a radius of ||A-C|| or greater. It is possible that, due to floating point precision, if the surface of the sphere centered on C passes through A, and the ray is cast from A, that you might hit the sphere after a small epsilon t. But you should expect that the ray will miss such a sphere because mathematically, the expected t value is 0, and RT cores do not return hits with t==0; the t value of a hit must be strictly greater than zero.

You can, of course, write your own sphere intersector that will report hits when exiting the sphere, or when starting & stopping inside of a sphere, or when t==0, but doing that will obviously trade away some of the performance benefits of using RT cores.

Are you looking for a KNN (K nearest neighbors) query, in general? It might be possible in the future that RT cores can support other query types, if there’s enough demand for it. In the mean time, it might be worth exploring CUDA BVH builders that support KNN or other query types. One example of that is the CuBQL library. There has also been a set of academic papers exploring how to use RT cores to emulate such queries. I can give some links if you’re interested, if you haven’t already explored the related literature.

Thank you for your reply.

What I am trying to detect is whether points exist within a given intersection region; this serves as the criterion for determining whether two points can be connected by an edge. I have explored using RT cores for kNN search, but that approach is not necessary for my case.

My initial idea is to implement my ray–sphere intersection test, as commonly used in many RT-based kNN studies. However, since RT cores do not support such operation, i.e., detecting points in a customized region, I was wondering if you could recommend any CUDA-based libraries that could help achieve this functionality.

I see no need to use ray tracing for this specific algorithm at all because the condition of a point C lying inside the intersection space of two spheres around points A and B with the radii of the distance between point A and B can be easily done by comparing the squared distances.

If your request is to determine that “connectivity” between all points A and B then it would be something like this pseudo algorithm (not tested, just to demonstrate that this doesn’t need raytracing).

// Number of points N.
// Minimum for the below loops to work at all is N == 2, though 2 is the trivial unblocked case.
const int N = 100;

float3 points[N];

fill_points_with_positions(N, points);

for (int indexA = 0; indexA < N - 1; ++indexA)
{
  const float3 A = points[indexA];

  // Due to symmetry (order of A and B) the result size is an upper triangle matrix without the diagonal, means (N * (N - 1)) / 2 elements.
  // This could be your launch size of a native CUDA kernel to calculate the blocked condition between any two points A and B, which is the result of the loop over indexC.
  // The squared distances between any two points should be precalculated in another native CUDA kernel.
  for (int indexB = indexA + 1; indexB < N; ++indexB)
  {
    const float3 B = points[indexB];

    // Vector from A to B.
    const float3 AB = B - A;
    // No need for sqrtf(), comparison works with squared values as well. Precalculate these.
    const float distance_squared_AB = dot(AB, AB);

    bool isBlockedAB = false;

    // Now the smart thing to do is not to need to look at all N indices here,
    // but use some simple spatial structure like a regular 3D cell grid which stores the indices of the points inside that cell as some list.
    // Then first determine which cells are touched or enclosed by the intersection space of the sphere around A and B, and only check the points C inside the potential cells instead of all N.
    // One idea to trivially reject outside cells would be testing against the half-spaces at the planes perpendicular to the AB resp. BA vectors at footpoints A and B, basically some binary space partition (BSP) test.
    // Also there is no need to test points C at all if a cell (all 8 corners) with any point data is fully enclosed by the spheres' intersection region. Connectivity is blocked then.
    // Voila, an acceleration structure for this simple algorithm which can be implemented in a few native CUDA kernels easily. Zero raytracing required.
    for (int indexC = 0; indexC < N && !isBlockedAB; ++indexC)
    {
      if (indexC != indexA && indexC != indexB)
      {
        // Vector from A to C.
        const float3 AC = C - A;
        // Vector from B to C.
        const float3 BC = C - B;

        // Is point C inside the intersection of the spheres around A and B with radius of distance between A and B?
        if (dot(AC, AC) < distance_squared_AB &&
            dot(BC, BC) < distance_squared_AB)
        {
          // No connection between A and B possible.
          isBlockedAB = true;
          break;
        }
      }
    }

    // Store the valid connection result somewhere.
    // Mind only requires (N * (N - 1)) / 2 elements.
    store_result(N, indexA, indexB, isBlockedAB);
  }
}

Now the million dollar question is what is “a large set of points” in your problem? How many points are we talking about?
The simple host algorithm above has an O(n^3) complexity and it would be absolutely necessary to reduce the number of tests inside the loop over indexC as much as possible.

Writing that loop as a native CUDA kernel would give you plenty of parallel tests if your “large” number of points saturates the GPU.

The point set scales from 1M to 10M points.

For point filtering, I think we could also construct an AABB to enclose the intersection region.

How often do you want to “construct an AABB to enclose an intersection region” though?

It’s not feasible to do that for each pair of points A and B. That coarse structure should be built only once and used for all tests. I still think the easiest way to prune as many points as possible is the BSP idea on the coarse grid, which only needs the distance of the grid corners from two planes to trivially reject whole cells. Additionally you could test the remaining cells against a sphere around the center between A and B with the radius enclosing the concave intersection space.

The remaining number of coarse cells would vary greatly with the distance between A and B. If they are far away, many cells would remain, but then the loop over indexC would break early. If the radius is small, a lot more cells would be pruned early.

You would also only need to consider cells with points in them. You could also try a hierarchical structure like a BVH or octree if your points are placed in spatial clumps. A kD-tree might actually work nicely as well because that splits by half-spaces in each level and allows to find nearest neighbours. In any case, the idea is to make the coarse tests as cheap and efficient as possible, means a lot smaller than the number of points in your scene.

That can all be experimented with on the CPU first, before moving it to the GPU as native CUDA kernels.