The mesh is considered solid, so rays should not pass through the mesh.

Good, so all rays are only started into the upper hemisphere over the front faces of triangles.

I’m not offsetting the ray origin on the triangle surface.

That’s bad. That means you’re not preventing self-intersections, which would explain the unexpected miss results.

The ray actually hits the target triangle. What method should I use? Why is it slow?

That is not required when testing the visibility between two sample points on two different triangles.

You already know which two triangles these are from the construction of the ray direction.

All you need then is to test if anything is blocking the ray between the two sample points.

For that you shoot a “visibility ray” which only needs to cover the distance between the two sample points.

So the visibility check would basically be this pseudo algorithm:

```
// Inside your ray generation program:
// ...
float3 ray_origin = sample_point_on(source_triangle); // Uniformly distributed random sample point on source triangle
float3 ray_end_point = sample_point_on(destination_triangle); // Uniformly distributed random sample point on destination triangle
float3 vector = ray_end_point - ray_origin; // Unnormalized Vector from start to end sample point.
float ray_tmax = length(vector); // Distance between the two sample points in world coordinates. This will define ray.tmax.
// Guard against a division by zero in the following normalization.
// This could actually happen for adjacent triangles where both sample points coincide!
// (DENOMINATOR_EPSILON is some small application-defined value, e.g. 1.0e-6f.)
if (ray_tmax < DENOMINATOR_EPSILON)
{
  return; // Count this pair as not visible for this sample.
}
float3 ray_direction = vector / ray_tmax; // Normalized direction vector from start to end sample point.
// This is going to be the only ray payload register and the result of the visibility test.
// Initialize the visibility result to 0 (false)
// When reaching the miss shader of the visibility ray, that means there was no blocking object between the two sample points and isVisible is set to 1 (true).
unsigned int isVisible = 0;
// Check if the ray direction lies inside the upper hemisphere over the start triangle's front face.
// Actually neither vector would need to be normalized for this hemisphere side check; only the sign of the dot product matters.
float3 source_face_normal = calculate_face_normal(source_triangle);
float3 destination_face_normal = calculate_face_normal(destination_triangle);
// If both the ray direction and the face normal point into the same hemisphere,
// the destination triangle lies inside the half-space over the start triangle's front face
// and vice versa! Means this only checks visibility between the front faces of triangles.
if (dot(ray_direction, source_face_normal) > 0.0f &&
    dot(ray_direction, destination_face_normal) < 0.0f)
{
  // Note that the sysData.sceneEpsilon is applied on both sides of the visibility ray [t_min, t_max] interval
  // to prevent self-intersections with either geometric primitive.
  optixTrace(sysData.topObject,
             ray_origin, ray_direction,                                   // origin, direction
             sysData.sceneEpsilon, ray_tmax - sysData.sceneEpsilon, 0.0f, // tmin, tmax, time
             OptixVisibilityMask(0xFF),
             OPTIX_RAY_FLAG_DISABLE_ANYHIT | OPTIX_RAY_FLAG_DISABLE_CLOSESTHIT | OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT,
             0, 0, TYPE_RAY_VISIBILITY, // sbtOffset, sbtStride, missSBTIndex. The visibility ray type only uses the miss program!
             isVisible);               // The ray payload is only this one unsigned int register.
}
// else the ray direction points into the object on either the source or destination triangle and isVisible remains 0 (false)
// Here isVisible contains either 0 or 1 and that is the result for the visibility between the source_triangle and destination_triangle indices.
```

Note that some values in the code above, like the face normals, won’t change between launches and could be calculated upfront and stored per triangle inside an additional input buffer.
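For example, the face normals could be precomputed once on the host into such a per-triangle buffer. A minimal sketch, assuming an indexed triangle mesh with counter-clockwise front-face winding (the function and buffer names are made up):

```cpp
#include <cmath>
#include <vector>

struct float3 { float x, y, z; };

static float3 cross(const float3 a, const float3 b)
{
  return { a.y * b.z - a.z * b.y,
           a.z * b.x - a.x * b.z,
           a.x * b.y - a.y * b.x };
}

static float3 normalize(const float3 v)
{
  const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  return { v.x / len, v.y / len, v.z / len };
}

// Precompute one face normal per triangle and store it in an additional buffer.
// vertices: mesh vertex positions; indices: three vertex indices per triangle.
std::vector<float3> precompute_face_normals(const std::vector<float3>& vertices,
                                            const std::vector<unsigned int>& indices)
{
  std::vector<float3> normals(indices.size() / 3);
  for (size_t tri = 0; tri < normals.size(); ++tri)
  {
    const float3 v0 = vertices[indices[tri * 3 + 0]];
    const float3 v1 = vertices[indices[tri * 3 + 1]];
    const float3 v2 = vertices[indices[tri * 3 + 2]];
    const float3 e1 = { v1.x - v0.x, v1.y - v0.y, v1.z - v0.z };
    const float3 e2 = { v2.x - v0.x, v2.y - v0.y, v2.z - v0.z };
    normals[tri] = normalize(cross(e1, e2)); // Front face defined by counter-clockwise winding.
  }
  return normals;
}
```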

Also mind that adjacent triangles might have sample points which are closer together than the two sceneEpsilon offsets which try to prevent self-intersections. That would also need to be checked, because tmin > tmax is an invalid ray error.
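That validity check could look like this (a host-side sketch; the function name is made up):

```cpp
// Returns true when the visibility ray interval [tmin, tmax] is valid,
// i.e. the two sample points are far enough apart that applying the
// scene epsilon on both ends still leaves tmin < tmax.
// Adjacent triangles can produce sample points closer than 2 * sceneEpsilon!
inline bool visibility_interval_is_valid(const float distance, const float sceneEpsilon)
{
  const float tmin = sceneEpsilon;
  const float tmax = distance - sceneEpsilon;
  return tmin < tmax; // Equivalent to: distance > 2 * sceneEpsilon.
}
```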

There are more robust self intersection avoidance methods. One is described inside the OptiX-Toolkit repository. https://github.com/NVIDIA/otk-shader-util

Other methods require no offset at all but check the source and destination triangle indices inside an anyhit program, which is slower.

Now the question is how to implement the data management and results.

The above is a gather algorithm, which means you know upfront between which two triangles the visibility is tested.

That means the results form a matrix of triangles × triangles cells, and because visibility is symmetric, the result vector only needs the upper-right triangle of that matrix without the diagonal elements, which would pair identical triangle indices.

When each launch index should calculate one visibility result between two different triangles, that is, one cell of the upper-right triangle in that matrix, the launch dimension would need to be `width = number_of_triangles * (number_of_triangles - 1) / 2; height = 1; depth = 1;`
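Each launch index then has to be mapped back to a triangle pair. A sketch of that mapping in host-side C++ (the function name is made up; a closed-form inverse exists, but the loop is easier to verify):

```cpp
#include <utility>

// Maps a linear launch index k in [0, N * (N - 1) / 2) to a triangle pair (i, j)
// with i < j, enumerating the upper-right triangle of the N x N connection
// matrix row by row.
std::pair<unsigned int, unsigned int> launch_index_to_pair(unsigned long long k,
                                                           const unsigned int N)
{
  unsigned int i = 0;
  unsigned long long row_length = N - 1; // Row i contains (N - 1 - i) pairs.
  while (k >= row_length)
  {
    k -= row_length;
    --row_length;
    ++i;
  }
  const unsigned int j = i + 1 + static_cast<unsigned int>(k);
  return { i, j };
}
```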

Now each launch index would calculate one result in that upper-right triangle of the connection matrix, which must be cleared to zero before or in the very first launch, so that the isVisible result only needs to be written into the output buffer when it is 1.

So if after a number of launches the result vector contains a 1, at least one visibility test between those two triangles succeeded.

That could also be used to prune the number of calculations: if you only need a binary condition, only continue checking triangle pairs which haven’t already passed the visibility test.

Or you could count the visibility results by adding the isVisible result to its output buffer cell in case you want to determine partial visibility.

Note that for simple objects, the resulting launch dimension might be much too small to saturate modern GPUs.

On the other hand there is an upper limit for the optixLaunch dimension, which is 2^30 launch indices in total (the product of width, height and depth).

So if your result vector gets too big, you would need to partition the work into multiple optixLaunches somehow.
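One simple partitioning scheme, sketched in host-side C++ (the function name is made up): split the linear pair range into chunks of at most 2^30 indices and issue one optixLaunch per chunk, passing the chunk’s first index through the launch parameters.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Splits the total number of visibility tests into chunks which respect the
// optixLaunch limit of 2^30 launch indices per launch.
// Each element is a (first_index, count) pair; the device code would add
// first_index to its launch index to get the global pair index.
std::vector<std::pair<unsigned long long, unsigned long long>>
partition_work(const unsigned long long total,
               const unsigned long long max_launch = 1ull << 30)
{
  std::vector<std::pair<unsigned long long, unsigned long long>> chunks;
  for (unsigned long long first = 0; first < total; first += max_launch)
  {
    chunks.push_back({ first, std::min(max_launch, total - first) });
  }
  return chunks;
}
```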

There are also other methods to store the results, for example in a bit vector.

Somehow related topic here: https://forums.developer.nvidia.com/t/whats-your-solution-to-get-all-hit-primitives-of-multiple-rays/239528

This visibility ray method is faster than shooting rays with longer lengths because the acceleration structure traversal doesn’t need to consider potential hits outside the ray’s [tmin, tmax] interval between the two triangles.

As a sidenote, simply shooting a huge number of rays from the triangles’ front faces and checking if they hit anything, like it’s done for ambient occlusion, would be less robust, because you wouldn’t know how dense the ray distribution would need to be to actually hit small triangles, and slower, because that would need to use longer tmax values.