I would like to know why "miss" is occurring


I have a question about something I’m implementing.

First, I generate one ray for each pair of triangles and record the index of the hit triangle.

The method for generating rays is as follows:

  • Take the centroid of a specific mesh as the origin of the ray.
  • Subtract the ray’s origin from the centroid of another mesh and then normalize this vector to determine the direction of the ray.

I’m working with the following shapes:

I create a ray from the centroid of a specific mesh (index 18) to the centroid of another mesh and check for hits.

The execution results are shown in the table.

What I want to obtain is the mesh indices 20, 21, 22, and 23.
I understood why I get hits and mesh indices 20, 21, 22, 23 from our previous discussion.
: How to record hits while traversing a mesh? - #9 by dhart

However, I don’t understand why I get misses and mesh index 18.

In particular, 14 is a miss, but 15 results in index 18, and I don’t understand why.

Is there something I’m overlooking?

Thank you.

Some questions about your algorithm:

1.) Are your meshes considered to be solid?
That is, should triangles 18 and 10 be visible to each other, or is a ray passing through the mesh volume considered a miss?
Meaning, is your mesh object a closed volume with consistent triangle winding that defines the front faces (that is, the outside of the surface) correctly, and considered solid, so that visibility should only be tested from the front faces of triangles?
Figure 2 in your previous post and the result table above seem to indicate otherwise. Is that really what you want?

2.) Are you doing any self-intersection avoidance when connecting the two triangles with a visibility ray?
That is, are you preventing the triangle you start the visibility ray from (e.g. 18) from being hit itself, either by offsetting the ray origin away from the triangle surface to the correct side, or by offsetting tmin by a positive epsilon value (or, more robust but also more expensive, by checking the primitive ID inside an anyhit program)?

3.) Similar for the destination sample point on the other triangle (e.g. 10).
If you’re checking the visibility between two geometric primitives, are you limiting the ray [tmin, tmax] interval to the distance between the two sample points on the two triangles tested for visibility?
(Or is your ray supposed to actually hit the destination triangle? Don’t do that, that would be slower.)
In case the visibility ray should only check whether anything lies between the two triangle sample points, the ray tmax needs to be offset as well, this time by a negative epsilon, so that the destination triangle itself is not hit and considered as blocking the visibility test.
(This might be why 14 is a miss and 15 is not. It’s a floating point accuracy thing.)

Again, if you need to determine the condition of “any visibility” between two triangles on the surface of a mesh, your current algorithm connecting the centroids of triangles will only work with such simple geometry as in the pictures above. It will not work for the Stanford bunny or the duck you used as other examples.
For those you need to sample a lot more points between each triangle pair to make sure you resolve partial visibility conditions, like David explained before.

The fastest visibility ray implementation you can do in OptiX is described here:

It should be rather straightforward to implement. It just needs a lot of rays, though these are the fastest kind.


Thank you for the quick response.

  • The mesh is considered solid, so rays should not pass through the mesh.
  • The mesh object is treated as a closed volume, and visibility is only applied to the front faces of the triangles. However, I conducted tests to see what happens when light is cast on the back faces of the triangles, as shown in the figure.
  • What I want is the result table from the previous post, and I asked the question to understand why the current figure shows the results it does.
  • I’m not offsetting the ray origin on the triangle surface.
  • I’m only using closesthit, and I do acquire the mesh primitive ID.
  • I plan to perform the same operation for the other triangle (e.g., 10) as well.
  • I’m not limiting the ray interval.
  • The ray actually hits the target triangle. What method should I use? Why is it slow?
  • The goal is to check if a ray hits a triangle and pair up triangles.
  • I set tmax to 1e16f.
  • What do the tmin and tmax offsets mean? I don’t quite understand. I interpret them as the starting and ending points of the ray; is that not correct?

Thank you.

The mesh is considered solid, so rays should not pass through the mesh.

Good, so all rays are only started into the upper hemisphere over the front faces of triangles.

I’m not offsetting the ray origin on the triangle surface.

That’s bad. That means you’re not preventing self-intersections, and that would explain the unexpected miss results.

The ray actually hits the target triangle. What method should I use? Why is it slow?

That is not required when testing the visibility between two sample points on two different triangles.
You already know which two triangles these are from the construction of the ray direction.
All you need then is to test if anything is blocking the ray between the two sample points.
For that you shoot a “visibility ray” which only needs to cover the distance between the two sample points.

So the visibility check would basically be this pseudo algorithm:

// Inside your ray generation program:
// ...
float3 ray_origin    = sample_point_on(source_triangle);      // Uniformly distributed random sample point on source triangle
float3 ray_end_point = sample_point_on(destination_triangle); // Uniformly distributed random sample point on destination triangle
float3 vector        = ray_end_point - ray_origin;            // Unnormalized Vector from start to end sample point.
float  ray_tmax      = length(vector);                        // Distance between the two sample points in world coordinates. This will define ray.tmax.
// TODO: Add a check here to make sure that ray_tmax is greater than some epsilon to not divide by zero in the following normalization.
//       This could actually happen for adjacent triangles!
float3 ray_direction = vector / ray_tmax;                     // Normalized direction vector from start to end sample point. 

// This is going to be the only ray payload register and the result of the visibility test.
// Initialize the visibility result to 0 (false)
// When reaching the miss shader of the visibility ray, that means there was no blocking object between the two sample points and isVisible is set to 1 (true).
unsigned int isVisible = 0;

// Check if the ray direction is inside the upper hemisphere of the start triangle.
// Actually there would be no need to normalize these for the hemisphere side check.
float3 source_face_normal      = calculate_face_normal(source_triangle);
float3 destination_face_normal = calculate_face_normal(destination_triangle);

// If both the ray direction and the face normal point into the same hemisphere,
// the destination triangle lies inside the half-space over the start triangle's front face
// and vice versa! Means this only checks visibility between the front faces of triangles.
if (dot(ray_direction, source_face_normal)      > 0.0f && 
    dot(ray_direction, destination_face_normal) < 0.0f)
{
  // Note that the sysData.sceneEpsilon is applied on both sides of the visibility ray [t_min, t_max] interval 
  // to prevent self-intersections with either geometric primitive.
  optixTrace(sysData.topObject,          // The scene's top-level traversable handle (field name is a placeholder).
             ray_origin, ray_direction,  // origin, direction
             0.0f + sysData.sceneEpsilon, ray_tmax - sysData.sceneEpsilon, 0.0f, // tmin, tmax, time
             OptixVisibilityMask(0xFF),
             OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT | OPTIX_RAY_FLAG_DISABLE_ANYHIT | OPTIX_RAY_FLAG_DISABLE_CLOSESTHIT,
             0, 0, TYPE_RAY_VISIBILITY,  // SBT offset, SBT stride, miss SBT index. The visibility ray type only uses the miss program!
             isVisible);                 // The ray payload is only this one unsigned int register.
}
// else the ray direction points into the object on either the source or destination triangle and isVisible remains 0 (false)

// Here isVisible contains either 0 or 1 and that is the result for the visibility between the source_triangle and destination_triangle indices.

Note that some values in the code above won’t change, like the face normals, and could be calculated upfront and stored per triangle inside some additional input buffer.

Also mind that adjacent triangles might have sample points which are closer together than the two sceneEpsilon offsets that try to prevent self-intersections. That would also need to be checked, because tmin > tmax is an invalid ray.
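That check can be sketched like this (a plain C++ stand-in, not code from any example; eps plays the role of sysData.sceneEpsilon):

```cpp
// Returns true when a visibility ray between two sample points still has a
// valid, non-empty [tmin, tmax] interval after applying the epsilon offsets.
// 'distance' is the distance between the two sample points.
bool validVisibilityInterval(float distance, float eps, float& tmin, float& tmax)
{
    tmin = eps;             // positive offset away from the source triangle
    tmax = distance - eps;  // negative offset away from the destination triangle
    // For adjacent triangles the sample points can be so close together that
    // the offsets overlap; tmin >= tmax would be an invalid ray.
    return tmin < tmax;
}
```

If this returns false, simply skip the optixTrace call for that pair (or treat the pair as trivially visible, whichever your application needs).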

There are more robust self intersection avoidance methods. One is described inside the OptiX-Toolkit repository. https://github.com/NVIDIA/otk-shader-util
Other methods require no offset at all but check the source and destination triangle indices inside an anyhit program which is slower.

Now the question is how to implement the data management and results.

The above is a gather algorithm, meaning you know upfront between which two triangles the visibility is tested.
That means the results form a matrix of {triangles × triangles} cells, and because visibility is symmetric, the result vector only needs the upper-right triangle of that matrix without the diagonal elements, which would pair identical triangle indices.

When each launch index should calculate one visibility result between two different triangles, that is, one cell of the upper-right triangle in that matrix, the launch dimension would need to be width = number_of_triangles * (number_of_triangles - 1) / 2; height = 1; depth = 1;
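As a sketch, that launch width is just the unordered pair count:

```cpp
#include <cstdint>

// Number of unordered triangle pairs (i, j) with i < j:
// the upper-right triangle of the connection matrix without the diagonal.
uint64_t pairCount(uint64_t numberOfTriangles)
{
    return numberOfTriangles * (numberOfTriangles - 1) / 2;
}
```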

Now each launch index would calculate one result in that upper-right triangle of the connection matrix, which must be cleared to zero before or in the very first launch, so that the isVisible result only needs to be written into the output buffer when the result is 1.
So if after a number of launches the result vector contains a 1, at least one visibility test between those two triangles succeeded.
That could also be used to prune the number of calculations and only continue checking triangle pairs which aren’t already passing the visibility test if you only need a binary condition.
Or you could count the visibility results by adding the isVisible result to its output buffer cell in case you want to determine partial visibility.

Note that for simple objects, the resulting launch dimension might be much too small to saturate modern GPUs.
On the other hand, there is an upper limit for the optixLaunch dimension, which is 2^30.
So if your result vector gets too big, you would need to partition the work into multiple optixLaunch calls somehow.
There are also other methods to store the results, for example in a bit vector.
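As a sketch of that idea (a hypothetical helper, not from any example): one bit per triangle pair, indexed by the flat pair index of the upper-right connection matrix.

```cpp
#include <vector>
#include <cstdint>

// Minimal bit-vector sketch for visibility results: one bit per triangle pair.
struct VisibilityBits
{
    std::vector<uint32_t> words;

    explicit VisibilityBits(uint64_t bitCount)
        : words((bitCount + 31) / 32, 0u) {}

    // Set the visibility bit for the pair at flat index 'index'.
    void set(uint64_t index)        { words[index / 32] |= 1u << (index % 32); }
    // Query the visibility bit for the pair at flat index 'index'.
    bool test(uint64_t index) const { return (words[index / 32] >> (index % 32)) & 1u; }
};
```

On the device side the same packing would need atomic OR writes, since multiple launch indices could touch the same 32-bit word.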
Somehow related topic here: https://forums.developer.nvidia.com/t/whats-your-solution-to-get-all-hit-primitives-of-multiple-rays/239528

This visibility ray method is faster than shooting rays with longer lengths because the acceleration structure traversal doesn’t need to consider potential hits outside the ray’s [tmin, tmax] interval between the two triangles.

As a sidenote, simply shooting a huge number of rays from the triangle’s front faces and checking if that hit anything like it’s done for ambient occlusion would be less robust because you wouldn’t know how dense the ray distribution would need to be to actually hit small triangles, and slower because that would need to use longer tmax values.


Thank you for your kind reply

It gave me some pseudocode so I knew I was on the right track.

I also learned more about what I had missed and what I needed to consider.

However, I still don’t understand the explanation of the offset of the ray origin.
Can you explain in detail?
Is the picture below correct?

Thank you

Self-intersection avoidance is done to prevent intersections of a ray starting from a geometry surface, which is most often needed for shadow/visibility tests. When not doing that, there can be shadow artifacts where shadows appear where there shouldn’t be any. The effect usually looks like rings with Moiré patterns. (This is not the same as depth bleeding from coplanar surfaces!)

Have a look at this blog post:
or search for “shadow acne” here:

These effects can also easily be seen in any of my OptiX 7 examples when setting the scene epsilon factor to zero inside the GUI:

Your image is correct (there should be a tmin and tmax for each yellow ray direction) and shows one way of preventing self-intersections on the starting surface. Offsetting the origin along the front face normal like that is more robust than offsetting the ray tmin value (as done in the example code above), because the latter depends on the ray direction: more horizontal, grazing directions would require a bigger offset.
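To illustrate why (a small numeric sketch, not code from any example): the tmin offset needed to clear a given height above the surface grows with 1/cos of the angle between the ray direction and the face normal.

```cpp
// Distance to travel along a ray direction to reach height 'eps' above the
// surface, where cosTheta = dot(ray_direction, face_normal) > 0.
// For grazing directions (small cosTheta) this blows up, which is why
// offsetting the origin along the face normal is more robust than a fixed
// tmin offset.
float tminOffsetForHeight(float eps, float cosTheta)
{
    return eps / cosTheta;
}
```

A fixed normal offset of eps always lifts the origin by eps, while a fixed tmin offset of eps only lifts it by eps * cosTheta, which can be far too little for grazing rays.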

Either method is very scene size dependent since it all happens due to the floating point accuracy of the involved coordinates. That’s why there is this self-intersection avoidance helper library inside the OptiX Toolkit I linked to above which tries to make the necessary offset as small as possible.

For visibility tests between two surfaces, that self-intersection avoidance must be done on both source and destination surfaces to prevent intersections with either, resulting in false visibility test failures. Such visibility test rays must only find blocking geometry between the source and destination surface. That’s why ray tmax is shortened inside the optixTrace arguments of the example code.

Mind the TODO I mentioned inside the code snippet above. The interval [tmin, tmax] could be really small when sampling random locations on adjacent triangles and invalid rays must be prevented.


Your answer helped me a lot.

Is it appropriate to use a loop when running optixLaunch multiple times?
There may be around 1,000,000 meshes.
I want to use GPU memory within approximately 5GB (which may increase later) and aim for the fastest possible execution.
Approximately 5GB is equivalent to 73,200x73,200 bytes.

I attach the code below.


Thank you

Yes, it’s very usual to call optixLaunch in a loop to handle heavy workloads in smaller chunks.
(Most progressively accumulating renderers do that for the samples per pixel number.)

If I interpret your launch dimension width calculation correctly, you’re shrinking the launch dimension with each launch. The last launch dimension is (1, 1, 1), which is just a single thread and does no parallel work on the GPU.
Assuming the meshesCount variable indicates the number of triangles inside the scene, each launch would be one row of the upper-right connection matrix results I mentioned above.

Given the amount of results for 1,000,000 triangles (roughly 10^12 / 2 = 5 × 10^11 values), it’s reasonable to calculate this in smaller chunks, even when representing each visibility result as a single bit. The starting row with a million connections is a reasonable workload, and only the last tens of thousands of launches would not saturate modern high-end GPUs.
From a simplicity standpoint, it’s worth a first try with that.
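To put numbers on that (a quick C++ sketch; the roughly 5 GB budget is from the question above): even bit-packed, the full result vector for one million triangles far exceeds that budget, so chunking is unavoidable.

```cpp
#include <cstdint>

// Result count for an all-pairs visibility test over n triangles.
uint64_t visibilityPairCount(uint64_t n)  { return n * (n - 1) / 2; }

// Storage size when packing one visibility result into a single bit.
uint64_t visibilityPackedBytes(uint64_t n) { return (visibilityPairCount(n) + 7) / 8; }
```

For n = 1,000,000 this gives about 5 × 10^11 results, or roughly 62.5 GB even at one bit per result (about 58 GiB), an order of magnitude above a 5 GB budget.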

Your code is not showing how the data is read back. If that is done with a synchronous cudaMemcpy, there wouldn’t be a need for the explicit synchronization call done inside that CUDA_SYNCH_CHECK macro.

This loop cannot run fully asynchronously anyway because you’re using only one host parameter block as source and need to wait until the cudaMemcpyAsync actually gets executed on the GPU before changing the value on the host. You would need to hold separate host source values to make that fully asynchronous.

Note that you would only need to copy the whole full LaunchParamsMSH structure once outside the loop if all you change inside the loop is the startIndex value inside that struct. That value could be copied into the device side struct individually.

Means this is all fine so far, it’s just not perfectly optimal.

There would be ways to solve the unbalanced launch dimension sizes with a fixed launch size and more logic inside the ray generation program to calculate the two triangle indices for each launch index in each launch. That would require knowing the launch number and the number of triangles to be able to map the current launch index to the 2D cell coordinate of the upper-right connection matrix of all triangles.
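A host-side sketch of such a mapping (plain C++ for illustration; the same integer arithmetic could run per launch index inside the ray generation program, and the row-walking loop is a simple stand-in for a closed-form inverse):

```cpp
#include <utility>
#include <cstdint>

// Map a flat index k in [0, n*(n-1)/2) to the pair (i, j) with i < j,
// walking the upper-right connection matrix row by row.
// Row i holds the (n - 1 - i) pairs (i, i+1) ... (i, n-1).
std::pair<uint32_t, uint32_t> pairFromIndex(uint64_t k, uint32_t n)
{
    uint32_t i = 0;
    uint64_t rowLength = n - 1;
    while (k >= rowLength)
    {
        k -= rowLength;
        ++i;
        --rowLength;
    }
    return { i, static_cast<uint32_t>(i + 1 + k) };
}
```

With this, every launch can use the same fixed dimension: each thread adds its launch index to a per-launch base offset and decodes its triangle pair from the resulting flat index.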


Thank you for your response.

There are some things in the explanation that I don’t know how to write the code.

To be completely asynchronous, you said to keep a separate host source value.
Doesn’t that require optixLaunch() to be 2-dimensional or 3-dimensional?
If so, additional work would be required to check the available GPU memory and allocate 2-dimensional or 3-dimensional memory.

Additionally, only “startIndex” is changed within the loop.
So, you said to copy the entire thing from the outside only once.
In that case, I think you need to run optixLaunch() in two dimensions to be able to know the starting index inside the ray_gen function.
Is that correct?

Thank you

Forget about the fully asynchronous launches. That won’t work in your case anyway, since your result vector is too big to calculate everything at once, and the copy of the results from device to host requires a synchronization anyway.

That has nothing to do with 1D, 2D, or 3D launch dimensions. I just meant that you can send multiple asynchronous CUDA calls (which includes optixLaunch) into the same stream, and the GPU will execute these copies and kernel launches in order, as quickly as possible, asynchronously to the CPU. That means you wouldn’t know when the asynchronous cudaMemcpy for the launch parameters happens, so you cannot use a single source variable for it; you would need an array of source values on the host with one entry for each launch when you’re not synchronizing anything.

Additionally, only “startIndex” is changed within the loop.
So, you said to copy the entire thing from the outside only once.
In that case, I think you need to run optixLaunch() in two dimensions to be able to know the starting index inside the ray_gen function.
Is that correct?

No. What I meant is that you could do one cudaMemcpy() outside the loop which copies your whole launch parameter structure once to initialize all non-changing values on the device, and then only copy the 4 bytes of the unsigned int startIndex value to the correct device address inside that structure before each optixLaunch.
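A host-side sketch of that partial update; the struct layout here is a placeholder, and the plain memcpy into a byte buffer stands in for the cudaMemcpy of 4 bytes to the device address d_params + offsetof(LaunchParamsMSH, startIndex):

```cpp
#include <cstddef>
#include <cstring>

// Placeholder launch parameter struct; only startIndex changes per launch.
struct LaunchParamsMSH
{
    unsigned int startIndex;
    unsigned int numTriangles;
    // ... traversable handle, buffer pointers, etc.
};

// Stand-in for device memory. With CUDA this would be the device-side copy,
// written once outside the loop with
// cudaMemcpy(d_params, &params, sizeof(params), cudaMemcpyHostToDevice).
unsigned char deviceParams[sizeof(LaunchParamsMSH)];

// Per-launch update: copy only the 4 bytes of startIndex to its offset
// inside the device-side struct, instead of re-uploading the whole structure.
void updateStartIndex(unsigned int startIndex)
{
    std::memcpy(deviceParams + offsetof(LaunchParamsMSH, startIndex),
                &startIndex, sizeof(startIndex));
}
```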

Here is some example code where either the whole launch parameter structure (when it’s dirty) or only one value, the frame’s sample per pixel “iterationIndex”, is copied to the device using the CUDA Driver API, but the same works with the CUDA Runtime API:

That is not the main problem in your algorithm’s efficiency. The shrinking launch dimensions are.

Then I still don’t understand why you seem to insist on testing only the visibility between the center points of two triangles. Is that really sufficient? That won’t determine partial visibility of two triangle areas.

Why exactly do you need to test the visibility between two triangles?


I’ve successfully implemented a basic program with your help, and it works well for simple models. However, I encountered three issues as the complexity of the model increased:

( 1 )

  • figure1

In the situation depicted in Figure 1, the ray originates from A. The hit triangles are shown below.

  • figure2

Triangle B is located inside the object (Figure 2).

Rays originating from A towards B do not pass through the cylindrical object, but the closest hit result for these rays is triangle B.
Why is this happening?
How does closest hit work?

( 2 )
When I specify the length (of the direction vector) for rays fired from B towards A, no hits occur.
However, if I set the ray length to 1e16, hits do occur.
The calculations seem correct, but could there be something I’m overlooking?
Could this be related to floating-point precision issues?


This is information about A and B.
o Position
A : (-9.8896, 487.8165, 765.00) (-9.8896, 487.8165, 1680.00) (-14.2096, 472.2060, 1680.00)
B : (1207.8165, 620.00, 709.0977) (1195.8649, 620.00, 717.3117) (1195.8649, 895.00, 717.3117)

o Normal
A : (0.963776, -0.266713, 0.000000)
B : (-0.566406, 0.000000, -0.824127)

o Direction vector
Ray origin : (-11.3296, 482.6130, 1375.00)
‘B’ center point : (1199.848877, 711.666687, 714.573730)
Direction vector : (0.866104, 0.163794, -0.472266)
Direction vector length(= tmax) : 1398.421265

This is the function pseudocode.

	//ray_gen() function 
		const unsigned int originIdx = GetIndex(origin);		// Obtain the starting mesh index 
		const unsigned int targetIdx = GetIndex(target);		// Obtain the target mesh index 
		const float3 originNormal = GetNormal(origin);		// Starting mesh normal vector 
		const float3 targetNormal = GetNormal(target);		// Target mesh normal vector 

		// Obtain the direction vector and ray length 
		const float3 rayDirection = normalize(GetPos(target) - GetPos(origin));
		const float tmax = length(rayDirection);

		// Ray generation location 
		const float3 originOffset = rayDirection * epsilon;
		const float3 rayPos = GetPos(origin) + originOffset;

		if (0.0f < dot(rayDirection, originNormal) && dot(rayDirection, targetNormal) < 0.0f && 0.0f < tmax)
		{
			unsigned int u0 = 0;         // output only 
			unsigned int u1 = targetIdx; // input only 
			optixTrace(params.handle,    // Scene traversable handle (name is a placeholder). 
				rayPos, rayDirection,
				0.00f,         // tmin 
				tmax, // tmax : default 1e16f 
				0.0f, // rayTime 
				OptixVisibilityMask(0xFF),
				OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT | OPTIX_RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
				OpX::RAY_TYPE_RADIANCE, // SBT offset 
				1, // SBT stride : RAY_TYPE_COUNT 
				OpX::RAY_TYPE_RADIANCE, // missSBTIndex 
				u0, u1);
			payload.hit = __uint_as_float(u0);
			payload.hitPrimitiveIdx = __uint_as_float(u1); // Record hit & miss results 
		}

	// closest hit() function 
		const unsigned int primitiveIdx = optixGetPrimitiveIndex();
		const unsigned int targetIdx = OpX::getPayloadLauchID();
		if (primitiveIdx == targetIdx)
			// Record hit


( 3 )

Figure 3 shows a screen produced by reading a custom mesh file (not a glTF file) and displaying it with optixMeshViewer.
It comes out differently than expected.
Did I fail to build the AS?

First of all, you’re not implementing what I described.

1.) Your tmax calculation is incorrect and not consistent with what you wrote:
You printed Direction vector length(= tmax) : 1613.840332
But in your code the length is always 1.0f, because length() is applied to the already normalized direction.

const float3 rayDirection = normalize(GetPos(target) - GetPos(origin));
const float tmax = length(rayDirection);

Compare that to my example code.

float3 vector        = ray_end_point - ray_origin;            // Unnormalized Vector from start to end sample point.
float  ray_tmax      = length(vector);                        // Distance between the two sample points in world coordinates. This will define ray.tmax.
// TODO: Add a check here to make sure that ray_tmax is greater than some epsilon to not divide by zero in the following normalization.
//       This could actually happen for adjacent triangles!
float3 ray_direction = vector / ray_tmax;                     // Normalized direction vector from start to end sample point.

Maybe that was already the whole problem, but there are other things I would change:

2.) You wanted to offset the origin along the face normals of the two triangles, but you offset it along the ray direction, which should also work but is less robust. When doing that, you could also have simply offset tmin, which would have been faster.

That epsilon offset is dependent on the scene size. Depending on your scene extents, your epsilon might not be big enough.

And again, if you used a visibility ray implementation like I explained multiple times now, it would be required to offset the ray at both ends of its test interval, so the same calculation would need to happen at the target triangle and only then the ray direction would need to be calculated between the offset points.

Or you do it along the ray direction by changing the tmin and tmax values given to optixTrace like in my example code.

3.) No closesthit program is required if you follow my example code. That code disabled the anyhit and closesthit programs altogether, because the visibility ray can be implemented with just a miss program, as shown in my very first reply of this thread.

extern "C" __global__ void __miss__shadow()
{
  optixSetPayload_0(1); // isVisible != 0
}

The OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT is not going to result in the closest hit anyway. Instead, it terminates the ray traversal on any hit inside the ray’s [tmin, tmax] interval. If there is no hit, the miss program is reached, and that is where you set the isVisible flag.

The construction of the visibility ray already knows which two triangles are tested. There is no need to test against either of the two involved triangles inside some closest hit program.
(If you wanted to implement a more robust self-intersection test by checking the hit primitives, that would require an anyhit program to be able to ignore the two involved triangles, which would slow down the algorithm considerably and shouldn’t be required. Don’t do that. Try to get this to work as I explained first.)

4.) I’m not sure what your Figure 3 should be showing, but although you’re saying that your geometry triangles are wound consistently (counter-clockwise defining the front face) so that front faces are always outside, have you verified that, for example by visualizing the face normals?
Because none of the ray offseting or face normal comparisons will work if the triangle winding is inconsistent.
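One way to check that is to compute the geometric face normals from the winding order. A minimal C++ sketch (the float3 type here is a local stand-in for the CUDA vector type):

```cpp
#include <cmath>

struct float3 { float x, y, z; };

static float3 cross3(const float3 a, const float3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Geometric face normal of triangle (v0, v1, v2) with counter-clockwise winding.
// If these normals point into the mesh for some triangles, the winding is
// inconsistent and the front-face/hemisphere tests above will fail there.
float3 faceNormal(const float3 v0, const float3 v1, const float3 v2)
{
    const float3 e0 = { v1.x - v0.x, v1.y - v0.y, v1.z - v0.z };
    const float3 e1 = { v2.x - v0.x, v2.y - v0.y, v2.z - v0.z };
    float3 n = cross3(e0, e1);
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}
```

Rendering short line segments from each triangle centroid along these normals makes inverted triangles immediately visible.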

5.) I would also try without the OPTIX_RAY_FLAG_CULL_BACK_FACING_TRIANGLES ray flag.
That wasn’t part of my example code either.

Please just implement the visibility ray exactly as I described and do the above consistency checks of your geometry. That should all work and is really rather simple if you just follow my advice.

Thanks for your reply.

There was an error in the pseudocode. tmax and the ray offset are now configured as you described.

By adjusting the flags I was able to get better results than before. But it’s not perfect.

I have a question about using only the Miss program you suggested. Wouldn’t a hit occur if another triangle obscures it?
What I want is miss.
If a miss occurred, it was different from the miss operation I was thinking of.


I would like to treat the ray shooting from (a) towards (c) as a miss.

You need to distinguish the OptiX program domain names (raygen, anyhit, closesthit, miss, etc.) from your arbitrary choice of names for your results.

The visibility ray implementation I provided, tests if any geometry lies inside the ray interval [tmin, tmax].
The isVisible flag remains false (0) if anything is hit in that interval. Nothing else happens.
If nothing is hit inside the interval, the miss program is reached (== nothing is hit) and that sets isVisible = true (1).
Simple as that.

If you name the condition isVisible == false “miss” that is your choice.
That has nothing to do with the OptiX program domain names I’m talking about.
If you rename your “miss” result to “invisible” it might make things clearer.

So in your image, shooting a ray from point (a) to point (c) with the visibility ray implementation I provided, set up like this:

unnormalized_direction = c - a;

ray.origin = a;
ray.direction = normalize(unnormalized_direction);
ray.tmin = 0.0f + sceneEpsilon;
ray.tmax = length(unnormalized_direction) - sceneEpsilon;

will result in an isVisible condition which in that case returns false (0), because triangle (b) is hit, so the miss program is not reached and the default initialization of isVisible = false (0) is not changed, because nothing else happens.
No anyhit and no closesthit program exist for that visibility ray implementation, and the ray flags OPTIX_RAY_FLAG_DISABLE_ANYHIT | OPTIX_RAY_FLAG_DISABLE_CLOSESTHIT even prevent them from being called.

The OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT is a faster method to stop further traversal of the ray (instead of using an anyhit program with an optixTerminateRay call) because that ray flag is evaluated by the RTX RT cores directly.

This implementation of a visibility test between two points in space is the fastest you can implement in OptiX.
There is a full example using that method inside my OptiX Advanced Samples:

I hope it’s clear now. I can’t explain it simpler. Just implement that correctly and you’ll see.

Thanks for the detailed answer.
I understand.