Two questions: 1. payload type semantics 2. ray-triangle intersection

  • Payload type semantics
    I am using OptiX 7.5 and trying this new feature to see whether it can help reduce register usage. My code has:

    1. One raygen program (where optixTrace is called repeatedly in a loop, passing 14 payload values);
    2. One closesthit program (where all the payloads are read and written);
    3. One miss program (where only one payload is written).

Therefore, I added three semantics, one for each of the three programs:

1. OPTIX_PAYLOAD_SEMANTICS_TRACE_CALLER_READ
2. OPTIX_PAYLOAD_SEMANTICS_CH_READ_WRITE
3. OPTIX_PAYLOAD_SEMANTICS_MS_WRITE

I also updated my programs to use:

1. optixTrace(OPTIX_PAYLOAD_TYPE_ID_0, ..., p0, ..., p13) in the raygen program;
2. optixSetPayloadTypes(OPTIX_PAYLOAD_TYPE_ID_1) at the beginning of the closesthit program;
3. optixSetPayloadTypes(OPTIX_PAYLOAD_TYPE_ID_2) at the beginning of the miss program.

The program compiles fine. However, when I try to create the module, optixModuleCreateFromPTX reports “OptixPayloadType::numPayloadValues must be less than or equal to 32: 711328048”. My payloadType.numPayloadValues is 14, and I have changed pipelineCompileOptions.numPayloadValues to 0 accordingly. I am not sure why I keep getting this error message. Please advise.

Also, is there an example that demonstrates this new feature?

  • Ray-triangle intersection
    In my program, I found that even with tmin set to 0.0f in the optixTrace call, the built-in ray-triangle intersection occasionally misses a triangle that is very close to the ray origin. Is this due to numerical error in the built-in ray-triangle intersection? Also, is there a built-in function that reports how many ray-triangle intersection tests are performed per unit time?

Thank you in advance.

Hi @Shijie, welcome!

Maybe double-check that all of your OptiX structs are zero-initialized. From the description, I’m not sure what could be going wrong, but this sounds like it could be uninitialized memory being interpreted as an option somewhere. Also make sure to try validation mode. Do you have a reproducer you can share for the module creation error?
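A host-side sketch of the zero-initialization advice follows. A garbage value like 711328048 plausibly means the module options point at memory that was never filled in, for example a numPayloadTypes count larger than the array that was actually populated. To compile without the OptiX headers, the sketch uses hypothetical stand-in structs (PayloadTypeDesc, ModuleOptionsDesc) that only mirror the field names of OptixPayloadType and OptixModuleCompileOptions, and a hypothetical check in the spirit of the reported error. With the real structs you would value-initialize the same way (`= {}`).

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins mirroring field names of OptixPayloadType and the
// payload-type part of OptixModuleCompileOptions. NOT the real OptiX structs.
struct PayloadTypeDesc {
    unsigned int numPayloadValues;         // mirrors OptixPayloadType::numPayloadValues
    const unsigned int* payloadSemantics;  // mirrors OptixPayloadType::payloadSemantics
};

struct ModuleOptionsDesc {
    unsigned int numPayloadTypes;          // mirrors OptixModuleCompileOptions::numPayloadTypes
    const PayloadTypeDesc* payloadTypes;   // mirrors OptixModuleCompileOptions::payloadTypes
};

// Hypothetical check in the spirit of the error message: every payload type the
// options point at must declare at most 32 values and provide a semantics array.
inline bool payloadTypesLookValid(const ModuleOptionsDesc& opts) {
    if (opts.numPayloadTypes > 0 && opts.payloadTypes == nullptr) return false;
    for (unsigned int i = 0; i < opts.numPayloadTypes; ++i) {
        const PayloadTypeDesc& t = opts.payloadTypes[i];
        if (t.numPayloadValues > 32) return false;
        if (t.numPayloadValues > 0 && t.payloadSemantics == nullptr) return false;
    }
    return true;
}

// Placeholder semantics (real code would use OPTIX_PAYLOAD_SEMANTICS_* flags).
constexpr unsigned int kSemantics[14] = {};

// Value-initialize everything first so unset fields are zero, not stack garbage,
// and derive the count from the array that was actually filled.
inline ModuleOptionsDesc makeOptions(std::array<PayloadTypeDesc, 3>& storage) {
    storage = {};
    for (PayloadTypeDesc& t : storage) {
        t.numPayloadValues = 14;
        t.payloadSemantics = kSemantics;
    }
    ModuleOptionsDesc opts = {};
    opts.numPayloadTypes = static_cast<unsigned int>(storage.size());
    opts.payloadTypes = storage.data();
    return opts;
}
```

The key design point is that the count is derived from the filled array instead of being typed separately, so the two cannot disagree.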

We will provide an example of payload semantics in the next OptiX SDK version. Apologies that it’s not already there. Here (attached) is a preview: optixPathTracer.zip (13.6 KB)

For the ray-triangle intersection question, perhaps the issue is that your t value is actually zero? In this case, I believe the ray will be declared a miss, in order to prevent situations where rays can fail to make forward progress (imagine you had multiple triangle hits at t=0).

I don’t think there’s any built-in way to measure triangle test throughput specifically. You can capture the rays per second pretty easily by counting your optixTrace calls. Typically when I do something like this, I will use specialization to create two different kernels, one with ray counting and one without. Then run one kernel to count rays, and the other to measure accurate timing. You could use an any-hit program in a similar way to estimate the lower bound of the number of triangle tests that are invoked.
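Here is a plain C++ sketch of the specialization idea: one source with a compile-time flag, so the timing variant contains no counter traffic at all. The names traceAll and shadeRay are hypothetical. The std::atomic counter stands in for a global device counter that device code would bump with atomicAdd. An any-hit program could increment a similar counter to estimate the lower bound on triangle tests.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-ray work; a real kernel would call optixTrace here.
inline float shadeRay(int i) { return static_cast<float>(i) * 0.5f; }

// CountRays=false: timing build, no counter code is emitted.
// CountRays=true: instrumentation build that counts every traced ray.
template <bool CountRays>
float traceAll(int numRays, std::atomic<std::uint64_t>& rayCounter) {
    float sum = 0.0f;
    for (int i = 0; i < numRays; ++i) {
        if constexpr (CountRays) {
            // Device code would use atomicAdd on a global counter instead.
            rayCounter.fetch_add(1, std::memory_order_relaxed);
        }
        sum += shadeRay(i);
    }
    return sum;
}
```

Run traceAll<true> once to get the ray count, time traceAll<false>, and divide the count by the measured time to get rays per second.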


David.

Hi David,

Thank you so much for the prompt response!

  1. Payload semantics
    I am not sure whether I can share the code I am working on (I need to confirm with my advisor). But the example you shared is truly helpful!

Correct me if I am wrong, but it looks to me like the number of payload types should equal the number of ray types. Your example has two kinds of optixTrace calls (traceOcclusion and traceRadiance), so one payload type is specified for each.

I modified my code accordingly and the program is now working. However, adding payload semantics does not improve the performance in my case.

  2. Ray-triangle intersection

“For the ray-triangle intersection question, perhaps the issue is that your t value is actually zero?”

Yes. To avoid missing any triangle, I set tmin to 0.0f in the optixTrace call.

“In this case, I believe the ray will be declared a miss, in order to prevent situations where rays can fail to make forward progress (imagine you had multiple triangle hits at t=0).”

Do you mean that even with tmin set to 0.0f, a small positive hitT (as reported by optixGetRayTmax()) is sometimes ignored to prevent such situations? If so, is it possible to disable this behavior? I would prefer that the program not miss any triangle with a positive hitT (for example, 0.000001f).

  3. Number of triangle tests
    I think the any-hit approach should work. I will try that.

Thanks again,
Shijie

Yay! Glad it’s working. Bummer there’s no perf improvement. I guess your payload usage doesn’t yet have places where it can overlap and save registers. The notes on the optixPathTracer preview mentioned there was a measurable boost in speed, but of course it will depend on the GPU.

“Do you mean that even with tmin set to 0.0f, small positive hitT (reported by optixGetRayTmax()) is sometimes ignored to prevent such situations from happening?”

Sorry, by “t value” I was referring to the hitT. I mean that if your hitT evaluates to zero, then you can get a miss, even if your tmin value is zero. This is the design of the triangle intersector, so I’m guessing that maybe your hitT is so small that it rounds to zero? Anything smaller than about 1.4e-45 will round to zero, since that is the smallest positive (denormal) 32-bit float. If your expected hitT is definitely larger than this, we should dive deeper. You could add a double-precision custom triangle intersector to verify the expected hitT value.
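A quick host-side check of the underflow threshold shows how far a hitT of about 1e-6 is from rounding to zero. The helper name is hypothetical. The sketch assumes default IEEE behavior on the host, with denormals not flushed to zero (no fast-math flags).

```cpp
#include <cassert>
#include <cfloat>
#include <limits>

// Smallest positive (denormal) float, about 1.4e-45.
constexpr float kDenormMin = std::numeric_limits<float>::denorm_min();

// Round a double to float (round-to-nearest) and report whether it became zero.
inline bool underflowsToZero(double value) {
    return static_cast<float>(value) == 0.0f;
}
```

Values around 1e-6 survive easily, and even 1e-40 survives as a denormal. Only values below roughly half of kDenormMin round to zero.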


David.

I have been using an RTX 2080 Super GPU. I will run more experiments on an RTX 3090 once it is ready.

In the miss cases I mentioned earlier, the ray information is printed as follows:

miss detected, ray origin:[28.134830 28.517284 0.000002], ray direction:[0.827484 0.353074 -0.436580], hitT: 340282346638528859811704183484516925440.000000

The triangle mesh includes a 60 x 60 x 60 cube spanning from [0.0, 0.0, 0.0] to [60.0, 60.0, 60.0] (see the figure below). As you can see, the ray origin is still inside the cube, the ray propagates in the -z direction, and it is expected to hit one of the triangles on the bottom of the cube. However, the built-in intersection program reports a hitT value of FLT_MAX (perhaps this is a numerical error?). As a result, a miss is declared. The tmin value is set to 0.0f in this case.
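The printed hitT is exactly FLT_MAX. As I understand the OptiX documentation, optixGetRayTmax() in a miss program returns the tmax that was passed to optixTrace, since no hit shortened the ray. The printed number is therefore most likely that tmax echoed back, not a computed intersection distance. The hypothetical formatHitT helper below reproduces the string. It assumes a C library, such as glibc, that prints floats exactly with %f.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdio>
#include <string>

// Format a float with "%f", as the miss-program printf in this thread appears to do.
inline std::string formatHitT(float hitT) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%f", static_cast<double>(hitT));
    return std::string(buf);
}
```

The same reasoning applies to any other tmax: 1e16f, for example, prints as 10000000272564224.000000.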

In case you want to reproduce this: GitHub - ShijieYan/mmc at reproduce_miss_case. This has been tested on an Ubuntu 20.04 system (gcc 9.4) with CUDA 11.6 and the 515.65.01 NVIDIA driver. To build the program, go to the /src folder and run “make OPTIXHOME=OptiX_7.5_DIR”, where OptiX_7.5_DIR is the OptiX 7.5 installation directory. To reproduce the miss cases, go to the /examples/optix folder and run “runtest.sh”.

Is the cube mesh using an index buffer and shared vertices? You should be able to have a “watertight” cube mesh as long as the vertices are actually shared. If the vertices are not shared, rays that strike an edge or vertex will very occasionally sneak through due to numeric precision.
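Here is a minimal sketch of an indexed cube with shared vertices. All twelve triangles refer to the same eight corner positions by index, so neighboring triangles share bit-identical edge endpoints. The layout (a [0,60]^3 box, eight vertices, twelve index triples) is an illustrative assumption, not the mesh format used in mmc. isClosedMesh is a hypothetical check that every undirected edge is used by exactly two triangles.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <utility>

struct Vec3 { float x, y, z; };

// Eight shared corner vertices of the [0,60]^3 cube.
constexpr std::array<Vec3, 8> kCubeVertices = {{
    {0, 0, 0}, {60, 0, 0}, {60, 60, 0}, {0, 60, 0},     // z = 0 (bottom)
    {0, 0, 60}, {60, 0, 60}, {60, 60, 60}, {0, 60, 60}  // z = 60 (top)
}};

// Index buffer: two triangles per face, wound so normals point outward.
constexpr std::array<std::array<unsigned, 3>, 12> kCubeIndices = {{
    {0, 2, 1}, {0, 3, 2},   // bottom (-z)
    {4, 5, 6}, {4, 6, 7},   // top (+z)
    {0, 1, 5}, {0, 5, 4},   // front (-y)
    {3, 7, 6}, {3, 6, 2},   // back (+y)
    {0, 4, 7}, {0, 7, 3},   // left (-x)
    {1, 2, 6}, {1, 6, 5}    // right (+x)
}};

// A closed (watertight) triangle mesh uses every undirected edge exactly twice.
inline bool isClosedMesh() {
    std::map<std::pair<unsigned, unsigned>, int> edgeUse;
    for (const auto& tri : kCubeIndices) {
        for (int e = 0; e < 3; ++e) {
            unsigned a = tri[e], b = tri[(e + 1) % 3];
            if (a > b) std::swap(a, b);
            ++edgeUse[{a, b}];
        }
    }
    for (const auto& [edge, count] : edgeUse) {
        if (count != 2) return false;
    }
    return true;
}
```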


David.

I mocked up the ray and cube’s bottom face in Blender and it looks to me like the ray does not cross the edge close enough to expect any numeric problems, so maybe something else is going wrong. Thanks for the reproducer, I will try it soon and investigate.

One thing you could also try is mocking this up in the optixTriangle SDK sample: modify the raygen program and vertices in that sample to mimic the ray and triangle here, and see if you get the same behavior. (I did this yesterday before replying, to verify that I was getting valid hitT values in the denormal range of floats.) It might be worth capturing the exact hexadecimal values of your ray origin, direction, and triangle vertices, in case the printed floats rounded any bits.
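On the host side, exact bit patterns can be captured by copying the bits rather than converting the value. In device code, the CUDA intrinsics __float_as_uint and __uint_as_float do the same job. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Host-side equivalent of __float_as_uint: copy the bits, don't convert the value.
inline std::uint32_t floatBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Host-side equivalent of __uint_as_float.
inline float bitsToFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Eight lowercase hex digits, matching the style of the logs in this thread.
inline std::string hexOf(float f) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(floatBits(f)));
    return std::string(buf);
}
```

Logging hexOf(x) alongside the %f output makes it possible to paste the exact values into a reproducer.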


David.

Thanks for the feedback on this!
I followed the steps used in the firstTriangleMesh example to feed the mesh data to the acceleration structure build. For example, in the benchmark I just uploaded (where the mesh is simply a 60^3 cube), the mesh is represented by two buffers:

  1. Node buffer (each row is a node):
    [Screenshot: node buffer]
  2. Vertex buffer (each row represents the three vertices of a triangle):
    [Screenshot: vertex buffer]

In this benchmark, miss cases can still be observed, for example:
miss detected, ray origin:[30.815659 29.627020 0.000001], ray direction:[0.712532 0.147897 -0.685875], hitT: 340282346638528859811704183484516925440.000000
The corresponding hex values are:
miss detected, ray origin:[41f68678 41ed0423 359c39fc], ray direction:[3f366880 3e177236 bf2f9583], hitT: 340282346638528859811704183484516925440.000000
I created a plot in MATLAB showing the mesh (black solid lines) and the ray (red solid line):

It looks to me like the ray does not hit any edge or vertex. To reproduce, you can run
examples/optix_box/runtest.sh
I will try the optixTriangle SDK sample.

I am able to reproduce these miss cases in the optixTriangle SDK sample. Here is my modification to the code:
modification.diff (2.3 KB)
Basically, I edited the Cartesian coordinates of the three vertices using the values captured from the figure below.


The ray origin and direction are set using the hex values my program reported earlier.
Here is the output printed by the miss shader:
miss detected, ray origin:[30.815659 29.627020 0.000001], ray direction:[0.712532 0.147897 -0.685875], hitT: 10000000272564224.000000
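A reproduced miss like this can be cross-checked on the CPU with a double-precision custom intersector, as suggested earlier in the thread. The sketch below swaps in the textbook Möller-Trumbore test, templated so it can be instantiated in float or double. It is not the RTX hardware intersector. The acceptance rule (t strictly positive and inside [tmin, tmax]) is an assumption modeled on the built-in behavior described above. The test triangle is a hypothetical half of the cube's bottom face, not the exact vertices from the diff.

```cpp
#include <cassert>
#include <cmath>

template <typename T>
struct V3 { T x, y, z; };

template <typename T> V3<T> sub(V3<T> a, V3<T> b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
template <typename T> T dot(V3<T> a, V3<T> b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
template <typename T> V3<T> cross(V3<T> a, V3<T> b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Moller-Trumbore ray/triangle test. A hit requires tmin <= t <= tmax AND t > 0,
// so a distance that evaluates to zero counts as a miss.
template <typename T>
bool intersectTriangle(V3<T> orig, V3<T> dir, V3<T> v0, V3<T> v1, V3<T> v2,
                       T tmin, T tmax, T& tOut) {
    const V3<T> e1 = sub(v1, v0);
    const V3<T> e2 = sub(v2, v0);
    const V3<T> p = cross(dir, e2);
    const T det = dot(e1, p);
    if (det == T(0)) return false;           // ray parallel to the triangle plane
    const T invDet = T(1) / det;
    const V3<T> s = sub(orig, v0);
    const T u = dot(s, p) * invDet;          // first barycentric coordinate
    if (u < T(0) || u > T(1)) return false;
    const V3<T> q = cross(s, e1);
    const T v = dot(dir, q) * invDet;        // second barycentric coordinate
    if (v < T(0) || u + v > T(1)) return false;
    const T t = dot(e2, q) * invDet;         // distance along the ray
    if (!(t > T(0)) || t < tmin || t > tmax) return false;
    tOut = t;
    return true;
}
```

Calling intersectTriangle<double> and intersectTriangle<float> on the same inputs shows whether single precision alone is responsible for a miss.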


Yep, you’re definitely hitting the limits of single precision floats here.

Thanks for the optixTriangle reproducer!

So probably the way to think about this is that the scale of your triangle is around 85, because that’s the longest edge length. And your z-distance to the triangle is slightly over 1e-6. Single precision floats have about 7-8 decimal digits of accuracy, and the ratio of your triangle edge length to the expected t value is just less than ~1e8 (85/1e-6), which requires ~8 decimal digits of precision. This means that the intersector is finding the rounded hitT value to be <= 0 sometimes, and then returning a miss.
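The estimate above can be written down directly. This is only an order-of-magnitude rule of thumb; the real intersector's error depends on its internal formulation. The helper names are hypothetical. The last test assertion applies the same estimate to the 0.00001f origin offset used later in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Decimal digits needed to resolve a distance `dist` against a feature of size `scale`.
inline double digitsNeeded(double scale, double dist) {
    return std::log10(scale / dist);
}

// Decimal digits a float's 24-bit significand holds: 24 * log10(2), about 7.22.
inline double floatDigits() {
    return std::numeric_limits<float>::digits * std::log10(2.0);
}
```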

BTW, the hitT value reported doesn’t actually tell you what the intersector thinks it should be. In this case, the expected hitT value is:

hitT = origin.z / |direction.z| = 0.00000116397814054 / 0.685875117779 = 0.00000169707007933046

The problem is just that this dips below the limits of single precision and sometimes gets rounded to zero or negative, resulting in a miss.
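The expected hitT can be recomputed from the logged hex values. The sketch decodes the raw bits and divides in double, so the reference value is not itself limited by single precision. It assumes the triangle lies in the plane z = 0 (the cube's bottom face). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret raw IEEE-754 bits as a float (host-side __uint_as_float).
inline float fromBits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Distance from the ray origin to the plane z = 0 along the ray, in double.
inline double expectedHitTToZPlane(float originZ, float dirZ) {
    return static_cast<double>(originZ) / -static_cast<double>(dirZ);
}
```

With originZ = fromBits(0x359c39fc) and dirZ = fromBits(0xbf2f9583), this gives about 1.69707e-6, matching the value above.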

I expanded on your diff a little bit in order to visualize the precision:

```cuda
// Pixel (400, 400) reproduces the original ray; other pixels shift the origin by 0.1 per pixel in x/y.
static __forceinline__ __device__ void computeRay( uint3 idx, uint3 dim, float3& origin, float3& direction )
{
    // Bit patterns are the hex values captured from the failing ray.
    origin    = make_float3(__uint_as_float(0x41f68678) + float(int(idx.x) - 400)*0.1f, __uint_as_float(0x41ed0423) + float(int(idx.y) - 400)*0.1f, __uint_as_float(0x359c39fc));
    direction = make_float3(__uint_as_float(0x3f366880), __uint_as_float(0x3e177236), __uint_as_float(0xbf2f9583));
}
```

This puts your original ray at image coordinate (400, 400) while letting us see the whole triangle, which reveals many misses across it. Take special note that the lower left corner is much better than the upper right. This is because the lower left corner is near the world-space origin, where precision is higher. Floating-point precision gets worse further from the origin, so the upper right is worse than the lower left.
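The distance-from-origin effect can be quantified by the spacing between adjacent floats (one ulp). Near the reproducer's ray origin (x ≈ 30.8), one float step is about 1.9e-6, already larger than the expected hitT of about 1.7e-6. Near the world origin, the steps are far finer. This only illustrates the spacing of representable values; it does not model how the hardware intersector loses bits. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger float: one unit in the last place (ulp).
inline float ulpAt(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```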

If we add a small offset to the ray origin to give ourselves 1 extra decimal digit of precision (about 3 more bits), we can eliminate the misses:

```cuda
static __forceinline__ __device__ void computeRay( uint3 idx, uint3 dim, float3& origin, float3& direction )
{
    origin    = make_float3(__uint_as_float(0x41f68678) + float(int(idx.x) - 400)*0.1f, __uint_as_float(0x41ed0423) + float(int(idx.y) - 400)*0.1f, __uint_as_float(0x359c39fc));
    origin.z += 0.00001f;  // move the origin ~1e-5 further from the z = 0 triangle
    direction = make_float3(__uint_as_float(0x3f366880), __uint_as_float(0x3e177236), __uint_as_float(0xbf2f9583));
}
```

Also be aware that the ray direction affects precision: the linear solver must use the ray direction somewhere, likely in a multiply, which may eat away at the bits of precision you have. This is all inherent in single-precision floats, and you’ll get the same behavior from a software intersector under the same design constraints. If I use an axis-aligned direction so that multiplying by the ray direction doesn’t lose precision, the results are better than with the diagonal ray. This doesn’t help you; I just wanted to point out that there are multiple sources of precision loss.

```cuda
static __forceinline__ __device__ void computeRay( uint3 idx, uint3 dim, float3& origin, float3& direction )
{
    origin    = make_float3(__uint_as_float(0x41f68678) + float(int(idx.x) - 400)*0.1f, __uint_as_float(0x41ed0423) + float(int(idx.y) - 400)*0.1f, __uint_as_float(0x359c39fc));
    direction = make_float3(0,0,-1);  // axis-aligned: multiplying by +/-1 or 0 loses no bits
}
```

I hope that helps. So yes, rays that get very close to a triangle are subject to numeric precision noise. Would adding a safety margin away from the cube walls be a viable workaround for your simulation?
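The safety-margin workaround might look like this for an axis-aligned box. The helper name, the clamping strategy, and the margin value are assumptions for illustration. Whether moving a ray origin is physically acceptable depends on the simulation.

```cpp
#include <algorithm>
#include <cassert>

struct Float3 { float x, y, z; };

// Hypothetical helper: keep a point at least `margin` inside every face of the
// axis-aligned box [lo, hi] by clamping each coordinate to [lo + margin, hi - margin].
// Requires 2 * margin <= (hi - lo) on every axis, otherwise std::clamp's bounds invert.
inline Float3 keepAwayFromWalls(Float3 p, Float3 lo, Float3 hi, float margin) {
    auto clampAxis = [margin](float v, float a, float b) {
        return std::clamp(v, a + margin, b - margin);
    };
    return {clampAxis(p.x, lo.x, hi.x), clampAxis(p.y, lo.y, hi.y), clampAxis(p.z, lo.z, hi.z)};
}
```

Only points already within `margin` of a wall are moved; points deeper inside the box are returned unchanged.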


David.


Your explanation is very clear. The visualizations of the three cases were truly helpful for understanding the root cause of this issue. I believe I have found a workaround for our application (we have some extra ray payload data that can help us identify these cases so that we can handle them specially). Thank you so much for your help!

