Curves performance in OptiX 7.1

Why are the curve IS programs built-in rather than provided as an SDK sample/header, like the curve shading utilities in curve.h? Is there some optimization not available at the SDK user level that justifies the complication of having curves in addition to triangles and custom primitives?

I’m asking because I compared the performance of the SDK curves to the cubic Bezier curves from the Phantom Ray-Hair Intersector, which I ported to OptiX. Piecewise-linear strands are comparable to or slightly faster than my Bezier code, but quadratic and cubic strands are slower than Bezier (on a non-RTX GTX 1060; I haven’t tried an RTX card yet).

I simply used curve.h for calculating normals - maybe I should look for improvements there? Right now the PTX I get for the SDK curves is about 10x bigger than the PTX for the Bezier curves (even counting the Bezier IS and CH programs together).

Thanks!

Hi Robert,

This is a good question and there are several reasons why we decided to add curves to the OptiX API.

The primary reason, like with triangles, is for user convenience. Having curves available means you don’t have to implement your own curve intersector, which as you know takes valuable engineering time and includes making various trade-offs. We are of course looking forward to the possibility of future performance gains and ways that the OptiX team might be uniquely positioned to leverage the GPU and accelerate our curves.

It is not currently expected or guaranteed that OptiX has the fastest curve intersection; however, we are actively improving the speed and quality of our intersectors. The OptiX cubic intersector you’ve compared against is not currently the Phantom intersector. As you are probably aware, Phantom can be tricky to make behave correctly on some more extreme curve shapes (this is an open research problem), so we had some concerns about launching OptiX curves with Phantom and whether that would work for everyone. That may change in the future, and if/when it does, there may be improvements in OptiX that were not published in the paper. You can expect that we will continue to update the OptiX intersectors with both performance and precision improvements.

It sounds like the version you have is currently pretty fast. Are you controlling for and matching the size of bounding boxes when making these comparisons? (I assume you are, just checking, since the Phantom paper advocates splitting for performance…) We have a wide variety of use-cases that we need to support for third-party renderers, so we can’t make all of the performance optimizations we’d like, but if you have made decisions in your implementation that you think we should consider, we would be more than happy to evaluate them for inclusion.

For the normal calculations in curve.h, we have included this code as an example for convenience but it is by no means the only way to compute normals, nor will it work for everyone. That code has some basic adjustments for curves with varying radius which you may not need at all. If you don’t need to handle varying radius, you can omit some of the math in there. If you are flat-shading your curves, you may not even need a surface normal at all, or you may be able to use the ray direction and curve direction to compute your shading normal.
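To illustrate the two simplifications mentioned above, here is a minimal sketch in plain C++ (Vec3 and the function names are stand-ins of mine, not SDK code from curve.h): with a constant radius, the surface normal is just the direction from the nearest axis point to the hit point, and for flat-shaded strands a usable shading normal can be built from only the ray direction and the curve tangent.

```cpp
#include <cmath>

// Minimal vector helpers (stand-ins for the CUDA float3 math used in OptiX programs).
struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)    { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalize(Vec3 a)      { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// Constant radius: the normal is simply the direction from the axis point C(u)
// nearest the hit to the hit point itself. No radius-derivative terms are
// needed, which removes a chunk of the math in curve.h.
Vec3 constantRadiusNormal(Vec3 hitPoint, Vec3 axisPoint)
{
    return normalize(sub(hitPoint, axisPoint));
}

// Flat shading: remove from -rayDir its component along the tangent T.
// The result faces the viewer and is perpendicular to the strand.
// (Assumes the ray is not parallel to the strand.)
Vec3 flatShadingNormal(Vec3 rayDir, Vec3 tangent)
{
    Vec3 t = normalize(tangent);
    Vec3 negDir = scale(rayDir, -1.0f);
    return normalize(sub(negDir, scale(t, dot(negDir, t))));
}
```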

As for the PTX size, we can explore this more if you’d like. Because we have multiple intersectors in OptiX, I would expect the size to differ by at least several times. But without knowing exactly what you’ve done, I can’t speak to the exact reasons. Is the PTX size of OptiX curves causing a problem for you? Is this something you would like us to investigate?


David.

By the way, I forgot to mention - if you haven’t tried it yet, you can get significantly better curve trace performance with the OptiX built-in curves if you use the flag OPTIX_BUILD_FLAG_PREFER_FAST_TRACE, at the expense of significantly increased memory usage (~2x). This flag currently increases the OptiX splitting factor.
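For context, the flag is set on the acceleration-structure build options passed to optixAccelBuild; a minimal fragment (OptiX 7 API, build-input setup elided):

```cpp
OptixAccelBuildOptions accelOptions = {};
accelOptions.buildFlags = OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions.operation  = OPTIX_BUILD_OPERATION_BUILD;
// ... fill an OptixBuildInput with the curve vertex/width/index buffers, then:
// optixAccelBuild( context, stream, &accelOptions, &buildInput, 1, ... );
```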


David.

Hi David,

Thank you for all the details! Right, writing a curve intersector is a bit more involved than an IS program for a sphere or a triangle. The topic is interesting in itself to me, so I have done some more tests. In the end, the speed differences are not that extreme, on the level of 20-30%, and there are other factors like artifacts, so the conclusion for my case is probably to keep both the Bezier and B-spline options and let users choose.

My Bezier code also has the limitation of intersecting only the front face. It makes writing transparency and sub-surface effects so complicated that I just gave up for now. Having the back face supported would be so great… :)

Usually I need to support long strands with multiple control points. If the curvature is too high, the Phantom intersector for Bezier curves produces artifacts, as shown in the first image. Splitting helps and also increases the speed (second image).

400 points
800 points

B-splines from OptiX do not show artifacts for the same control points in both cases, but the curve object is slightly different (smaller), so the fps comparison is not exactly 1:1.

Previously I tried the OPTIX_BUILD_FLAG_PREFER_FAST_TRACE flag; now I also checked what happens with OPTIX_BUILD_FLAG_PREFER_FAST_BUILD. Interestingly, a fast build can also give a faster trace… again, the differences are small but persistent. I tried a curve with 400 points (too few for the Phantom intersector) and with 800 points. Below are images with linear segments to show how the points are distributed.

400 points
800 points

And the fps are below; note that OPTIX_BUILD_FLAG_PREFER_FAST_BUILD results in a faster trace in the case of dense points:

n=400
            fast trace    fast build
linear:        1.30          1.25
quadratic:     0.80          0.75
cubic:         0.66          0.63

Bezier:        0.8


n=800 (dense enough to avoid artifacts with Bezier)
            fast trace    fast build
linear:        1.19          1.32
quadratic:     0.70          0.80
cubic:         0.59          0.67

Bezier:        0.92

As I mentioned: approximation vs. interpolation, artifacts vs. clean render, slightly faster or slower… to me it is great to have both at hand, Phantom Beziers and B-splines. The bigger issue is the difficulty with back-face intersection… (maybe that is not such a common problem if curves are used for tiny hair-like objects in most applications).

The large PTX size is not problematic for me. I guess it is a result of inlining code from curve.h into the several CH programs I have. I’ll try reducing the math to the minimum needed to get normals.

In case it helps, I can share my IS program for Beziers, but I simply followed the Phantom paper. The code producing the images above is a simple Python script; I can share that as well if you’d like to repeat the experiment.

Thanks again for all the explanations!

This is cool, thanks for the images. That gives me a better sense of what you’re testing and seeing.

A few extra notes:

B-splines from OptiX do not show artifacts for the same control points in both cases, but the curve object is slightly different (smaller), so the fps comparison is not exactly 1:1.

In case it helps to make tests that are exactly 1:1, you can convert uniform cubic B-splines to Bezier with a quick matrix multiply (and vice versa with the inverse, of course).

image

(And divide by 6)

The bigger issue is the difficulty with back-face intersection… (maybe that is not such a common problem if curves are used for tiny hair-like objects in most applications).

You’ve nailed it: OptiX curves are currently primarily designed with very thin hair / fur / streamlines in mind, like a few pixels or less. We are thinking about how we can address back-facing hits in a future version.

If the curvature is too high, the Phantom intersector for Bezier curves produces artifacts, as shown in the first image. Splitting helps and also increases the speed (second image).

Yes, exactly. This is the reason our intersector is currently different from Phantom. Some people have large enough datasets that splitting isn’t an option. Our goal was to avoid the artifacts without forcing users to increase memory usage.

fast build can also be a faster trace…

This is very interesting! I believe what this means is that there are BVH overlap problems that are bad enough that splitting increases the average overlap rather than reducing it. Your test case is a little devious since it has many curves intersecting at right angles. This will generally cause the BVH build to be sub-optimal. So it might be worth including some more tests with a range of different data sets from nicely coherent to nasty and random… here are some examples of my programmer art that each give me some different BVH performance results:

image


David.

Thanks! That’s good to know!

OK, now I understand, that makes sense.

I’ll play with tests and also with the code calculating normals, and will be back if something interesting appears.

Thanks!