How can I generate flow vectors for the denoiser from moving curve primitives?

In my OptiX 7.4-based path tracer app (visual 3D rendering output), I successfully generate 2D flow vectors from triangle-based objects using the current and previous transform matrices (and the current/previous vertex buffers).
How could I do this for curve primitives?

Thank you!

Hey @m001, this is a great question.

There are several different possibilities, but let me caveat that I haven’t tried these yet.

As long as the curves are thin (perhaps less than a pixel wide), it may be sufficient to use the hit attribute curve parameter to track the center point of the curve nearest to the hit point. You can use optixGetCurveParameter() to query the parameter for a given hit point, and then evaluate the curve at this parameter for time t-1. I’m not sure, but it might be best to evaluate the curve center point at this parameter for both frames and use the difference of those as your motion vector. (In other words, don’t involve the hit point directly in the computation of the flow vector.) If the curve instance is moving, you can still access any instance transform using OptiX, the same way you would for triangles. Unless your curves are very large on screen, I would recommend trying this method first: it’s the easiest, and I imagine it should give decent quality.
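A minimal host-side C++ sketch of this first approach, assuming uniform cubic B-spline segments (the curve type read by optixGetCubicBSplineVertexData()) and world-space control points for both frames. `Vec3`, `evalCubicBSpline` and `centerMotion` are hypothetical names. In a closest-hit program, u would come from optixGetCurveParameter(); the radius stored in the w component of the vertex data is ignored here because only the center line matters.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 3-vector; device code would use CUDA's float3 instead.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }

// Evaluates the center line of a uniform cubic B-spline segment at u in [0, 1].
inline Vec3 evalCubicBSpline(const Vec3 p[4], float u) {
    assert(u >= 0.0f && u <= 1.0f);
    const float v = 1.0f - u;
    const float u2 = u * u, u3 = u2 * u;
    const float w0 = v * v * v / 6.0f;
    const float w1 = (3.0f * u3 - 6.0f * u2 + 4.0f) / 6.0f;
    const float w2 = (-3.0f * u3 + 3.0f * u2 + 3.0f * u + 1.0f) / 6.0f;
    const float w3 = u3 / 6.0f;
    return w0 * p[0] + w1 * p[1] + w2 * p[2] + w3 * p[3];
}

// World-space motion of the curve center at parameter u from frame t-1 to frame t.
// The hit point itself is deliberately not involved.
inline Vec3 centerMotion(const Vec3 curr[4], const Vec3 prev[4], float u) {
    return evalCubicBSpline(curr, u) - evalCubicBSpline(prev, u);
}
```

For example, if every control point moved by (0.5, 0, 0) between frames, centerMotion returns (0.5, 0, 0) for any u. Projecting both center positions with the respective frame’s camera then turns this into a screen-space flow vector.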

If your curves are wider than a pixel or two on-screen, then using the curve parameter alone might not be accurate enough. In this case, if I were implementing this, I would compute a “moving frame” for the curve, which gives you an orthonormal basis (a 3x3 matrix) at every point along the curve. This way you can use the curve parameter to find a basis relative to the curve center, and encode your hit point in the space of this basis. Once you have that, you can use the basis at the same parameter for the curve at t-1 to completely reconstruct the hit point for the previous frame. If any of this doesn’t make complete sense, I’m happy to explain in more detail.
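A sketch of the encode/decode step described above, assuming you already have an orthonormal frame (normal, binormal, tangent, plus the curve center) at the hit’s curve parameter for both frame t and frame t-1. `Frame`, `toLocal`, `fromLocal` and `previousHitPoint` are hypothetical names. Because the basis is orthonormal, its inverse is its transpose, so encoding is three dot products and decoding is a weighted sum of the other frame’s axes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 3-vector and helpers.
struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Orthonormal basis (the 3x3 matrix, stored as three axes) at a curve parameter,
// together with the curve center point at that parameter.
struct Frame { Vec3 center, normal, binormal, tangent; };

// Expresses a world-space hit point in the frame's local coordinates.
inline Vec3 toLocal(const Frame& f, Vec3 hit) {
    const Vec3 d = hit - f.center;
    return {dot(d, f.normal), dot(d, f.binormal), dot(d, f.tangent)};
}

// Rebuilds a world-space point from local coordinates in a (possibly different) frame.
inline Vec3 fromLocal(const Frame& f, Vec3 local) {
    return f.center + local.x * f.normal + local.y * f.binormal + local.z * f.tangent;
}

// Reconstructs where the hit point was in frame t-1, given the frames at the
// same curve parameter for frame t (curr) and frame t-1 (prev).
inline Vec3 previousHitPoint(const Frame& curr, const Frame& prev, Vec3 hit) {
    return fromLocal(prev, toLocal(curr, hit));
}
```

The difference `hit - previousHitPoint(curr, prev, hit)` is then the world-space motion of the hit point itself, including any twist of the curve between the two frames.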

If you haven’t computed a curve’s moving frame before, I recommend using the “double reflection” method from this paper:

The idea is to choose and assign a frame for the start of the curve, and then sample your curve along its length and use double reflection to produce a new basis frame for every sample. This gives you a set of vectors (normal, binormal, tangent) at sample points. You can precompute these sampled frames, and in a shader program you just need to look up the nearest frame and propagate from the precomputed sample point’s frame to your hit point’s curve parameter. (Note you only need to precompute and store a “normal” vector in memory; the binormal can always be reconstructed on the fly from the normal and the curve tangent.)
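The double reflection step can be sketched as follows: a host-side C++ sketch, assuming unit-length tangents at each sample and an initial normal perpendicular to the first tangent; the function names are hypothetical. Each step reflects the previous normal and tangent across the plane bisecting the two sample positions, then reflects again so that the reflected tangent lands on the new tangent. Only normals are stored; the binormal is rebuilt as cross(tangent, normal), which is one possible handedness convention.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal 3-vector and helpers.
struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// One double reflection step: propagates normal r0 at sample x0 (unit tangent t0)
// to sample x1 (unit tangent t1).
inline Vec3 doubleReflect(Vec3 x0, Vec3 t0, Vec3 r0, Vec3 x1, Vec3 t1) {
    const Vec3 v1 = x1 - x0;
    const float c1 = dot(v1, v1);
    if (c1 == 0.0f) return r0;  // coincident samples: nothing to propagate
    const Vec3 rL = r0 - (2.0f / c1) * dot(v1, r0) * v1;  // first reflection
    const Vec3 tL = t0 - (2.0f / c1) * dot(v1, t0) * v1;
    const Vec3 v2 = t1 - tL;
    const float c2 = dot(v2, v2);
    if (c2 == 0.0f) return rL;  // reflected tangent already equals t1
    return rL - (2.0f / c2) * dot(v2, rL) * v2;  // second reflection
}

// Propagates an initial normal n0 along sampled positions x and unit tangents t.
inline std::vector<Vec3> propagateNormals(const std::vector<Vec3>& x,
                                          const std::vector<Vec3>& t, Vec3 n0) {
    std::vector<Vec3> n(x.size());
    if (x.empty()) return n;
    n[0] = n0;
    for (std::size_t i = 0; i + 1 < x.size(); ++i)
        n[i + 1] = doubleReflect(x[i], t[i], n[i], x[i + 1], t[i + 1]);
    return n;
}

// Rebuilds the binormal on the fly from the tangent and the stored normal.
inline Vec3 binormal(Vec3 tangent, Vec3 normal) { return cross(tangent, normal); }
```

Since both reflections preserve lengths and angles, each propagated normal stays unit length and perpendicular to its tangent (up to rounding), which is what makes it usable as one axis of the per-sample basis.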

A third option is optical flow, which might be an acceptable (if low-quality) fallback for any primitives that are hard to track in time. This is easy to implement and doesn’t require reasoning about motion transforms, but it probably won’t give the best accuracy, generally speaking.


Thank you for this very detailed answer.
Yes, the curves are very small on screen.
I think I will start with the “easiest” option, optical flow. So I simply save a 2D screen-pixel (integer-based) location for each curve primitive segment.
Then, starting with the 2nd frame, I would simply look up that segment and calculate the screen-space pixel difference to the current launch index XY pixel.
I hope I understood it right…

With optical flow, I was more suggesting trying an automated solution that takes two images as input and tries to infer the flow vectors. There is a sample in the OptiX SDK called optixOpticalFlow that demonstrates one way to do this. You can also search the internet for other examples; there are some open-source alternatives.

Now that I think about it more, I’m not very confident that optical flow is a good solution for rendering thin curves: the curves might make computing optical flow difficult, unless you do a little work to render curve images in a way that makes it easy to find correspondences (for example, a shader that renders an ID buffer of curve segments).

Either way, I think you want to use float-based coordinates, not integer-based ones, even if you compute screen-space locations.

The first method I mentioned, and the one I recommend, would be to compute the flow vector in a closest hit shader by querying the curve parameter, and then evaluating the curve position at time t (current frame) and t-1 (previous frame). Then see if the flow vector defined by the difference between those two points will work for you.
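Going from the two world-space positions to a screen-space flow vector might look like this sketch. It assumes a row-major world-to-clip matrix per frame (`Mat4`), OpenGL-style normalized device coordinates, and pixel rows that grow with NDC y; all names are hypothetical, and your camera code will likely differ. The results stay in float, so sub-pixel motion is preserved. The sign convention (current minus previous) matches the flow values printed later in this thread; check it against the denoiser documentation for the flow guide layer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types.
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };
struct Mat4 { float m[4][4]; };  // row-major world -> clip transform

// Projects a world-space point to continuous (sub-pixel) pixel coordinates.
inline Vec2 projectToPixel(const Mat4& vp, Vec3 p, int width, int height) {
    const float cx = vp.m[0][0] * p.x + vp.m[0][1] * p.y + vp.m[0][2] * p.z + vp.m[0][3];
    const float cy = vp.m[1][0] * p.x + vp.m[1][1] * p.y + vp.m[1][2] * p.z + vp.m[1][3];
    const float cw = vp.m[3][0] * p.x + vp.m[3][1] * p.y + vp.m[3][2] * p.z + vp.m[3][3];
    assert(cw > 0.0f);  // the point must lie in front of the camera
    const float ndcX = cx / cw, ndcY = cy / cw;
    return {(ndcX * 0.5f + 0.5f) * width, (ndcY * 0.5f + 0.5f) * height};
}

// Flow in pixels: current-frame position (current camera) minus
// previous-frame position (previous camera).
inline Vec2 screenFlow(const Mat4& currViewProj, Vec3 currPos,
                       const Mat4& prevViewProj, Vec3 prevPos,
                       int width, int height) {
    const Vec2 a = projectToPixel(currViewProj, currPos, width, height);
    const Vec2 b = projectToPixel(prevViewProj, prevPos, width, height);
    return {a.x - b.x, a.y - b.y};
}
```

Using the previous frame’s camera matrix for the t-1 position means camera motion and curve motion are both captured in a single vector.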


Hi David, thank you for your answer.
Ok, so I misunderstood you.

The launch index is a uint2 (X,Y pixel coordinate), and I store it for each rendered curve primitive. Then I calculate the difference between the current and the previous launch XY as signed integers, since differences between pixels are always integers; the flow vector is then stored as a float2, of course.
Doing integer computation instead of converting 4 values to float and doing 2 float subtractions should be faster, shouldn’t it?
And for screen pixels, which are already given as uint2, I don’t see any case where they would require a float type.
I’m not calculating any hit points; I simply use the final screen pixel of a primitive, where the curve was rendered (ignoring race conditions when writing the XY for that primitive).
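A small C++ sketch of the integer part of this scheme. `UInt2` and `Float2` stand in for CUDA’s uint2 and float2, and all names are hypothetical. The one detail worth guarding is the subtraction: if the uint2 components were subtracted as unsigned values, a negative motion (like the -8 in the example output below) would wrap around to a huge positive number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for CUDA's uint2 / float2.
struct UInt2 { std::uint32_t x, y; };
struct Float2 { float x, y; };

// Flow from the launch index recorded for a primitive in the previous frame to
// the current launch index. Converting to signed integers before subtracting
// avoids unsigned wrap-around when the primitive moved left or up.
inline Float2 launchIndexFlow(UInt2 curr, UInt2 prev) {
    const int dx = static_cast<int>(curr.x) - static_cast<int>(prev.x);
    const int dy = static_cast<int>(curr.y) - static_cast<int>(prev.y);
    return {static_cast<float>(dx), static_cast<float>(dy)};
}

// Per-primitive record of the last launch index that hit each curve segment.
struct LaunchIndexHistory {
    std::vector<UInt2> prev, curr;
    explicit LaunchIndexHistory(std::size_t numPrims) : prev(numPrims), curr(numPrims) {}
    // Call once per frame after rendering: this frame's records become "previous".
    void endFrame() { prev.swap(curr); }
};
```

With the first line of the example output, `launchIndexFlow({480, 461}, {473, 469})` gives (7, -8).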

For the texcoord I use the curve parameter. I cannot simply calculate that for t-1, because the curve input was built from a streamOut buffer, which itself was generated from a simulated animation; so it’s not the same curve at all, only its topology is guaranteed to remain identical.
I haven’t understood how the 3x3 matrix basis you described works; yes, please explain that.

Here is an example output:
pixel [4054] coord: 480, 461 prev: 473, 469 flow:7.000000, -8.000000
pixel [4055] coord: 480, 459 prev: 476, 461 flow:4.000000, -2.000000
pixel [4058] coord: 488, 427 prev: 486, 427 flow:2.000000, 0.000000
pixel [5608] coord: 550, 419 prev: 550, 419 flow:0.000000, 0.000000
pixel [5609] coord: 557, 405 prev: 557, 405 flow:0.000000, 0.000000
pixel [5830] coord: 641, 104 prev: 636, 103 flow:5.000000, 1.000000

Judging from the temporally denoised visual output, the pixel-difference flow vectors described above seem to work, even though the same vector is applied to all pixels of a curve primitive (if it covers more than one pixel).
Visually, the output doesn’t seem to have artifacts with the test colors I have used so far.

I see, so this is different than what I was trying to describe, but it could work. I think the main assumption in the scheme you’re describing (to store a launch index per curve primitive) is that the curves will be very short on screen and they won’t change shape. Since you’re saving only a single launch index per primitive, even though a curve might appear in several launch indices, it will work well as long as a single curve primitive does not cover very many pixels, and all the control points of the curve segments move in a roughly rigid way.

I had a slightly different scheme in mind with my first suggestion. The assumption is that the curve might be 1 pixel or less wide, but might be many pixels long and might change shape during animation. In this case, I’m suggesting keeping the curve data from your streamOut buffers for both the current and previous frames in memory. Then look up the 4 control points needed for both frames, and use the curve parameter to evaluate the curve location for t and t-1. I’m guessing this is possible with your simulation data; you would just need to evaluate the cubic curve equation. You are saying that all the control points change position, but the topology of the curve data stays the same, right? We have some example code for doing that in the OptiX SDK (see the function normalCubic in the optixHair sample, along with the utility class CubicBSplineSegment in cuda/curve.h). This code evaluates the “surface normal” of the curve, but you could instead evaluate just the position for a given curve parameter.

If your scheme is working well, then of course, there is no need to complicate things, you may have solved your own problem in a nicer way than I was suggesting. I would keep an eye out for how good the flow vectors are at the start, middle, and end of a curve, in case the curves are changing shape.

The solution where I talked about computing the moving frame of a curve is a bit more involved, and I’m guessing you might not need it. This solution would be useful if your curves are many pixels wide on screen, and maybe hundreds of pixels long. If your screen coverage of a single primitive is hundreds or thousands of pixels or more, then you might want super accurate flow vectors. This would also be important if your curves experience twisting animation, or if they have detailed surface textures. (Incidentally, a moving frame is how you might compute both u and v texcoords of a curve, so that you can give it a texture not just along the curve, but also radially, around its perimeter.) See the “Moving frame” article on Wikipedia.
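To illustrate the texcoord remark, here is a sketch of a radial (v) texture coordinate computed from a moving frame, assuming the frame’s normal and binormal at the hit’s curve parameter are known and orthonormal; `radialTexcoord` is a hypothetical name. The u coordinate would typically be derived from the curve parameter.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 3-vector and helpers.
struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Radial texture coordinate v in [0, 1): the angle of the hit point around the
// curve axis, measured from the frame's normal toward its binormal.
// center is the curve center point at the hit's curve parameter.
inline float radialTexcoord(Vec3 hit, Vec3 center, Vec3 normal, Vec3 binormal) {
    const float kTwoPi = 6.28318530718f;
    const Vec3 d = hit - center;
    float angle = std::atan2(dot(d, binormal), dot(d, normal));  // in (-pi, pi]
    if (angle < 0.0f) angle += kTwoPi;
    const float v = angle / kTwoPi;
    return v < 1.0f ? v : 0.0f;  // guard against rounding up to exactly 1
}
```

Because the frame is carried along the curve without twisting, v stays attached to the same side of the strand as the curve bends, which is what a texture around the perimeter needs.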


So instead of loading the previous control points through optixGetCubicBSplineVertexData,
I would access them from a “previous vertex buffer” holding them for all curve primitives.

So if I wanted to apply the flow vectors to the optixHair SDK sample,
I would ensure that the content returned by Hair::segments() (in Hair.h)
has the same primitive index topology as it had in the last frame.

And I would simply keep the vertex buffer [content of Hair::m_points (in Hair.h)] of the current frame as the “previous vertex buffer” for processing in the next frame.
Then, on the next frame, I could use the identical primitive index into the OptixBuildInputCurveArray::indexBuffer (which is identical in the current and last frame), which then indexes into the previous vertex buffer to obtain the control points. I don’t even need an object-to-world transform, since all the simulated control points are always in world space in my app.
Now I understand: with that data I can calculate the hit point using the method “position3(float u)” of the current curve type, as you described.
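A host-side sketch of this bookkeeping, assuming (as for OptiX cubic B-spline build inputs) that each index buffer entry is the index of a segment’s first control point and that the segment uses that point plus the next three. `CurveFrameData` and its members are hypothetical names; in the closest-hit program, the current frame’s points could instead come from optixGetCubicBSplineVertexData(), and the previous ones would then be evaluated at the same curve parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal 3-vector.
struct Vec3 { float x, y, z; };

// Curve data kept between frames. The index buffer is shared because the
// topology is identical in both frames.
struct CurveFrameData {
    std::vector<std::uint32_t> indexBuffer;    // one entry per curve segment (primitive)
    std::vector<Vec3> currPoints, prevPoints;  // world-space control points

    // Fetches the four control points of primitive primIdx from the previous frame.
    void previousControlPoints(unsigned int primIdx, Vec3 out[4]) const {
        const std::uint32_t first = indexBuffer.at(primIdx);
        assert(first + 3u < prevPoints.size());
        for (std::size_t k = 0; k < 4; ++k) out[k] = prevPoints[first + k];
    }

    // After rendering a frame, its control points become the "previous" ones
    // and the next simulation step's points become the current ones.
    void advance(std::vector<Vec3> nextPoints) {
        assert(nextPoints.size() == currPoints.size());  // topology must not change
        prevPoints.swap(currPoints);
        currPoints = std::move(nextPoints);
    }
};
```

Swapping instead of copying keeps the per-frame cost at one buffer upload, and the single index buffer is the only extra data that has to outlive the BVH build.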

There is no function optixGetCubicBSplineIndexData, is there? Or is there a way to access the index buffer of the current GAS? If there is, then I would not need to keep it in memory separately.

Thank you very much David!

Yes, you’ve got the right idea about accessing the data. Note you can still use optixGetCubicBSplineVertexData() for the current frame’s data (i.e. the data used to build your BVH), and only refer to your own vertex buffer for the previous frame’s data, if you want. The advantage of doing that is a memory savings, at the cost of having to access the data two different ways. By memory savings, I’m referring to the option of deleting your current frame’s vertex buffer after building the BVH, in which case you will need to use the OptiX vertex data function.

There is no function optixGetCubicBSplineIndexData or is it? Or is there a way to access that index buffer in current GAS? Cause if it is, then I would not need to keep it additionally in memory.

This is a good question. We don’t currently have such a function, so you’re right you’ll need to keep your index buffer around for any data that you want to access directly. I’m going to put this on our list of functions to consider adding to OptiX.

Here’s a dumb idea that might make sense in some scenarios, perhaps if you are rendering multiple frames in sequence with a single run of your render process. You could load the data for frames 1 & 2, and build the BVH for both frames. This would allow you to use optixGetCubicBSplineVertexData() for both frames to compute the flow vectors for frame 2. Then you can delete the data for frame 1, and load/build the data for frame 3 in order to render frame 3, reusing the existing data for frame 2. This probably isn’t a memory savings compared to storing your index buffer, but might make things simpler or allow you to amortize some costs. If you render frames in separate processes, then this probably isn’t a great idea.


Oh, as long as I’m throwing out dumb ideas that cost memory but maybe make your life easier –

Another thing you could do is to use the OptiX motion blur functionality to build a BVH that contains the data for both of your time steps. This way you can evaluate the curve at time t and t-1 using the same dataset, with the only difference being the “time” parameter to optixGetCubicBSplineVertexData(). Here you would pass the vertex buffer data for both frames into a single BVH build. This would alleviate the need to keep an index buffer, but same caveat as my previous suggestion, this is trading the index buffer for an additional BVH, so it’s probably not going to save memory. That said, you also have the option to load more than 2 frames of data into a single motion BVH, as long as you have memory for it.
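If you try the motion BVH route, the only extra arithmetic is mapping a frame to a time value. My understanding is that OptiX spaces motion keys evenly over [timeBegin, timeEnd] from OptixMotionOptions, with one vertex buffer per key in the curve build input; this sketch assumes frame k is stored in key k. `motionKeyTime` is a hypothetical helper whose result you would pass as the time argument of optixGetCubicBSplineVertexData().

```cpp
#include <cassert>

// Returns the time value that selects motion key `key` out of `numKeys` keys
// spread evenly over [timeBegin, timeEnd].
inline float motionKeyTime(unsigned int key, unsigned int numKeys,
                           float timeBegin, float timeEnd) {
    assert(numKeys >= 2 && key < numKeys);
    if (key == numKeys - 1) return timeEnd;  // avoid rounding on the last key
    const float step = (timeEnd - timeBegin) / static_cast<float>(numKeys - 1);
    return timeBegin + step * static_cast<float>(key);
}
```

With two keys and the time range [0, 1], frame t-1 is time 0 and frame t is time 1; loading more frames just increases numKeys.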

This is probably neither the fastest nor the smallest-footprint idea; I’m just mentioning it as an alternative way to implement the rendering of flow vectors, with slightly less indexing complexity.


That is great!

Yeah, that would be a good idea if there were no accumulation frames. But I deallocate the index buffer and the previous vertex buffer right after the first accumulation frame, so if those buffers were stored in the GAS, I would need to rebuild that GAS, which I think would cost more time than simply deallocating the two buffers. And as you said, it would share the index buffer, but the 2nd frame would additionally require the very same amount of indices. So no memory would be saved.

It’s a great idea for the case when I actually use motion blur on curves (not yet implemented), so thanks for that input!

Hey, so the team has already discussed whether to provide access to the index data, and it turns out that under the hood we currently do not preserve your index buffer and don’t have a way to easily recreate it. The reason is so that we can efficiently support adaptive curve sampling, which gives you modest improvements to both memory usage and intersection speed when using OptiX’s default render settings. We also want to keep the door open to even better curve data compression strategies, and those might make the problem of OptiX keeping track of the vertex index data harder. We don’t want to force OptiX users to pay for features they’re not using, so we don’t want to hold on to the index data unless it’s always necessary.

The good news is that the work you do to keep your index buffer around for lookups is not wasted, and also that keeping your index buffer is not doubling up the memory usage of the index buffer like one might assume.

As a side note on adaptive sampling and memory usage, just in case this wasn’t clear already: if you are tight on memory, you can use the FAST_BUILD flag to get lower memory usage (at the cost of slower render speeds). If you do this, make sure that the flags in OptixBuiltinISOptions and OptixAccelBuildOptions match each other. The default settings do curve splitting, which increases the number of curve segments internally (so that the bounds are tighter). The size and perf difference of curve rendering between the default settings and FAST_BUILD is currently about 2x (i.e. 2x smaller, 2x slower). If you use the FAST_TRACE flag, it’s about 2x in the other direction (2x larger, 2x faster). Of course, this depends a lot on what your curve data looks like; your perf ratios may differ.


