Time-Varying Rendering with OptiX

Dragonseel · May 9, 2016, 11:52am

Hello,
I use OptiX to render VTK models via ray-casting.
I have a series of separate models/meshes that are part of a time-series.
Every second I want to display the next model in the series.

Right now I create a separate optix::Group and optix::Acceleration for each model, and each time the currently displayed model changes I update the hooks in the context.
That means: context_["top_object"]->set(mesh_groups[timestep]);

My problem now is the delay this creates. OptiX seems to do a ton of work in the background, which creates a delay of nearly half a second.
I want to minimize this delay but have no real idea how to.

Is there a way to load and set all necessary objects on the graphics card and tell OptiX to just ignore all parts that are not the currently displayed mesh? Some way that avoids OptiX having to do heavy background work each time the displayed mesh changes?

droettger · May 9, 2016, 12:35pm

Right, changing the scene structure or geometry data requires a rebuild of the acceleration structure (AS) and probably the PTX kernel as well.
The time it takes to update the AS depends on the AS builder you used. If you used Bvh or Sbvh, try Trbvh.

“Is there a way to load and set all necessary objects on the graphics card and tell OptiX to just ignore all parts that are not the currently displayed mesh?”

Yes. If you’re able to load all meshes at once without blowing out the GPU memory, you can put them all under a Selector node. It behaves like a Group node, but holds a visit program in which you control which children are traversed by calling rtIntersectChild(index).

See the Selector chapter in the OptiX Programming Guide, and search the OptiX API Reference document for all calls related to rtSelector*().

This means a Selector can be programmed to behave like a DIP switch (several of many children) or like a radio button (one of many children).
The latter is what you need for things like a flipbook animation (that’s what we called it in SceniX) or for car configurators that pick between different options (e.g. rims).

The implementation of how you want to pick which children to traverse and which not is completely your choice. It’s just another program domain you implement a small CUDA program for.
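To make the two behaviours concrete, here is a minimal host-side C++ model of the selection logic. It is not OptiX code: the callback is only a stand-in for rtIntersectChild(index), and the names visitRadioButton, visitDipSwitch and enabledMask are made up for this sketch. A real visit program would be a small CUDA program reading its inputs from variables on the Selector node.

```cpp
#include <cstdint>
#include <functional>

// Stand-in for the device-side rtIntersectChild(index) call, so the
// selection logic can be tried on the host. This is a model, not OptiX code.
using IntersectChild = std::function<void(unsigned int)>;

// Radio button: traverse exactly one child (the flipbook case).
// Out-of-range indices traverse nothing; that is a choice made for this sketch.
void visitRadioButton(unsigned int selectedIndex, unsigned int childCount,
                      const IntersectChild& intersectChild)
{
    if (selectedIndex < childCount)
        intersectChild(selectedIndex);
}

// DIP switch: traverse every child whose bit is set in enabledMask.
// Using a 32-bit mask limits this sketch to the first 32 children.
void visitDipSwitch(std::uint32_t enabledMask, unsigned int childCount,
                    const IntersectChild& intersectChild)
{
    for (unsigned int i = 0u; i < childCount && i < 32u; ++i)
    {
        if (enabledMask & (std::uint32_t{1} << i))
            intersectChild(i);
    }
}
```

In the radio-button case the only state is one index, which is why a single unsigned int variable is enough for a flipbook animation.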

Advanced sidenote: The switching has per-ray granularity, which makes some very interesting things possible, since you can pick different children to traverse depending on the per-ray payload, intersection distance, etc. But mind that the ray is in object space in the visit program domain in case you use this to implement some level of detail.

In your case it would just need a single variable at the Selector node containing the zero-based child index to traverse. See example code below.

The switching is basically instantaneous then.

#include <optix.h>

rtDeclareVariable( unsigned int, flipbookIndex, , );

RT_PROGRAM void visit_flipbook()
{
  rtIntersectChild( flipbookIndex );
}
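For the once-per-second stepping from the original question, the host then only has to turn elapsed time into a child index. The helper below is a sketch in plain C++ with no OptiX calls. The name flipbookIndexForTime, its parameters, and the assumption that the series loops back to the first mesh at the end are all hypothetical.

```cpp
#include <cmath>

// Hypothetical helper (not part of OptiX): maps elapsed playback time to the
// zero-based child index that should be traversed.
// Assumption: the time series loops, so the index wraps around at childCount.
unsigned int flipbookIndexForTime(double elapsedSeconds,
                                  unsigned int childCount,
                                  double secondsPerStep = 1.0)
{
    // Fall back to the first child for empty series or unusable inputs
    // (non-positive, NaN or infinite values).
    if (childCount == 0u || !(secondsPerStep > 0.0) ||
        !(elapsedSeconds > 0.0) || !std::isfinite(elapsedSeconds))
    {
        return 0u;
    }

    // Number of whole steps that have passed, wrapped to the child range.
    const double step = std::floor(elapsedSeconds / secondsPerStep);
    return static_cast<unsigned int>(
        std::fmod(step, static_cast<double>(childCount)));
}
```

The result would be written to the flipbookIndex variable whenever it changes, for example with a setUint() call on that variable. Since only a variable is updated and the scene structure stays the same, no acceleration structure rebuild should be needed.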

Dragonseel · May 9, 2016, 1:27pm

Hello,
that actually seems to be what I want.
I have built a version of this and the switching does indeed no longer add a delay.
But now somehow the closest_hit program does not get called nearly as often as needed.

Program selection_program = context_->createProgramFromPTXFile(path_to_ptx, "animation_selector");
selection_program["flipbookIndex"]->setUint(0u);
selector->setVisitProgram(selection_program);

selector->addChild(mesh_group_);
	
selector->validate();

context_["top_object"]->set(selector);
context_["top_shadower"]->set(selector);

The snippet shows my creation of the selector node and adding a single child to it.
If I directly set top_object and shadower to the mesh_group_ everything works fine.

My visit/selection program is exactly as you have posted.

But now, instead of seeing the full mesh, I only see a small number of dark pixels that let me roughly guess that the object shape is correct, while 90% of where the mesh should be shows just the background color.

I tried rtPrintf in the intersection program: with the selector only maybe a hundred prints are made, whereas with the mesh_group set directly many thousands are printed.

Is there an obvious error or some step I am missing?

I have altered neither the ray generation program nor the intersection program to work with the selector node.

EDIT:
I have narrowed the problem down:
In my Phong-shader I cast shadow rays and they seem to be the problem.

rtDeclareVariable(rtObject,			top_object, , );
rtDeclareVariable(rtObject,			top_shadower, , );

...

if( NdotL > 0.0f && light.casts_shadow){
   PerRayData_shadow shadow_prd;
   shadow_prd.attenuation = make_float3(1.0f);
   optix::Ray shadow_ray = optix::make_Ray(hit_point, to_light, shadow_ray_type, scene_epsilon, light_distance);
   rtTrace(top_shadower, shadow_ray, shadow_prd);
   light_attenuation = shadow_prd.attenuation;
}

This is the code concerning the shadows. I assume the shadow rays do not terminate properly and therefore the result of the Phong shading is somehow undefined. But that’s just a guess.

droettger · May 9, 2016, 1:58pm

Sounds like some bug.

The variable scopes checked by the visit program are the program itself and the selector node.
Please try to put the variable flipbookIndex at the Selector node, which would be required when having multiple of them sharing the same visit program anyway.

Other than that, please always list the following system configuration details when reporting problems:
OS version, installed GPUs, display driver version, CUDA toolkit version used to compile the PTX code.

Sometimes it’s just a matter of upgrading the display driver.
I would recommend OptiX 3.9.0 and CUDA Toolkit 7.5.

(Optimization note: There is no need for a top_shadower object variable if it’s always the same as the top_object. That originated in the OptiX SDK examples, which show it, and then spread to all examples because they shared common programs.)

BTW, here are some additional threads on Selectors:
https://devtalk.nvidia.com/default/topic/815975/?comment=4478105
https://devtalk.nvidia.com/default/topic/669183/?comment=4080792

Dragonseel · May 9, 2016, 2:37pm

My system information is as follows:

Windows 10 Pro, 64-bit
GeForce GTX 560
Driver version: 365.10 (the newest updated just now)
CUDA Toolkit 7.5
OptiX version: 3.9.0

I have changed the scope of the flipbookIndex, removed the dedicated top_shadower and updated the drivers.
Nothing changed in the result.

Also I made the ray generation program use the mesh group directly and only the shadow computation use the selector node. Then I added an rtPrintf that told me what the visit program selected.
Judging by the number of prints (all with the correct index), enough shadow rays are generated for the result to be correct.

In the scene there is only a single tube, so no shadow ray should hit anything, and an rtPrintf in the any_hit_shadow program does indeed not print anything.

So the problem could only be within the actual acceleration structure of the first child of the selector (which works without the selector node!), or the “does not hit anything” termination of rays goes wrong when using a selector node.

I then printed out the variables that are used to make the shadow ray to see if there are any NaNs or other invalid values, but the values look fine.

Printing something directly after the shadow rtTrace call does produce output, but far too few prints.

So entire rays somehow get “lost”, including the closest_hit program that casts the shadow rays.
And some rays are terminated normally and have a normal light attenuation of 1.0 in all components in the payload.
By using an if-clause I verified that there is no other case.

The remaining question is: Why does the shadow trace with a Selector node not work properly? I think I eliminated most of the error sources.

droettger · May 9, 2016, 3:00pm

Ok, that would require a reproducer to see if this can be reproduced in-house.

Please have a look at this thread, which explains in the last post how to do an OptiX API Capture trace (OAC): https://devtalk.nvidia.com/default/topic/803116/?comment=4436953

Use a minimal reproducer to trace that because everything is stored to disk and these traces can get huge.
Please remove all debugging rtPrintf code before tracing.
The trace.oac is a text file with all OptiX C API calls which sometimes helps to identify setup errors.

We would need the whole reproducing oac<running_number> folder as archive.
We can briefly discuss how to transfer that in a private message.