Ray-Generation with User Specified Rays

Hello,

I want to use the OptiX accelerated ray tracing engine for my research project. I just need to calculate ray-triangle intersections and record whether any intersections occur. OptiX builds acceleration structures over the geometry, which should make this calculation very fast.

My question is: does OptiX let you efficiently trace an array of pre-defined rays? Judging by the examples, the ray generation program is designed for visualization, namely generating quasi-random ray directions for each pixel. In my project, I have a list of pre-defined rays that I want to trace instead.

Thanks for your help!

Yes. Check the optixRaycasting example in the SDK, which just shoots primary rays.
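
For checking GPU results on a small case, a brute-force CPU reference can help. The sketch below is not OptiX code and not how optixRaycasting is implemented. It assumes a hypothetical `Ray` record and hypothetical helper names, and uses the standard Moller-Trumbore test to produce one hit flag per pre-defined ray.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical ray record: origin, direction, and the valid parametric interval.
struct Ray { Vec3 origin; Vec3 dir; float tmin; float tmax; };

// Moller-Trumbore ray/triangle test: true if the ray hits within [tmin, tmax].
bool hitsTriangle(const Ray& r, Vec3 v0, Vec3 v1, Vec3 v2) {
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(r.dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < 1e-12f) return false;  // ray is parallel to the triangle plane
    const float inv = 1.0f / det;
    const Vec3 s = sub(r.origin, v0);
    const float u = dot(s, p) * inv;             // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(r.dir, q) * inv;         // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = dot(e2, q) * inv;            // distance along the ray
    return t >= r.tmin && t <= r.tmax;
}

// Brute-force any-hit over a list of rays: one flag per ray, O(rays * triangles).
std::vector<std::uint8_t> anyHit(const std::vector<Ray>& rays,
                                 const std::vector<Vec3>& positions,
                                 const std::vector<std::array<std::int32_t, 3>>& tris) {
    std::vector<std::uint8_t> hit(rays.size(), 0);
    for (std::size_t i = 0; i < rays.size(); ++i) {
        for (const auto& t : tris) {
            for (int k = 0; k < 3; ++k) {
                assert(t[k] >= 0 && static_cast<std::size_t>(t[k]) < positions.size());
            }
            if (hitsTriangle(rays[i], positions[t[0]], positions[t[1]], positions[t[2]])) {
                hit[i] = 1;
                break;  // any hit is enough, stop at the first one
            }
        }
    }
    return hit;
}
```

This is only useful as a correctness check on small inputs; the point of the acceleration structure is to avoid exactly this inner loop over all triangles.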

Thanks for that - I have a question about the geometry creation subroutine in that example:

context.setTriangles( model.num_triangles, model.tri_indices, model.num_vertices, model.positions,
      model.has_texcoords ? model.texcoords : NULL );

and

m_positions = m_context->createBuffer( RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, num_vertices );
  memcpy( m_positions->map(), positions, num_vertices*3*sizeof(float) );
  m_positions->unmap();

  m_indices  = m_context->createBuffer( RT_BUFFER_INPUT, RT_FORMAT_INT3, num_triangles );
  memcpy( m_indices->map(), indices, num_triangles*3*sizeof(int32_t) );
  m_indices->unmap();

From the Mesh file, it looks like model.positions is a flat 1D array of vertex positions [x_1 y_1 z_1 x_2 y_2 z_2 …], is that correct? Also, is the index array of the same form, [V0_1 V1_1 V2_1 V0_2 V1_2 V2_2 …], where V0, V1, V2 are the vertex indices of each triangle?

Thanks again!
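
The memcpy sizes in the snippet above (num_vertices*3*sizeof(float) into an RT_FORMAT_FLOAT3 buffer, num_triangles*3*sizeof(int32_t) into an RT_FORMAT_INT3 buffer) imply tightly packed triples in both arrays. A minimal sketch of that layout, with `Float3`, `Int3` and the two function names as hypothetical stand-ins rather than SDK types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for CUDA's float3 / int3 (assumption: 12 bytes each, no padding).
struct Float3 { float x, y, z; };
struct Int3 { std::int32_t x, y, z; };

static_assert(sizeof(Float3) == 3 * sizeof(float), "positions must be tightly packed");
static_assert(sizeof(Int3) == 3 * sizeof(std::int32_t), "indices must be tightly packed");

// Flatten vertices to [x0 y0 z0 x1 y1 z1 ...].
std::vector<float> flattenPositions(const std::vector<Float3>& verts) {
    std::vector<float> out;
    out.reserve(verts.size() * 3);
    for (const Float3& v : verts) {
        out.push_back(v.x);
        out.push_back(v.y);
        out.push_back(v.z);
    }
    return out;
}

// Flatten triangles to [V0_0 V1_0 V2_0 V0_1 V1_1 V2_1 ...].
// Indices are zero-based offsets into the vertex array.
std::vector<std::int32_t> flattenIndices(const std::vector<Int3>& tris,
                                         std::size_t numVertices) {
    std::vector<std::int32_t> out;
    out.reserve(tris.size() * 3);
    for (const Int3& t : tris) {
        const std::int32_t idx[3] = {t.x, t.y, t.z};
        for (std::int32_t i : idx) {
            assert(i >= 0 && static_cast<std::size_t>(i) < numVertices);
            out.push_back(i);
        }
    }
    return out;
}
```

If the data originates in an environment with one-based indexing, the indices would need to be shifted down by one before being copied into the buffer.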

Using the optixRaycasting example, I am finding that the runtime is longer than I expected.

Looking at the code, I see that an acceleration structure is created and assigned to the geometry group:

geometry_group->setAcceleration( m_context->createAcceleration( "Trbvh" ) );

But in the geometry assignment, there isn’t a command to initialize the acceleration structure.

What command would be necessary to get the speed-up?

When asking about performance issues please provide absolute performance numbers and your expectations as well as the usual system configuration information:
OS version, installed GPU(s), display driver version, OptiX version (major.minor.micro), CUDA toolkit version used to generate the PTX code, host compiler version.

Acceleration structure objects must be assigned to all OptiX scene graph nodes with “Group” in the name, that means on all Group and GeometryGroup nodes.
That code line is creating and assigning an Acceleration structure using the “Trbvh” builder to a GeometryGroup. That plus a bounding box program per geometric primitive type is all you need.

The acceleration structures (AS) are built during the very first launch, which will also compile the kernel for that entry point. You can trigger these steps alone with a null-sized launch.
If you measured that first launch alone, the time included this initialization overhead.
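
One way to keep that one-time cost out of the numbers is to time the first launch separately from the repeated ones. A minimal sketch, assuming the launch is wrapped in a callable that blocks until the work has finished; `timeLaunches` and `LaunchTiming` are hypothetical names, not part of OptiX:

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct LaunchTiming {
    double firstMs;   // first call: includes AS build and kernel compilation
    double steadyMs;  // average of the following calls
};

// Times the first call on its own, then averages `repeats` further calls.
LaunchTiming timeLaunches(const std::function<void()>& launch, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const auto t0 = Clock::now();
    launch();  // warm-up launch, reported separately
    const auto t1 = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        launch();
    }
    const auto t2 = Clock::now();

    return {Ms(t1 - t0).count(), Ms(t2 - t1).count() / repeats};
}
```

With OptiX the callable would wrap the context launch for the entry point in question; comparing `firstMs` against `steadyMs` shows how much of a measurement is initialization.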

When using triangle geometry it’s possible to select a more specialized Trbvh builder by setting the buffer name and stride properties on the Acceleration object. That builder is faster because it doesn’t need to call into your bounding box program for every triangle, and it also splits geometry to produce a more efficient AS.
http://raytracing-docs.nvidia.com/optix/guide/index.html#host#acceleration-structure-properties
Example code here: https://github.com/nvpro-samples/optix_advanced_samples/blob/master/src/optixIntroduction/optixIntro_07/src/Application.cpp#L1613
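
As a rough sketch of what those properties look like: the property names below are the ones listed in the programming guide section linked above, and values are passed as strings. The buffer variable names and the `VertexAttributes` layout are assumptions for illustration and must match whatever the device code actually declares, with the float3 position at the start of each vertex record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Float3 { float x, y, z; };
struct UInt3 { unsigned int x, y, z; };

// Hypothetical interleaved vertex record; the position comes first.
struct VertexAttributes {
    Float3 vertex;
    Float3 tangent;
    Float3 normal;
    Float3 texcoord;
};

using Property = std::pair<std::string, std::string>;

// Builds the name/value pairs that select the triangle-specialized builder.
// Strides are given in bytes.
std::vector<Property> triangleBuilderProperties(const std::string& vertexBufferName,
                                                std::size_t vertexStride,
                                                const std::string& indexBufferName,
                                                std::size_t indexStride) {
    assert(vertexStride >= sizeof(Float3));
    assert(indexStride >= sizeof(UInt3));
    return {
        {"vertex_buffer_name", vertexBufferName},
        {"vertex_buffer_stride", std::to_string(vertexStride)},
        {"index_buffer_name", indexBufferName},
        {"index_buffer_stride", std::to_string(indexStride)},
    };
}
```

Each pair would then be handed to the Acceleration object's setProperty( name, value ) call before the first launch. For tightly packed float3 positions and int3 indices, as in the buffers shown earlier in this thread, both strides would be 12 bytes.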

Thanks for the response - I’m completely new to using computer visualization (I’m an aerospace engineer more used to Matlab than C++ APIs…) My goal is to use Optix just as a ray-tracer, so I’m not concerned about generating images, etc.

Information:
Windows 10 (64bit); Intel i7-6700;
GeForce GTX 970
Display Driver: 416.16
Optix: 5.1.1
CUDA Toolkit: v9.1
Compiler: Visual Studio 2015 (v140)

Situation: 10k rays intersecting 100k triangles. The runtime of the context launch that does the ray tracing, averaged over 20 cycles, is 0.3 seconds. (This excludes setting up the context and initializing the geometry.)

It looks like the optixRaycasting example doesn’t have a bounding box program per primitive. I’ll look at the examples and see what I can do.

I meant to say that you need a bounding box and intersection program per geometric primitive type, here triangles.

Those are set inside the OptiXRaycastingContext.cpp source in the OptiXRaycastingContext() constructor with these calls below and defined inside the OptiXRaycastingContext.cu source.

m_geometry = m_context->createGeometry();
const char *ptx = sutil::getPtxString( SAMPLE_NAME, CUDA_SOURCE );
m_geometry->setIntersectionProgram( m_context->createProgramFromPTXString( ptx, "intersect" ) );
m_geometry->setBoundingBoxProgram( m_context->createProgramFromPTXString( ptx, "bounds" ) );

Note that 10,000 rays per launch is far too few to max out a modern GPU.

If you’re a beginner with OptiX there is my GTC 2018 OptiX introduction presentation which explains some basics about the minimal OptiX scene and program setups, but then quickly goes into path tracing for full global illumination.
The first three examples, however, show how to set up OptiX and a small scene from scratch, shoot primary rays, and get some information back.
Links here: https://devtalk.nvidia.com/default/topic/998546/optix/optix-advanced-samples-on-github/
To judge expected performance you could try the optixIntro_04 example, which is a simple brute force path tracer. With default settings that shoots two rays per pixel and should easily accumulate hundreds of frames per second in the 512x512 window size.

The optixRaycasting example in the SDK additionally shows how to get arbitrary hit or miss results back and how to generate new rays natively in CUDA through interoperability buffers, which is already a little more involved.

Thanks for the links, I’ve looked at the GTC presentation but I’ll take another look.

To give some more explanation: I’m generating geometry and test rays (origins and directions) in a MATLAB program and then passing that information to C++ and from there calling the Optix API code.

With the optixRaycasting example, I’ve been successful in getting the information into the geometry and ray buffers and then getting the correct results out. But, as mentioned above, the runtime is really slow. I went through the code and found that most of the time is spent in context creation - which is understandable - and in context execution - less so.

The very first launch will build the acceleration structures and compile the kernel code before starting to shoot the rays.
If you create the context and launch only once every time you ray-trace something from MATLAB, you will always incur that initialization overhead.

Ok, there is a way in the MATLAB API to keep the code running in between function calls, in which case I won’t be launching the kernel every time.

Using that, I can structure the code:

A. Create context, create first geometry group, and ray trace.
B. On subsequent calls, reload the geometry buffers and then retrace.

Using this approach, the acceleration structures should be the only thing that needs to be rebuilt between launches, correct? From what I’ve read about the Trbvh acceleration structure, it should take a trivial amount of time to build (1-5 ms), so the code should run at the desired speed.

Edit: changed idea to just changing the geometry buffer.
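
The A/B structure above can be sketched as a long-lived object that is created once and only rebuilds what changed. This is a mock with counters, not OptiX code: `PersistentTracer` and its members are hypothetical, and in a real implementation `setGeometry` would refill the buffers and mark the Acceleration object dirty so the next launch rebuilds it.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

class PersistentTracer {
public:
    // One instance for the lifetime of the process: the expensive setup
    // (context creation, program compilation) happens on first use only.
    static PersistentTracer& instance() {
        static PersistentTracer tracer;
        return tracer;
    }

    // Step B: replace the geometry and flag the acceleration structure for rebuild.
    void setGeometry(std::vector<float> positions, std::vector<std::int32_t> indices) {
        positions_ = std::move(positions);
        indices_ = std::move(indices);
        accelDirty_ = true;
    }

    // A launch rebuilds the acceleration structure only if the geometry changed.
    void trace() {
        if (accelDirty_) {
            ++accelBuilds_;
            accelDirty_ = false;
        }
        ++launches_;
    }

    int contextCreations() const { return contextCreations_; }
    int accelBuilds() const { return accelBuilds_; }
    int launches() const { return launches_; }

private:
    PersistentTracer() : contextCreations_(1) {}

    std::vector<float> positions_;
    std::vector<std::int32_t> indices_;
    bool accelDirty_ = false;
    int contextCreations_ = 0;
    int accelBuilds_ = 0;
    int launches_ = 0;
};
```

The design choice is simply to separate costs by how often they occur: setup once per process, an acceleration rebuild once per geometry change, and a launch once per batch of rays.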

I changed the MATLAB API code and it is working perfectly now. Runtimes for the whole program are ~0.01 seconds, which is plenty fast for my application.

Thanks for your help!