Pick ray


I’ve just started exploring OptiX and I’m trying to modify the optixMeshViewer demo so that it generates a pick ray from the mouse pointer that can determine which object and which triangle is selected. Can someone please give me the general idea of how to accomplish this?


Let’s view this from bottom to top.
If you want an object ID and a triangle primitive index you would need to have that information available inside a closest hit program where it’s written into fields inside the per-ray payload, returning it back to the ray generation program (“RGP” from now on) which writes the results into an output buffer.

The primitive ID is known inside the intersection program. You would need to add an attribute which returns that primitive ID to become available inside the closest hit program. (As a bonus, if you also return the barycentric coordinates you could even report where on that triangle you picked.)
The object ID would be a variable at the GeometryInstance node (read the OptiX Programming Manual chapter 4.1.5 about variable scoping).
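
To illustrate that bonus: once the primitive ID and two barycentric coordinates are back on the host, the picked point can be reconstructed from the triangle's three vertices. This is a plain C++ sketch, not OptiX device code. `Vec3` and `pointFromBarycentrics` are hypothetical names, and the (beta, gamma) convention is an assumption that must match whatever your intersection program actually reports.

```cpp
#include <cassert>

// Minimal host-side stand-in for a 3-component float vector.
struct Vec3 {
    float x, y, z;
};

// Reconstructs the picked point on a triangle from barycentric coordinates (beta, gamma),
// assuming the convention P = (1 - beta - gamma) * v0 + beta * v1 + gamma * v2.
Vec3 pointFromBarycentrics(const Vec3& v0, const Vec3& v1, const Vec3& v2,
                           float beta, float gamma) {
    // Barycentrics of a point inside the triangle are non-negative and sum to at most 1.
    assert(beta >= 0.0f && gamma >= 0.0f && beta + gamma <= 1.0f + 1e-6f);
    const float alpha = 1.0f - beta - gamma;
    return Vec3{alpha * v0.x + beta * v1.x + gamma * v2.x,
                alpha * v0.y + beta * v1.y + gamma * v2.y,
                alpha * v0.z + beta * v1.z + gamma * v2.z};
}
```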

Now there could be different mechanisms to implement this which would be more or less intrusive.
I will describe a method which makes use of the existing Material and closest hit programs, because that doesn’t need changes to the scene hierarchy.

  • Declare an object ID integer variable on each GeometryInstance and set it to a unique value.

  • Add the two fields object ID and primitive ID to your per-ray payload. Make sure to leave possible alignment requirements of the other fields intact!

  • You need to handle the case when nothing is hit. The simplest way is to make the IDs signed integers and let negative values mean no hit. Or, depending on your preference for zero-based or one-based indexing, use unsigned int and reserve 0xFFFFFFFF or 0, respectively, as the no-hit indicator.

  • Change the intersection program to return the primitive ID as attribute.

  • Change all closest hit programs to return the object ID variable and primitive ID attribute in the current per-ray payload fields for these.

  • Add variables to the RGP which indicate that you want to do picking (e.g. bool isPicking) and the mouse coordinates you want to pick.

  • Add an output buffer which will receive the object ID and primitive ID. A single int2 (or uint2) is sufficient.

  • When you want to do picking, set the bool isPicking variable to true and set the mouse coordinates and do a 1x1 size launch.
    Make sure your RGP is able to calculate the same projection as during regular rendering, which means calculations based on the launch dimensions won’t work (the launch is only 1x1). You could use the screen output buffer’s dimensions instead. Or you could simply calculate the picking ray direction on the host and send that instead of the mouse coordinates.

  • In the RGP initialize the object ID and primitive ID to your miss default to indicate a miss (-1 for integers, 0xFFFFFFFF or 0 for unsigned int). There is no need to change the miss programs.

  • Shoot the single picking ray.
    When the closest hit program is invoked it fills in the object ID and primitive ID. Anything else wouldn’t be needed and you could return after that when the isPicking variable is set. That’s defined at Context level so if you declare that inside the closest hit programs as well you can check its status.
    Finally the RGP writes the object ID and primitive ID from the per-ray payload into the dedicated output buffer.
    After the launch, map that picking buffer, read your two IDs, unmap the buffer.
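
Calculating the picking ray direction on the host could look roughly like the following plain C++ sketch. It assumes a pinhole camera described by basis vectors U, V, W (similar to the OptiX SDK samples) and an image whose y axis points up while the mouse y is measured from the top; `Vec3` and `pickingRayDirection` are hypothetical names. The resulting direction, with the camera eye position as origin, would then be sent to the RGP instead of the mouse coordinates.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator*(float s, const Vec3& v) { return {s * v.x, s * v.y, s * v.z}; }
Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return (1.0f / len) * v;
}

// Computes a pinhole-camera picking ray direction for a mouse position.
// width/height are the dimensions of the screen output buffer, NOT of the 1x1 picking launch.
Vec3 pickingRayDirection(int mouseX, int mouseY, int width, int height,
                         const Vec3& U, const Vec3& V, const Vec3& W) {
    assert(mouseX >= 0 && mouseX < width && mouseY >= 0 && mouseY < height);
    // Assumption: the window reports y from the top, the rendered image's y axis points up.
    const int launchY = height - 1 - mouseY;
    // Map the pixel center to normalized device coordinates in [-1, 1].
    const float dx = (mouseX + 0.5f) / width * 2.0f - 1.0f;
    const float dy = (launchY + 0.5f) / height * 2.0f - 1.0f;
    return normalize(dx * U + dy * V + W);
}
```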

That’s the least intrusive way. Other methods would need a separate RGP entry point, a dedicated per-ray payload and closest hit program, an additional material per GeometryInstance, another change inside the intersection program to report the intersection to the different material, etc. That would be a mess and only useful if picking is the major feature, like in a 3D texture painting application.

There are also different ways to input IDs. For example an additional buffer at the Geometry nodes which hold an ID per triangle. Or sending triangle indices as uint4 instead of uint3 and set a unique ID in the .w component of each triangle. (That’s what I’m using for robust self intersection avoidance.)
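
The uint4 variant can be prepared on the host before uploading the index buffer; the device-side index buffer declaration would then need to use uint4 as well. This plain C++ sketch uses hypothetical `UInt3`/`UInt4` stand-ins for the CUDA vector types, and the ID scheme (triangle position plus a base offset) is just one assumed choice; any unique value per triangle would do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side stand-ins for CUDA's uint3 / uint4 vector types.
struct UInt3 { std::uint32_t x, y, z; };
struct UInt4 { std::uint32_t x, y, z, w; };

// Widens a uint3 triangle index list to uint4, storing a unique per-triangle ID in .w.
std::vector<UInt4> addTriangleIds(const std::vector<UInt3>& indices, std::uint32_t baseId) {
    // The IDs baseId .. baseId + size - 1 must not wrap around.
    assert(static_cast<std::uint64_t>(baseId) + indices.size() <= (std::uint64_t{1} << 32));
    std::vector<UInt4> result;
    result.reserve(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        const UInt3& t = indices[i];
        result.push_back({t.x, t.y, t.z, baseId + static_cast<std::uint32_t>(i)});
    }
    return result;
}
```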

Note that you cannot distinguish GeometryInstances when they are reused under multiple transforms that way (“instanced sub-trees”) because the GeometryInstances would be identical including their same object ID.
If that is an issue, there is a way to solve that as well, but that’s a little more involved and requires scene graph changes.

Thanks for the detailed reply! It’s a bit over my head but I’m making progress. :)

I’m able to retrieve the primitiveID in the closestHit program by following your instructions but I’m stuck on how to obtain the objectID.

I create an objectID variable like this in host code. Is that correct?

GeometryInstance inst = context->createGeometryInstance();

I’m stuck on how to retrieve it in the closestHit program. I tried declaring it as a variable and assigning it to the payload.objectID field, but I get “mis-aligned address” errors when running.

“Make sure to leave possible alignment requirements of the other fields intact”
I’m not sure what that means. I added a padding int so that the size of the PerRayData_radiance struct was a power of 2, but I have a feeling that’s not what you meant. :)

struct PerRayData_radiance
{
  float3 result;
  float importance;
  int depth;
  int objectID;
  int primitiveID;
  int padding;
};

That should work, but there is a more elegant and robust way if you’re using the OptiX C++ wrappers. You can declare and set variables in one line and don’t need to care about the variable index.
In your case, like this:

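GeometryInstance inst = context->createGeometryInstance();
// This can be called with any number of variables already on the object, the index isn't needed.
// (uniqueObjectID is a placeholder for whatever unique value you assign to this instance.)
inst["objectID"]->setInt(uniqueObjectID);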

The operator[] on the wrapped objects tries to get the variable by name and, if it doesn’t exist, creates it. The set*() functions then set it to the proper type, which must match the declaration inside the device code or validation will fail.
You can debug through those wrappers and see how the underlying OptiX C-API is used. It’s all in one header.

// Put this into a header! It must be the same everywhere.
struct PerRayData_radiance
{
  float3 result;
  float  importance;
  int    depth;
  int    objectID;
  int    primitiveID;
};

All of the fields in your structure have a 4-byte alignment requirement, so that’s fine and no padding is needed if you do not use that in arrays (e.g. user defined output buffer formats.)

Check Table 3. Alignment Requirements here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types

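Something like this would be bad, because the float4 member forces 12 bytes of padding after the int and raises the alignment of the whole struct to 16 bytes:

struct Foo
{
  int    i;  //  4-byte alignment.
  float4 f4; // 16-byte alignment!
};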
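
To see what those alignment rules do to a struct layout, here is a host-side C++ sketch that emulates the CUDA vector types with `alignas`. `Float3` and `Float4` are hypothetical stand-ins, not the CUDA types themselves; the alignments they encode (4 bytes for float3, 16 bytes for float4) follow the CUDA table referenced above.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins mimicking CUDA vector type alignment: float3 -> 4 bytes, float4 -> 16 bytes.
struct Float3 { float x, y, z; };
struct alignas(16) Float4 { float x, y, z, w; };

// Same layout as the per-ray payload above: every member is 4-byte aligned.
struct PerRayData_radiance {
    Float3 result;
    float  importance;
    int    depth;
    int    objectID;
    int    primitiveID;
};

// The problematic layout: the compiler inserts 12 bytes of padding after i
// so that f4 starts on a 16-byte boundary.
struct Foo {
    int    i;
    Float4 f4;
};

static_assert(alignof(PerRayData_radiance) == 4, "payload only needs 4-byte alignment");
static_assert(sizeof(PerRayData_radiance) == 28, "no padding between the 4-byte members");
static_assert(offsetof(Foo, f4) == 16, "f4 is pushed to the next 16-byte boundary");
static_assert(sizeof(Foo) == 32, "Foo is padded to a multiple of its 16-byte alignment");
```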

Inside the closest hit program you would then have the variable declarations at module scope outside the functions.

Put the following into all your closest hit programs which can be active in a scene during picking:

rtDeclareVariable(PerRayData_radiance, thePrd, rtPayload, );

// For the attribute, the user defined attribute semantic (here PRIMITIVE_ID) and variable type (here int)
// must match with the one inside the intersection program.
// The variable name itself doesn't matter for the matching.
rtDeclareVariable(int, primitiveID, attribute PRIMITIVE_ID, );

// This will be filled from the GeometryInstance variable scope.
rtDeclareVariable(int, objectID, , );

// And inside the closest hit program:

RT_PROGRAM void closesthit()
{
  thePrd.objectID    = objectID;
  thePrd.primitiveID = primitiveID;
  // If you're running a recursive algorithm you would need to make sure to return after the primary hit.
  // Or you would need to put this at the very end so that the first ray wins.
}

If you have made sure the PerRayData is the same everywhere and all(!) GeometryInstances have that objectID variable declared and initialized, that misaligned address could have other reasons.
What is your system setup?
OS version and bitness, installed GPU(s), NVIDIA display driver version, OptiX version (major.minor.micro), CUDA toolkit version used to compile the PTX code.

“Put the following into all your closest hit programs which can be active in a scene during picking:”

Oops. That’s the step I was missing. My simple scene has a ground plane and a cube mesh. I was only modifying the cube material (phong.cu), since that’s all I wanted to pick. The ground plane (parallelogram.cu) is taken from the optixWhitted demo and has the checker.cu material applied.

Everything works as expected if my scene has either just the ground plane or just the cube. When debugging the closestHit programs in both the phong.cu and checker.cu, I’m getting the proper objectID and primitiveID.

But if I start with the ground plane and then add a cube, OptiX throws an exception:

“OptiX Error: ‘Unknown error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (716): Misaligned address)’
OptiX Error: 'Unknown error (Details: Function “_rtVariableSet3f” caught exception: Assertion failed: “!m_launching”, file: C:\u\workspace\rel4.1-win64-cuda80-VS2015-build-Release\sw\wsapps\raytracing\rtsdk\rel4.1\src\Context\TableManager.cpp, line: 304)”

All the demo scenes and Advanced Samples run fine on my system:

Windows 10 64bit
Visual Studio 2015
OptiX 4.1.1
CUDA Toolkit v8.0
Number of Devices = 1
Device 0: GeForce GTX 960M
driver version:
Compute Support: 5 0
Total Memory: 4294967296 bytes
Clock Rate: 1176000 kilohertz
Max. Threads per Block: 1024
SM Count: 5
Execution Timeout Enabled: 1
Max. HW Texture Count: 1048576
TCC driver enabled: 0
CUDA Device Ordinal: 0

Constructing a context…
Created with 1 device(s)
Supports 2147483647 simultaneous textures
Free memory:
Device 0: 3597788774 bytes

Thanks again for your help

In case it makes a difference, I’m adding the objects to the scene interactively (drag and drop anytime) and that part still works fine when there’s no ground plane.

Here’s an older version where I’m using the physics engine for picking

Ok, so if the two objects work individually, you’ve changed the intersection programs and added the objectID variable to both GeometryInstances correctly.

That mis-aligned address could also be a bug in OptiX or the CUDA driver.

You could try to update your display drivers first. Since 382.33 eight newer Win10-64 drivers have been released for the GTX 960M. http://www.nvidia.com/Download/Find.aspx?lang=en-us

If that isn’t helping and you’re sure everything else is sound, we would need a reproducer to investigate.
The bottom of this thread explains how you can generate an OptiX API Capture (OAC) trace which we can use without the need for a full reproducer application. https://devtalk.nvidia.com/default/topic/803116/?comment=4436953

Nice video! :-)

Updating the driver had no effect.

But I found a new clue. Since the ground is a 2D plane, I had set the acceleration structure type to “NoAccel”. If I change it to “Trbvh”, the crashing stops.

Got the whole thing wired up now and it’s working perfectly as long as I use an acceleration structure for the ground plane.

Thanks for the excellent support!

You’re welcome.

BTW, another way to handle that would be to simply render the object ID and primitive ID into a full viewport output buffer for a picking event.
As long as the scene doesn’t change, you could simply read the IDs under your current cursor position with no additional launch, e.g. for a tooltip at your mouse position when hovering over something.
Not so good if the scene is constantly animated.
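
Reading the IDs under the cursor from such a full-viewport picking buffer is then a simple lookup once the buffer contents are on the host (e.g. after mapping it). This plain C++ sketch assumes an int2-style record per pixel with negative IDs as the miss default, row-major storage with row 0 at the bottom of the image, and a mouse y measured from the top; `PickIds` and `idsUnderCursor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One picking record per pixel; negative IDs mean "no hit" (the miss default from the RGP).
struct PickIds {
    int objectID;
    int primitiveID;
};

// Returns the IDs under the cursor, or a miss if the cursor is outside the viewport.
PickIds idsUnderCursor(const std::vector<PickIds>& buffer, int width, int height,
                       int mouseX, int mouseY) {
    assert(static_cast<int>(buffer.size()) == width * height);
    if (mouseX < 0 || mouseX >= width || mouseY < 0 || mouseY >= height) {
        return PickIds{-1, -1};
    }
    // Flip y: the image rows are assumed to be stored bottom-up.
    const int row = height - 1 - mouseY;
    return buffer[static_cast<std::size_t>(row) * width + mouseX];
}
```

Since no launch is involved, this is cheap enough to call on every mouse move for hover tooltips, as long as the buffer is re-rendered whenever the camera or scene changes.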

"But I found a new clue. Since the ground is a 2D plane, I had set the acceleration structure type to “NoAccel”. If I change it to “Trbvh”, the crashing stops. "

Someone on our team read this and fixed a crash the same way. However, in my other use cases NoAccel works correctly, so I’m curious why this would fix the crash. To my knowledge, Trbvh is a triangle-based BVH which requires setting triangle info through rtAccelerationSetProperty. Does it default to a regular BVH if the properties are not set?

The same happened to me yesterday. Bird33’s suggestion also helped me in one of the misaligned address exception cases; that exception is gone since the change. Others unfortunately still remain.

Trbvh does not require triangle data.
Yes, it’s just another acceleration structure which can handle custom primitives.
But it comes with an additional specialized faster acceleration structure builder for triangles which supports splitting of triangles and is only invoked when setting the acceleration properties accordingly.

Find some more information here: https://devtalk.nvidia.com/default/topic/1022634/?comment=5203884

What are the scene graph changes needed to accomplish this? Does it involve “flattening” the scene by pre-applying the transformation matrix to the GeometryInstance to create multiple GeometryInstances – those can be uniquely identified at that point? Or, is there some way to identify the Transform that the ray passes thru during tracing in the intersection or other program?

1.) Flattening the scene to individual GeometryInstance nodes while sharing the underlying geometry’s acceleration structure at the GeometryGroup above them would be one approach. But that is not feasible for some scenes where the amount of instances would explode.

2.) Another much more involved approach would be this:
Assuming you’re instancing the same GeometryInstance with different Transform nodes and also multi-level transformation hierarchies.
The things you have available in device code would be variables declared at the GeometryInstance and the concatenated transformation matrix you can query with rtGetTransform().

First, also store a unique ID in a variable at the GeometryInstance.

A picking ray would need to store the unique ID of the GeometryInstance and the 12 floats of the concatenated transformation matrix holding the affine transformation.
(Note that only the 12 entries of the affine transformation are valid! The fourth row is assumed to be (0,0,0,1) and is unused; especially in the RTX execution strategy of OptiX 6.0.0, those entries aren’t even handled.)

On the host side you would need to track your fully articulated scene graph with its transform hierarchy and geometry instances, which allows you to maintain a search structure.
For each unique GeometryInstance you would need to hold an array of concatenated matrices and the destination index of whatever you need to identify in your host side scene graph.

With the picked unique GeometryInstance ID, search that GeometryInstance’s array of concatenated transformation matrices for the one nearest to the matrix stored by OptiX.
Since these are float comparisons and the host CPU uses a different precision than the GPU, that comparison cannot be for equality.

Now when the same GeometryInstance happens to be transformed to the same location, there would be multiple concatenated transforms leading to that result.
In that case picking the first search result will do. They are at the same location anyway.
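
The host-side search described above could be sketched like this in plain C++. The layout is an assumption: the 12 floats are taken to be the first three rows of the 4x4 matrix rtGetTransform() fills in, assumed row-major, and `InstancePlacement`, `hostNodeIndex` and `findPickedPlacement` are hypothetical names. "Nearest" is measured here as the sum of squared differences, which is just one reasonable choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// The 12 valid floats of an affine 3x4 transformation (assumed row-major).
using Affine3x4 = std::array<float, 12>;

// One placement of a shared GeometryInstance in the host scene graph.
struct InstancePlacement {
    Affine3x4   concatenated;  // product of all Transform matrices above this instance
    std::size_t hostNodeIndex; // whatever identifies this placement in the host scene graph
};

// Returns the host index of the placement whose matrix is nearest to the picked one.
// No equality test: host and device may compute the concatenation with different precision.
std::size_t findPickedPlacement(const std::vector<InstancePlacement>& placements,
                                const Affine3x4& picked) {
    assert(!placements.empty());
    std::size_t best = 0;
    float bestDistance = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < placements.size(); ++i) {
        float distance = 0.0f;
        for (std::size_t k = 0; k < 12; ++k) {
            const float d = placements[i].concatenated[k] - picked[k];
            distance += d * d;
        }
        // Strict less-than keeps the first match when several placements coincide.
        if (distance < bestDistance) {
            bestDistance = distance;
            best = i;
        }
    }
    return placements[best].hostNodeIndex;
}
```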

3.) Finally, the next OptiX 6 update will add a device-side function which allows querying the child index of the bottom-most RTgroup node.

For a two-level hierarchy that would be the root group and uniquely identifies the Transform children doing the sub-tree instancing.
To distinguish additional group levels, you would need to assign some per bottom-most group identifier to all geometry instances beneath them.
That should cover many cases already, just not cross sharing of geometry instances among different groups.

Hi everybody!

I’m new to OptiX and trying to modify the “optixTutorial” example with a picking ray.

“When you want to do picking, set the bool isPicking variable to true and set the mouse coordinates and do a 1x1 size launch.”

Do you mean separate 1x1 launch of the same context, or to modify existing launch to make it 1x1?

And in general, can application have multiple launches of the same context? If yes, how do multiple launches interoperate with each other, or interfere?

Thank you in advance!

You can implement a picking ray in very different ways.

For OptiX 6 and before:
You could implement this as special case in your standard rendering launch entry point. Means you would need to change the ray generation programs and closest hit programs to toggle between rendering and picking and calculate the primary rays accordingly and return the required rendering or picking information.

You could even handle picking always when rendering. You would only need a pixel position (or launch index) for which the primary ray would determine the hit event information and write it to the picking buffer.

The point is that you need to be able to calculate the primary ray for the picking, and if you launch only one cell with a 1x1 launch, you need to provide the necessary information to do that calculation inside the ray generation program.
You could even calculate that on the host and put the picking ray origin and direction into context variables.
(Better put all context global variables into an input buffer. Updating buffers has better performance than changing variables between launches in OptiX 6. Or just use OptiX 7, which enforces this.)

And in general, can application have multiple launches of the same context? If yes, how do multiple launches interoperate with each other, or interfere?

There cannot be multiple rtContextLaunch calls active at the same time in an OptiX context, if you meant that.
But you can have separate ray generation entry points, means different ray generation programs like one for rendering and one for picking. The first argument in the rtContextLaunch call allows to switch between them.

That picking is active could be communicated to the per ray type closest hit programs via some flag on the per ray payload, then you wouldn’t need a separate ray type and additional hit programs.

Or you could write all hit events for all launch indices into some full resolution picking buffer and simply read from that, which doesn’t need updates as long as the camera is not moving.

For OptiX 7:
All of the algorithmic and device side methods apply the same way.
The only difference is that the ray generation entry point is switched by using different shader binding tables.
See here: https://forums.developer.nvidia.com/t/multiple-raygen-functions-within-same-pipeline-in-optix-7/122305

I would recommend using OptiX 7 for new projects.

Thank you for such an in-depth reply, as always!


And in general, can application have multiple launches of the same context? If yes, how do multiple launches interoperate with each other, or interfere?

I meant this situation:

void glutDisplay()
{
  context->launch(0, width, height);

  // Color buffer
  Buffer buffer = getOutputBuffer();

  context->launch(0, 1, 1);
}

Is it allowed to make the launches of the same context one after another? (I’m not sure if this means that they all will be active at the same time or not)

That is perfectly fine and the launches run in that order.
Note that in OptiX 6 and before, the rtContextLaunch calls are synchronous, meaning they block and don’t return until they are done, unless you’re using the command lists in the post-processing API.
Just one more reason to switch to OptiX 7, because there launches are always asynchronous and could even run in parallel on different streams if the GPU load allows.