Pick ray in Optix7

Bird33 · September 25, 2019, 6:38pm

I’m trying to generate a pick ray in Optix7. The actually picking part seems to be working okay but I can’t figure out how to get the meshID and primitiveID back to my application. Here’s the program code.
[url]https://gist.github.com/Hurleyworks/a5798887b4e679fb8f0943615e8ed331[/url]

When generating a pick ray the error message is CUDA error on synchronize with error ‘an illegal memory access was encountered’

Lines 74 and 75 are causing the problem but I’m not sure why.
params.pickData[0]= payload.meshID;
params.pickData[1]= payload.primitiveID;

droettger · September 26, 2019, 8:23am

extern “C” {
constant PickParams params;
}

Change the code to not write into the memory you access via the OptixPipelineCompileOptions.pipelineLaunchParamsVariableName, here “params”. That lies in constant memory.
Instead put a CUdeviceptr there which points to a buffer with the two results and write to that.

struct PickParams
{
  // 8-byte aligned.
  OptixTraversableHandle sceneAccel;
  CUdeviceptr            pickBuffer; // CUDA device pointer with room for at least two unsigned int.

  // 4-byte aligned
  int    picking;
  float3 rayOrigin;
  float3 rayDir;
};

...

  unsigned int* pickData = reinterpret_cast<unsigned int*>(params.pickBuffer);

  pickData[0] = payload.meshID;
  pickData[1] = payload.primitiveID;
...

The OptixTraversableHandle is a 64-bit value and must lie on an 8-byte aligned address. Same for CUdeviceptr.
All other fields in that PickParams stucture require only a 4-byte aligned addresses.
I recommend to always order all device side structures by alignment requirements with the largest first.
If the structure is used in an array, I even manually pad its overall size to the largest alignment.

I would make the picked IDs unsigned int because both optixGetPrimitiveIndex() and optixGetInstanceId() return unsigned int values and the payload registers are also unsigned int references, which means there is no need for any integer conversions.
You can use ~0 as miss identifier which is safe in OptiX 7.0.0 because neither OPTIX_DEVICE_PROPERTY_LIMIT_MAX_PRIMITIVES_PER_GAS nor OPTIX_DEVICE_PROPERTY_LIMIT_MAX_INSTANCES_PER_IAS can exceed 4 Gig.

Bird33 · September 26, 2019, 5:27pm

Great, I’ve got it working now … except for using -0 as a miss identifier. That comes back as 0 on my machine which of course isn’t useful since it could be a valid instance or primitive index. Using -1 gives something close to the max unsigned integer so that’s easy enough to look for.

Thanks for the help!

droettger · September 27, 2019, 7:47am

No idea what your problem is with unsigned int ~0. Did you mistype “minus zero”? I wrote “bitwise not zero”.
There should be absolutely no change to the bits if every variable type in the code holding it is unsigned int.

Bird33 · September 27, 2019, 10:08am

Heh, yes I thought you typed minus zero … due to small monitor, old eyes and a tired brain :)

Bird33 · October 3, 2019, 1:39pm

I’m having some trouble integrating picking into my projects. My apps are physics based content creators that can have thousands of instances. Users can paint instances so I need to render the scene while generating a steady stream of pick rays when the user is painting.

I have tried 2 approaches

Create a separate launch context for picking and rendering. Each context has it’s own pipline, sbt, launch params and cuda stream. The first problem I had with this approach was that the RayTypeCounts for picking and rendering are different since picking only needs 1 ray. I fixed this by adding a “pad” pick ray and then adding a duplicate record when rebuilding the pick sbt hitgroup records when adding a new mesh. This is working fine now but the problem is I have had to duplicate potentially thousands of hit group records in the picking sbt and the rendering sbt. I tried using the same sbt.hitgroupRecordBase for both but that causes the app to freeze when picking.

The full working code for this is here [url]https://github.com/Hurleyworks/Optix7Sandbox[/url]

Merge the picking and rendering programs into 1 pipeline. I set the number of payload values to 6, 4 for rendering and 2 for picking and used payload slots 0 and 1 for picking and the last 4 for rendering. This almost works, the picking is correct but the render comes out wrong. The meshes are rendered correctly but all the colors are wrong.

Here’s a slightly simplied version of the cuda code

[url]https://gist.github.com/Hurleyworks/0997f7a4b0247988d7659c90dbc043dc[/url]

Any idea about how to make this work?

droettger · October 7, 2019, 4:11pm

I would make the picking a special case of the rendering.

All you’d need is to make the data required for picking available on the per ray payload, have a place where to output them, and distinguish the current launch mode.

That could be done with a flag and the picking coordinates in the global launch parameters. Handle both cases from a single ray generation program and combined global parameters instead of PickParams and WhittedParams in your case. (Alignment requirements in the latter is wasting space between members again.)
The closest hit on a primary ray would need to fill in the hit IDs when picking is enabled.

There should be no need for a separate ray type either then. That just bloats the number of hit record entries in the Shader Binding Table (SBT). Two ray types should be enough to handle all cases (rendering/picking and visibility check.)

The launch dimension would need to be decoupled from the rendering resolution to be able to calculate the single picking ray (launch dimension 1x1) on the full image resolution. (That’ll be required for multi-GPU rendering distribution anyway.)

You can also keep picking in a separate ray generation program.
For that you’d simply put both the picking and the Whitted ray generation program entry points into one OptiX pipeline and then create two SBTs where only the ray generation program record is different.

Means if you build all your other program domains to handle both rendering and picking, then you don’t have the issue with multiple SBT records for the other program domains. All other CUdeviceptr holding the SBT records are identical then. Dynamic scene changes happen only once.

To unify the payload count, you can simply allocate the per ray payload structure inside the ray generation program and split its pointer into the first two payload registers of the optixTrace call. You can get away with using only 2 registers in any arbitrarily complex rendering algorithm that way.

Other comments from glancing over the device code:

There is no need for __miss__pickMiss if you initialize the picking results for a miss event in pick().
Instead of uint32_t u0=0, u1=0; use uint32_t u0=~0, u1=~0; and the miss program is unnecessary.
The initialization of payload.meshID = ~0; payload.primitiveID = ~0; is unnecessary because they will be written unconditionally (unless there are exceptions during optixTrace()).
You can use OPTIX_RAY_FLAG_DISABLE_ANYHIT on the ray types without anyhit programs.
Note that OptixBufferView members contain CUdeviceptr which requires an 8-byte alignment. Same for any pointers.
Why are you setting OptixVisibilityMask( 1 )? The default mask to hit everything should be OptixVisibilityMask(255).
The ray generation program doesn’t really need optixGetSbtDataPointer(). It’s the entry point and can source the global parameters instead.
I’ve not seen any exception program in the *.cu files. It’s highly recommended to use them for debugging purposes.

Bird33 · October 8, 2019, 10:55am

Thanks for the reply! Unfortunately there’s a lot I don’t understand. :)

I’ll start with the comments I think I do understand.

* There is no need for __miss__pickMiss if you initialize the picking results for a miss event in pick().
Instead of uint32_t u0=0, u1=0; use uint32_t u0=~0, u1=~0; and the miss program is unnecessary.
The initialization of payload.meshID = ~0; payload.primitiveID = ~0; is unnecessary because they will be written unconditionally (unless there are exceptions during optixTrace()).

Okay, that makes sense. Thanks!

Why are you setting OptixVisibilityMask( 1 )? The default mask to hit everything should be OptixVisibilityMask(255).

I copied that from the whitted.cu used in the optixMeshView sample so maybe that needs to be changed.

I’ve not seen any exception program in the *.cu files. It’s highly recommended to use them for debugging purposes.

I couldn’t find any example of how to use make an exception program in either the OpitX Samples or in Ingo Wald’s tutorials

====================================
Here’s some things I don’t understand

The launch dimension would need to be decoupled from the rendering resolution to be able to calculate the single picking ray (launch dimension 1x1) on the full image resolution. (That’ll be required for multi-GPU rendering distribution anyway.)

I’m not sure how to do that, could you explain that a little more?

For that you’d simply put both the picking and the Whitted ray generation program entry points into one OptiX pipeline and then create two SBTs where only the ray generation program record is different

If I have 2 SBT’s then a I have to make 2 separate calls to optixLaunch(), one for picking and one for rendering, correct?

I think that’s enough for now. Thanks for the help

droettger · October 8, 2019, 12:41pm

Decoupling the launch dimension from the rendering resolution just means that you define the resolution of your output buffer inside the global system parameters and use that instead of the launch dimensions to calculate the primary rays.

Yes, if you have different ray generation programs you always need two launches.

1.) This example would use only one ray generation program and would still need two launches with different dimensions but has only one pipeline and SBT. (You can do with one as well, see below.)

struct SystemData
{
  // 8-byte alignment
  OptixTraversableHandle topObject;

  // The accumulated linear color space output buffer.
  // This is always sized to the resolution, not always matching the launch dimension.
  // Using a CUdeviceptr here to allow for different buffer formats without too many casts.
  CUdeviceptr outputBuffer;
  int2        resolution;   // The resolution of the full image in outputBuffer. Independent from the launch dimensions for some rendering strategies.
  
  CUdeviceptr pickingBuffer; // Buffer which will receive all information of the primary hit when pickingEnabled != 0.
  float2      pickingFragment;  // Pixel center coordinate on the full resolution image to shoot a primary picking ray for with the current projection.
  
  // 4 byte alignment 
  int   pickingEnabled; // Flag indicating that the pickingPixel should be used to calculate the primary ray. Launch at 1x1 size in that case.
};

extern "C" __constant__ SystemData sysData;

extern "C" __global__ void __raygen__eye_path()
{
  ...

  const uint2 theLaunchDim   = make_uint2(optixGetLaunchDimensions());
  const uint2 theLaunchIndex = make_uint2(optixGetLaunchIndex());

  // Decoupling the launch dimension from the screen resolution will allow for partial rendering algorithms.
  const float2 screen   = make_float2(sysData.resolution); // Note, not using theLaunchDim here!
  // E.g. assume theLaunchDim == sysData.resolution for rendering and theLaunchDim == (1,1) for picking.
  const float2 fragment = (sysData.pickingEnabled) ? sysData.pickingFragment : make_float2(theLaunchIndex) + 0.5f; // E.g. not progressive, just shooting rays at pixel centers.
  const float2 ndc      = (fragment / screen) * 2.0f - 1.0f; // Normalized device coordinates in range [-1, 1].
  // Assume sysData.camera contains the usual pinhole camera setup.
  float3 origin    = sysData.camera.P;
  float3 direction = normalize(sysData.camera.U * ndc.x + sysData.camera.V * ndc.y + sysData.camera.W);
  ...
  
  // Now you would initialize all your per ray data with the necessary information to distinguish rendering from picking.
  // The closesthit programs would be responsible to return the desired picking information on the resp. per ray data fields and end the path if picking is enabled.
  // Then you can write the results to sysData.pickingBuffer here.

2.) Since you said you need a continuous stream of picking results, it’s really simple to combine picking and rendering into one launch as well. Means one of the launch indices would return picking results on every render.

For that you would just need to have an uint2 pickingPixel; in the SystemData and then indicate on the per ray data (PRD) for the primary ray where theLaunchIndex == sysData.pickingPixel that this ray should also fill in the PRD field with the hit IDs.
That needs a depth field on the PRD to identify primary rays (just like above as well).

This is what I’m using.

#include "config.h"
#include <optix.h>
#include "system_data.h"

extern "C" __constant__ SystemData sysData;

extern "C" __global__ void __exception__all()
{
  //const uint3 theLaunchDim     = optixGetLaunchDimensions(); 
  const uint3 theLaunchIndex   = optixGetLaunchIndex();
  const int   theExceptionCode = optixGetExceptionCode();

  printf("Exception %d at (%u, %u)\n", theExceptionCode, theLaunchIndex.x, theLaunchIndex.y);

  // DAR FIXME This only works for render strategies where the launch dimension matches the outputBuffer resolution.
  //float4* buffer = reinterpret_cast<float4*>(sysData.outputBuffer);
  //const unsigned int index = theLaunchDim.x * theLaunchIndex.y + theLaunchIndex.x;
  //buffer[index] = make_float4(1000000.0f, 0.0f, 1000000.0f, 1.0f); // super magenta
}

Bird33 · October 8, 2019, 2:33pm

Thanks for the excellent help as usual!

One question before I try to make this work.

// The closesthit programs would be responsible to return the desired picking information on the resp. per ray data fields and end the path if picking is enabled.

The problem I see with ending the path in the CH program when picking is enabled is that I’ll never get an updated render while continuous picking and that won’t for my needs. Is there any way to get a render while continuously picking?

One piece of good news is that I’ve implemented geometry instancing in my project and am able to pick instances too … that’s something I could never get working in OptiX 6

droettger · October 8, 2019, 3:35pm

I described two different cases above. I added numbers to them now for clarity.
The second idea with the single launch would of course continue rendering, but would only need to save the hit IDs at the single selected primary ray once.
Means it needs the ray depth or path length to know that it’s a primary ray, and a flag to indicate that this is the picking ray, and the result IDs on the PRD must be initialized for the miss event before the optixTrace() of that primary ray.

Bird33 · October 11, 2019, 1:09am

I went with the 2 launch version and it’s working perfectly. Thanks again for all the help!

A couple of things if anybody else is following along

const float2 fragment = (sysData.pickingEnabled) ? make_float2(theLaunchIndex) + 0.5f : sysData.pickingFragment;

Should be changed to

const float2 fragment = (sysData.pickingEnabled) ? sysData.pickingFragment : make_float2(theLaunchIndex) + 0.5f ;

(Detlef: Thanks, is corrected above now.)

And when setting the pickingFragment in you app don’t forget that Optix screen space origin is bottom left, not top left … it took me awhile to figure out why my picking code wasn’t working. :

droettger · October 11, 2019, 8:03am

Oops, yes, that was dry coding. I’ll correct it.

Yes, the launch index (0, 0) of the pinhole camera is the bottom left corner, matching the orientation of OpenGL texture images used for the final display.
See slide 18 here: [url]http://on-demand.gputechconf.com/gtc/2018/presentation/s8518-an-introduction-to-optix.pdf[/url]
That’s solely under the developer’s control. OptiX doesn’t imply a screen coordinate system layout.

Pick ray in Optix7

==================================== Here’s some things I don’t understand

====================================
Here’s some things I don’t understand