Render Quads


I would like to render a set of quadrilaterals with a boundary set (on device side). Is this possible with OptiX 7? If so can anyone provide links to how this can be done?

Thank you in advance for any help.

Hey @peterSteele123,

Sorry for the naive question, but can you tell us more about what a boundary set is and how you would want it to affect your ray tracing? I’m not very familiar with that term. Does this mean that some of the quads in your mesh(es) need to respond differently to rays than other quads?


Sure. Thank you for the response.

The boundary set is just what I called the array of 6 floats for min and max of X, Y, and Z coordinates defining a boundary on my Quadrilateral(s).

The CUDA kernel that I would like to implement the Quad boundaries is defined in same file with intersection shader (e.g. and has the following signature:
extern "C" __global__ void QuadBounds(float result[6])

I would also like to do some calculations within the QuadBounds kernel such as getting the current primitive index as something along the lines of

extern "C" __global__ void QuadBounds(float result[6]) {
   unsigned int qIdx = optixGetPrimitiveIndex();

Is this possible in OptiX 7?

I am able to get a triangular geometry with its Axis Aligned Bounding Box (aabb) working - given that the aabb is automatically defined for triangles, but getting this working for a Quadrilateral (4 vertex) element is different story. I am still kind of new to OptiX 7, so still learning.

Any help or links would be great.


Oh, I see, the boundary set is the AABBs for your quads.

Yes, this is possible in OptiX 7, and the only requirement is that you provide the AABBs and you provide an intersection program. In OptiX terminology, this is called a “custom primitive”. You can read more about defining custom primitives in OptiX here:

There are a few things to be aware of:

  • Using an intersection program for quads is going to be slower to render than using the built-in triangles as quads. The primary reason to use your own intersection program for quads is if you’re running out of memory, and you’re sure you wouldn’t run out of memory by using quads. The maximum possible difference is 2x memory usage, but in practice the difference is smaller, so there is only a narrow window (in my opinion) where it makes sense to use custom primitive quads instead of built-in triangles.

  • If you use built-in triangles, then (as you noted) you don’t need to provide either an intersection program nor the QuadBounds() kernel. Instead, you could generate 2 triangles for every quad, and when you want to convert from a triangle ID to a quad ID, you can shift-right the triangle ID by 1 bit (equivalent to an integer divide by two) - as long as you generated your quads by using consecutive triangles to form pairs. That means that every even numbered triangle is one half of a quad and the next (odd numbered) triangle is the other half of the same quad. This isn’t the only way to do it, I’m just outlining one possible & easy hack. The numbers that matter here are your triangle indices in your index buffer.

  • The SDK sample optixPathTracer is an open-source example of rendering quads in OptiX 7 using the built-in triangles. Take a look at the hard-coded vertices, and then notice how the material offsets in g_mat_indices come in pairs.

If you still want to define native quad primitives using a bounds program to generate AABBs and an intersection program to do ray-quad intersection:

  • The main thing to know is that your AABBs define the order of your custom primitives, whether they’re quads in a mesh or spheres or anything else. This means that in order to index a quad inside your QuadBounds() program, the most straightforward thing to do is to use the thread id as your quad index, and process only a single quad per thread. For this you don’t need a function or macro to give you the primitive index, you can use the standard CUDA expression to compute your thread id and use that as your primitive index without modification.

  • In your intersection program (or shading programs), the primitive index n that OptiX returns from optixGetPrimitiveIndex() is the index corresponding to the nth AABB that you passed in. For example, if optixGetPrimitiveIndex() returns 5, then that means your ray is testing (or shading) the primitive corresponding to bounds[5] where bounds is your array of AABBs, filled in via QuadBounds() and passed as type OptixBuildInputCustomPrimitiveArray in buildInput.aabbBuffers to optixAccelBuild().

  • I’ve had good luck separating my OptiX device programs and my CUDA kernel bounds programs into separate files. I have not had good luck putting them in the same file. This is because they typically need different compilation flags (and sometimes different compilers). This depends on your build system, just be aware you may need to learn a little about both CUDA and OptiX compilation and linking if you hit any snags.


1 Like

Thank you @dhart for the professional and very informative answer. Definitely a lot to take in.

In the interest of learning, I think I will first try to employ defining native quad primitives using a bounds program to generate AABBs and an intersection program to do ray-quad intersection. Can you illustrate how I would go about setting the extern "C" __global__ void QuadBounds(float result[6]) CUDA kernel to the AABB for a custom primitive (e.g. a Quad)? Something that shows how to set the aabbBuffers to the CUdeviceptr of the QuadBounds kernel ?

I apologize if this is something simple I am just overlooking.

Thank you again for all your help.

Can you illustrate how I would go about setting the extern "C" __global__ void QuadBounds(float result[6]) CUDA kernel to the AABB for a custom primitive (e.g. a Quad)?

Sure, the idea is simple: aabbBuffers is just an array with bounding boxes (AABBs). You create the array, then fill it with AABB info using your kernel called QuadBounds. So the direct answer to your question is that you pass the array pointer to your QuadBounds kernel, which will use the pointer to write the AABB data into the array (device buffer). Once your kernel is done and the buffer has AABB data, you pass the array to optixAccelBuild in order to build your acceleration structure.

So your code might look something like this:
(note this is completely untested pseudo-code)

// assuming vertices is a buffer of vertex data
// vertIds is a list of vertex indices - 4 vertices per prim (quad)
extern "C" __global__ void QuadBounds(AABB* aabbBuffer, float3* verts, uint4* vertIds) {
    int primId = blockIdx.x * blockDim.x + threadIdx.x; // canonical CUDA thread id == prim id
    const uint4& q = vertIds[primId]; // just making the next line easier to read

    // this func you wrote takes 4 verts for a quad, returns the quad's AABB
    aabbBuffer[primId] = compute_AABB_of_quad( verts[q.x], verts[q.y], verts[q.z], verts[q.w]); 

extern "C" __host__ void makeQuadBounds(int numQuads, AABB* aabbBuffer, float3* verts, uint4* vertIds) {
    QuadBounds<<<numQuads, 1>>>(aabbBuffer, verts, vertIds);


const int numQuads = ...;
CUdeviceptr aabbBuffer;
// allocate a device buffer for our AABBs
cudaMalloc(&aabbBuffer, numQuads * sizeof(AABB));
// verts and vertIds are also device buffers, now filled with data
makeQuadBounds(numQuads, aabbBuffer, verts, vertIds);

OptixBuildInput aabb_input = {};
aabb_input.customPrimitiveArray.aabbBuffers = &aabbBuffer; // this is a host array (pointer) of device pointers. Only 1 device pointer in this case
aabb_input.customPrimitiveArray.numPrimitives = numQuads;

... // fill in the rest, compute accel mem usage, alloc GAS output buffer.
... // for complete examples, see the optix 7 SDK samples (optixPathTracer, optixHair, etc.)

optixAccelBuild( ..., &aabb_input, ... );

... // render!

1 Like

Okay. That is pretty simple.

Thank you for the information, very helpful.

Thank you again @dhart for the help - very much appreciated.

One more question, if that is okay. I don’t see when using custom primitives the actual quad vertices and/or indices are being stored as they are when using triangles - e.g. does a corresponding call to vertices such as the following for triangles exist for custom primitives ?

OptixBuildInput tri_input = {};

tri_input.triangleArray.vertexBuffers = &d_vertices; // Device copy of vertices

Am I being overly stupid and not seeing the obvious?

Thank you again for help.

Custom primitive build inputs are only taking the AABB array you calculated in a device buffer which is enough to build the BVH acceleration structure over custom primitives.
That doesn’t know about “vertices” because that custom geometry can be anything, including completely procedural.

Means you would need to store the actual quad vertices and indices in some separate buffers (fully under your control) which can be sourced inside the intersection program to calculate the intersection per primitive index of the hit quad’s AABB and the quad’s surface attributes inside the any/closest hit programs.

Note that you can share the same closest hit programs for different geometric primitives by using the optixGetHitKind device function which allows distinguishing the hit primitive type and interpret the reported intersection attributes per primitive type.

1 Like

Thank you @droettger - good information to know. So, I could store the actual quad vertices as values in Shader Binding Table passed to intersection program - correct?

Also can you tell me when it is safe to clean up allocate DEVICE memory with regards to AABB? For example in the code

When can I safely call cuMemFreeon the DEVICE allocated aabbBuffer, verts, and vertIds?

Thank you again, I appreciate the help.

Yes, and yes.

So, I could store the actual quad vertices as values in Shader Binding Table passed to intersection program - correct?

That is, if you mean that you store the two device pointers to the vertices and indices data in the SBT hit records.
If your scene contains instances, that requires a specific SBT layout with an entry per instance.
My examples implement the former, but I would recommend the following instead to simplify the SBT:
If you prefer storing only the different material hit record groups inside the SBT which are most likely fewer than instances, then you would use the instance.sbtOffset field to pick the hit record and instance.instanceId field to index into another buffer with the pointers for the resp. vertices and indices device pointers per instance and maybe other per instance data.

Please read through this chapter inside the OptiX 7 Programming Guide
and find this paragraph:

“The acceleration structure constructed by optixAccelBuild does not reference any of the device buffers referenced in the build inputs. All relevant data is copied from these buffers into the acceleration output buffer, possibly in a different format.”

Here’s another thread where a similar question came up:

Means the aabbBuffer input buffer(s) can be deleted after the optixAcceleBuild call.
Note that this is an asynchronous call (all OptiX 7 API functions taking a stream argument are), which requires that you make sure that the free is happening after optixAccelBuild has finished accessing that input data.
With cudaFree() or cuMemFree() that is automatically synchronous, but if you use some own asynchronous memory allocator, make sure to synchronize the stream before reusing that memory block. (Example for that in my OptiX 7 nvlink_shared application:

For the other data, you need to keep that around as long as you use the traversable and accelerations structure for raytracing, if you source from the vertices and indices inside your OptiX device programs.

1 Like

Cool. Thank you for the answer @droettger and the link.

It’s a lot to take in, but I am very happy with OptiX 7 design.

Hi @droettger

Is there a way to combine a set of custom primitives with another AS such as the built-in triangles? Seems like you can’t combine different AS.

Thanks again.

Is there a way to combine a set of custom primitives with another AS such as the built-in triangles? Seems like you can’t combine different AS.

Correct, as said in the OptiX 7.3 Programming Guide chapter 5.1:
The optixAccelBuild function accepts multiple build inputs per call, but they must be all triangle inputs, all curve inputs, or all AABB inputs. Mixing build input types in a single geometry-AS is not allowed.

To combine different geometric primitive types into a scene, you must use an instance acceleration structure (IAS) over them where each instance holds one GAS, means a single level instance (IAS->GAS), where the IAS traversable handle is your scene root and used in optixTrace() calls
That IAS->GAS structure is what’s meant with OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING in the OptixPipelineCompileOptions::traversableGraphFlags field.

Please work through the OptiX SDK 7.3 examples as well.
There are multiple examples in there which show how to use built-in triangle primitives and custom primitives (e.g. spheres) together in one scene.
Search the source code for OptixInstance and look at the assigned traversableHandle e.g. in optixCutouts.cpp.

1 Like

Thank you @droettger - good information.