OptiX Prime (from 6.0) with CUDA Graph API and watertight option

If possible I would like to ask for your help in two issues:

(1)(i) To enable the watertight option, I have used the following code to create my model; does this make sense? There are examples for the C++ Prime wrappers, but not for the C-style API, so I would like to confirm:

RTPmodel model;
CHK_PRIME(rtpModelCreate(context, &model));
// Watertight option???
const int builderMemoryMode = 0; // If I do not use 0, will it not be watertight? Is using 1 slow?
CHK_PRIME(rtpModelSetBuilderParameter(model, RTP_BUILDER_PARAM_USE_CALLER_TRIANGLES, sizeof(int), &builderMemoryMode));
CHK_PRIME(rtpModelSetTriangles(model, indicesDesc, verticesDesc));
CHK_PRIME(rtpModelUpdate(model, 0));
// ...
CHK_PRIME(rtpQuerySetCudaStream(query, rtStream));
// ...
CHK_PRIME(rtpQueryExecute(query, RTP_QUERY_HINT_WATERTIGHT /* or other hints */));

(1)(ii) Also, if I pass 0 instead of 1 to rtpModelSetBuilderParameter, will the model still be watertight?

(2) Asking this question on the CUDA forum may be more appropriate, but this forum is much more active, so please forgive me if this is not the correct place to ask.
(i) In order to limit my ray and shading buffer usage, I am planning to use CUDA graphs (built manually); should I call rtpQueryExecute in a host node, or is there a better practice?
(ii) I wish to update some host-side state after executing the graph; would you recommend using cudaStreamAddCallback on the graph's stream, or adding another host node at the end?

Best Regards,

Hi @yavuz.soy,

The header file is a little outdated; apologies. You can specify watertightness independently of whether you're using caller triangles, so the query hint is all you need to turn it on. I believe watertight mode is off by default. Enabling it is a little slower than leaving it off; the OptiX Prime engineer in my office estimates perhaps 5-10% slower, in return for the added watertightness guarantees.
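To make that concrete, here is a minimal C-style sketch (untested, and assuming descriptor handles like raysDesc and hitsDesc have already been created) where watertightness comes solely from the query hint; no builder parameter is involved:

```cpp
// Sketch only: watertight traversal requested per query via the hint.
// The RTP_BUILDER_PARAM_USE_CALLER_TRIANGLES parameter is unrelated to this.
RTPmodel model;
CHK_PRIME(rtpModelCreate(context, &model));
CHK_PRIME(rtpModelSetTriangles(model, indicesDesc, verticesDesc));
CHK_PRIME(rtpModelUpdate(model, 0));

RTPquery query;
CHK_PRIME(rtpQueryCreate(model, RTP_QUERY_TYPE_CLOSEST, &query));
CHK_PRIME(rtpQuerySetRays(query, raysDesc));
CHK_PRIME(rtpQuerySetHits(query, hitsDesc));
// Watertight intersection is turned on here, at query execution time:
CHK_PRIME(rtpQueryExecute(query, RTP_QUERY_HINT_WATERTIGHT));
```

Since the hint is passed per execute call, you can mix watertight and non-watertight queries against the same model if you only need the guarantee in some passes.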

While it should be relatively straightforward, we haven't tried using CUDA graphs with OptiX Prime. Both of your questions point at synchronizing the host with CUDA execution. I don't know all the alternatives to host nodes and/or CUDA stream callbacks – you could probably use raw CUDA events in both cases if you wanted to, but I suppose that's precisely what CUDA graphs are there to help you avoid ;). My guess is that host nodes are most appropriate if you need to sync in the middle of a graph execution, and a callback is most appropriate after all nodes in the graph are complete. I don't know, but I doubt there is a significant performance difference, so if you need a host sync at the end, use whichever is most convenient. You should indeed verify my speculation by asking on the CUDA forum, since I'm only guessing.
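The two options above can be sketched roughly like this (untested; updateState, deps, numDeps, and stream are placeholder names, and error checking is omitted):

```cpp
// Sketch only: two ways to run host code after the GPU work finishes.

// Host function with the signature CUDA expects for host nodes/callbacks.
void CUDART_CB updateState(void* userData)
{
    // host-side state update goes here
}

// (a) A host node appended to a manually built graph:
cudaGraph_t graph;
cudaGraphCreate(&graph, 0);
// ... add kernel/memcpy nodes, collecting their handles in `deps` ...
cudaHostNodeParams hostParams = {};
hostParams.fn = updateState;
hostParams.userData = nullptr;
cudaGraphNode_t hostNode;
cudaGraphAddHostNode(&hostNode, graph, deps, numDeps, &hostParams);

// (b) A host function enqueued on the stream after the graph launch
//     (cudaLaunchHostFunc is the recommended replacement for the
//      deprecated cudaStreamAddCallback):
cudaGraphExec_t exec;
cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
cudaGraphLaunch(exec, stream);
cudaLaunchHostFunc(stream, updateState, nullptr);
```

Note the general CUDA restriction that host functions enqueued this way must not make CUDA API calls themselves, which is worth keeping in mind for either approach.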

BTW, just so you are aware: in the new OptiX 7 API, built-in triangle intersection is always watertight, there's no performance penalty for watertight triangle intersection on RTX hardware, and interoperating with CUDA is a bit more seamless than in Prime and a lot more seamless than in OptiX 6. Since Prime does not support RTX hardware, consider whether it makes sense to upgrade.


@dhart, thank you for your answer.

I have asked the part of my previous question about using CUDA graphs to launch an OptiX Prime query on the CUDA forums, and I was told that host nodes, just like stream callback functions, cannot safely launch other CUDA kernels.

I would love to use OptiX 7; however, I am waiting for the known issues to be handled :)