OptiX 6.5 Context Destroy/Recreation

Hi, I am currently porting our application to OptiX 8.0, but in the meantime I also need to maintain the 6.5 version. Recently I ran into some issues with the latter.

I apologize in advance for giving only limited information pertaining to our code, but hopefully it’s enough to shed some light on what’s going on.

Whenever our application begins a certain process, an OptiX context is created and then destroyed once the process completes. When the process is initiated once more, a new OptiX context is created. At the same time, there is unrelated CUDA code running in parallel, which I hope is not affected by the aforementioned manipulations. Detlef mentions the following in response to the post below:

OptiX 1 - 6 use the existing primary CUDA context on the device

  1. I wonder, does destroying the OptiX context happen to invalidate the primary CUDA context as well?

We hadn’t had any issues until recently, but we’ve been steadily increasing the complexity of our scenes, and we’ve started seeing weird rendering artifacts (among other things).

Some of our OptiX and CUDA kernels share texture objects (RTtexturesampler). The sampler is created via context->createTextureSampler().

We bind it to an OptiX rtTextureSampler via optix::Variable::set(sampler), whereas to use it in CUDA we simply pass sampler->getId() to the kernel and, inside the kernel, cast the ID to cudaTextureObject_t so that it can be sampled via tex3D.

As I mentioned, up until recently everything was fine. Now, however, once the OptiX context is recreated, the image turns black. I printed out the values sampled from the texture (CUDA printf), and they are always zero, regardless of what I put inside the texture.

I then decided to query the texture object via cudaGetTextureObjectResourceViewDesc(struct cudaResourceViewDesc *pResViewDesc, cudaTextureObject_t texObject), a function from cuda_runtime_api.h. To be completely candid, I am not sure the values I am seeing in the output struct are what I expected. Having said that, the application does not crash on the first run. However, once again after context recreation, I am getting the following error:

Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyHtoDAsync( dstDevice, srcHost, byteCount, stream.get() ) returned (709): Context has been destroyed, file: <internal>, line: 0)

I am not sure how that relates to the texture query I did above, but removing the cudaGetTextureObjectResourceViewDesc call also makes the error message disappear.
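For reference, the query looks roughly like this (a sketch; tex_id is the value obtained from sampler->getId(), and the printed fields are just the ones I happened to inspect):

cudaResourceViewDesc viewDesc = {};
cudaError_t err = cudaGetTextureObjectResourceViewDesc(&viewDesc, (cudaTextureObject_t)tex_id);
if (err == cudaSuccess)
    printf("format=%d extent=%zu x %zu x %zu\n",
           (int)viewDesc.format, viewDesc.width, viewDesc.height, viewDesc.depth);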

Does anyone happen to know what could be happening here? Thanks!

Hi @rastislav.starkov, welcome!

Will you elaborate a bit on what exactly is the order of events? When you say the OptiX process is completed, I assume you’re not referring to the OS process, but just that you call rtContextDestroy() in your app and then later call rtContextCreate() to begin doing more OptiX work, all during a single run of your application process?

If a texture sampler is created using the OptiX context and then passed to CUDA, are you assuming the sampler is destroyed during rtContextDestroy, or hoping that the sampler continues to exist after that? That is, do you re-create a new texture sampler after re-creating the OptiX context? Is the texture sampler potentially being used by CUDA during or after your call to rtContextDestroy()?

The design intent with OptiX 6.5 and earlier is that it does not corrupt the CUDA context, and it should support restarting the OptiX context. OptiX does, however, attempt to clean up everything it knows about on a call to destroy the context. I don’t know enough yet to judge what’s wrong, but my first gut-reaction assumption with the crash is that maybe a handle to something OptiX created was used after the OptiX context got destroyed, so I would recommend reviewing whether there are any stale pointers or handles that need to be recreated once the context is recreated.


David.

Hi @dhart, thank you so much for the timely response!

Will you elaborate a bit on what exactly is the order of events? When you say the OptiX process is completed, I assume you’re not referring to the OS process, but just that you call rtContextDestroy() in your app and then later call rtContextCreate() to begin doing more OptiX work, all during a single run of your application process?

That’s exactly right, I am not referring to the OS process: context create/destroy takes place entirely within a single run of the application process. At runtime, the user can interact with the application in a way that drastically changes scene parameters, which warrants restarting the OptiX context (at the moment, it’s just simpler than keeping track of, e.g., which buffers need to be reallocated to accommodate the changes).

And so the order is as you described it: rtContextDestroy() → rtContextCreate() → OptiX runs continuously unless a restart is requested or the user terminates OptiX, all the while the application keeps working. Once OptiX is needed again, the context is created anew.
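In code, the cycle looks roughly like this (a minimal sketch using the OptiX 6 C++ wrapper, which wraps rtContextCreate/rtContextDestroy; the launch dimensions are placeholders):

optix::Context context = optix::Context::create();
// ... (re)build the scene: buffers, samplers, programs ...
context->launch(0, width, height); // repeated while OptiX keeps running
// once a restart is requested or the user terminates OptiX:
context->destroy();
// on the next request, everything above runs again with a fresh context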

If a texture sampler is created using the OptiX context and then passed to CUDA, are you assuming the sampler is destroyed during rtContextDestroy, or hoping that the sampler continues to exist after that? That is, do you re-create a new texture sampler after re-creating the OptiX context?

I am assuming that the sampler is destroyed along with the context, and I am re-creating new texture samplers.

Is the texture sampler potentially being used by CUDA during or after your call to rtContextDestroy()?

It’s a good point, and I think it’s worth double-checking. But my intent is for that not to happen.
CUDA is used immediately after OptiX to postprocess some data, followed by a cudaDeviceSynchronize. Then there are a few cudaMemcpyAsync calls after that, once again followed by a device sync. Both rtContextDestroy() and the rest of the OptiX and CUDA work happen on the same thread (I think), so my assumption was that by the time rtContextDestroy is called, cudaDeviceSynchronize has ensured there are no lingering in-flight kernels that might rely on data released by rtContextDestroy. Or is that redundant? Will rtContextDestroy implicitly sync the device and wait until all the CUDA work is completed?
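To make the intended order concrete (a sketch; postprocess and the buffer/stream names are placeholders, not our real code):

context->launch(0, width, height);                // OptiX pass
postprocess<<<grid, block, 0, stream>>>(d_data);  // CUDA postprocessing
cudaDeviceSynchronize();                          // wait for the kernel
cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, stream);
cudaDeviceSynchronize();                          // wait for the copies
// only after both syncs, if a restart was requested:
context->destroy();                               // i.e., rtContextDestroy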

The design intent with OptiX 6.5 and earlier is that it does not corrupt the CUDA context, and it should support restarting the OptiX context. OptiX does, however, attempt to clean up everything it knows about on a call to destroy the context. I don’t know enough yet to judge what’s wrong, but my first gut-reaction assumption with the crash is that maybe a handle to something OptiX created was used after the OptiX context got destroyed, so I would recommend reviewing whether there are any stale pointers or handles that need to be recreated once the context is recreated.

I was concerned that OptiX context recreation was not possible at all, so it’s a relief that it is. I think your assumption about the nature of the crash is a good direction to look into; I’ll poke around and see if I missed anything. I might come back with more findings (hopefully with good news), if that’s OK. Thank you for your time!

Yes, please do report any new findings; you are of course welcome to share your results, good or bad.

rtContextDestroy() does wait on any OptiX work to finish, but glancing at the code briefly, I don’t see it synchronizing other CUDA work on the stream or device. My guess is that it does not synchronize the device, as that could have negative consequences on things OptiX is neither responsible for nor aware of. So, you will want to experiment with your own stream or device synching before calling rtContextDestroy().

FWIW, there’s a CUDA environment variable (CUDA_LAUNCH_BLOCKING) that will cause all launches to behave synchronously. That might not yield anything, but it’s super easy to try, and if behavior does change then it’s a good clue that the code might be missing a synchronization point somewhere.


David.


Hi @dhart, I am back with more findings and just one more question about CUDA/OptiX interoperability.

Thanks for CUDA_LAUNCH_BLOCKING; I did not know about it, and it’s super helpful for debugging. But it looks like synchronization isn’t what’s at fault.

I think we’re misusing the texture sampler IDs generated by OptiX. As I mentioned in the original post, we create OptiX texture samplers and use them in OptiX, and use their IDs in CUDA kernels, hoping to sample from the same texture object:

/// host
optix::TextureSampler sampler = context->createTextureSampler();
// create the backing RT buffer and assign it to the sampler
...
// bind the sampler to the device-side rtTextureSampler my_texture
context["my_texture"]->set(sampler);
int tex_id = sampler->getId(); // passed to the CUDA kernel

/// OptiX program
rtTextureSampler<float, 3> my_texture;
float value = tex3D(my_texture, x, y, z); // sample via the sampler variable

/// CUDA kernel
float value = tex3D<float>((cudaTextureObject_t)tex_id, x, y, z); // sample via the ID, hopefully from my_texture

Is there a guarantee that such an OptiX sampler ID, when used in CUDA, will map to the same texture object as the one accessed via its sampler in OptiX? I ran a small experiment that suggests there isn’t.

Let’s say I have N OptiX texture samplers. I picked the sampler with a particular tex_id and placed a distinctive value in its texture to help me identify it. (Across runs, it always appears to be exactly the same tex_id that is assigned to this particular texture sampler, whose texture I fill with my magic value.)

Next, I attached the CUDA debugger to the application, ran my CUDA kernel, and saw the aforementioned value when sampling from the texture: tex3D<float>(tex_id...).

I then re-initialized the OptiX context (which includes texture sampler regeneration); however, this time sampling from the texture with the same tex_id (once again in the CUDA kernel) yielded zero values, which is the issue we’re having. Meanwhile, sampling in OptiX programs still produces correct values.

Finally, I decided to try shifting the sampler tex_id used in the CUDA kernel by the total number of texture samplers created by OptiX. Sampling with tex_id + N finally produced the hard-coded value. So it seems that restarting the OptiX context didn’t break the samplers; it’s just that the same texture object needs to be accessed through different handles in OptiX and CUDA.

So I was just wondering: do you happen to know whether there is any guarantee that the same device-side texture sampler ID in OptiX and CUDA will map to the same texture object? Granted, I am accessing texture objects in OptiX using samplers rather than texture IDs, but I suppose I could do that using bindless textures, as sketched below.
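For reference, the bindless route in the OptiX program would look roughly like this, if I read the OptiX 6 device API correctly (tex_id would be set from the host via sampler->getId()):

rtDeclareVariable(int, tex_id, , );
...
float value = rtTex3D<float>(tex_id, x, y, z); // sample via the ID instead of an rtTextureSampler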

If there is no guarantee, what is the right way to use the same texture in OptiX 6.5 and CUDA? I suppose I could potentially keep identical OptiX and CUDA textures, creating the latter via the texture object API described in the CUDA C++ Programming Guide. I take it I would have to convert to the latter anyway when updating to OptiX 8.0.

Thank you so much for your time! I am sure all of that could have been summarized more concisely; I just wanted to be thorough.

So I was just wondering: do you happen to know whether there is any guarantee that the same device-side texture sampler ID in OptiX and CUDA will map to the same texture object?

Sorry I didn’t catch this earlier. It turns out your suspicion is correct: OptiX sampler IDs and CUDA sampler IDs aren’t compatible. I’ve learned that OptiX carries data about the sampler that CUDA doesn’t have, and we didn’t intend for the sampler to be shareable. I guess it’s just lucky that it works the first time. I’m told you can still share the buffer data of your textures, so I think our recommendation is to create samplers on both sides. Is that a viable workaround?
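For concreteness, the CUDA side of that could look roughly like this (a minimal sketch for a W × H × D float volume; h_data, the dimensions, and the sampling settings are placeholders), with a matching OptiX sampler created from an OptiX buffer holding the same data:

cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaExtent extent = make_cudaExtent(W, H, D);
cudaArray_t array = nullptr;
cudaMalloc3DArray(&array, &desc, extent);

cudaMemcpy3DParms copy = {};
copy.srcPtr = make_cudaPitchedPtr(h_data, W * sizeof(float), W, H);
copy.dstArray = array;
copy.extent = extent;
copy.kind = cudaMemcpyHostToDevice;
cudaMemcpy3D(&copy); // same data that fills the OptiX buffer

cudaResourceDesc resDesc = {};
resDesc.resType = cudaResourceTypeArray;
resDesc.res.array.array = array;

cudaTextureDesc texDesc = {};
texDesc.addressMode[0] = cudaAddressModeClamp;
texDesc.addressMode[1] = cudaAddressModeClamp;
texDesc.addressMode[2] = cudaAddressModeClamp;
texDesc.filterMode = cudaFilterModeLinear;
texDesc.readMode = cudaReadModeElementType;
texDesc.normalizedCoords = 1;

cudaTextureObject_t texObj = 0;
cudaCreateTextureObject(&texObj, &resDesc, &texDesc, nullptr);
// pass texObj to the CUDA kernels instead of the OptiX sampler ID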


David.

Hi David. Yes, that’s a perfectly viable workaround. Thanks so much!
