Unified memory

Is unified memory supported in OptiX applications? I found the 2021 thread “Optix 7.4 cudaMallocManaged”, where the recommendation was to not use UVM because the behavior was not known; is this still accurate? I also searched the documentation and didn’t find any information.

Assuming the recommendation to not use unified memory still holds, does it extend to CUDA kernels within the same application (i.e., buffers accessed only by CUDA kernels)?

Thanks!

Hi @justin.decell, welcome!

OptiX does allow you to use Unified Virtual Memory (UVM) for your own shader data and textures, but not for acceleration structures: UVM is not allowed for the acceleration structure build via optixAccelBuild(). I think there might be an RTX traversal hardware requirement, but the main reason is that using unified memory for acceleration structures, even on non-RTX GPUs, is known to cause severe performance degradation. Ray traversal is memory intensive and involves chains of dependent loads per thread, so introducing multiple round trips over PCIe during traversal can be catastrophic. We have some tests that show unified memory traversal running hundreds of times slower than VRAM traversal, so our recommendation is to keep acceleration structures strictly in VRAM(*).
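
To make the "chains of dependent loads" point concrete, here is a back-of-the-envelope sketch in C++. It is not a benchmark and not how OptiX works internally: `MemoryModel`, `chain_latency_ns`, and `slowdown` are hypothetical names, any latency you plug in is your own assumption, and the model deliberately ignores latency hiding across threads as well as page-fault and migration costs. It only shows that when each load depends on the previous one, the per-load latencies add up within a thread.

```cpp
#include <cassert>

// Back-of-the-envelope model, not a benchmark. A ray's traversal is treated
// as a chain of dependent loads: each load must finish before the next
// address is known, so per-load latencies add up within one thread.
struct MemoryModel {
    double load_latency_ns;  // hypothetical per-load latency for this memory
};

// Total latency one thread accumulates walking a chain of dependent loads.
double chain_latency_ns(int dependent_loads, MemoryModel memory) {
    assert(dependent_loads >= 0);
    return dependent_loads * memory.load_latency_ns;
}

// How many times longer the same chain takes in `slow` memory than in `fast`.
double slowdown(int dependent_loads, MemoryModel fast, MemoryModel slow) {
    assert(dependent_loads > 0 && fast.load_latency_ns > 0.0);
    return chain_latency_ns(dependent_loads, slow) /
           chain_latency_ns(dependent_loads, fast);
}
```

In this simplified model the slowdown of a chain is just the ratio of the two per-load latencies, and nothing within the thread can overlap the loads to hide it. Real traversal over unified memory can be far worse than such a ratio suggests, since remote accesses may also trigger page faults and migrations.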

For everything else, the choice to use UVM is yours to make, but I think @droettger’s recommendations from 2021 still apply today. You can use UVM, but you might compromise on performance, and somewhat on portability. We have heard reports that some people are getting away with lower-frequency coherent loads from unified memory for some kinds of shading data without an unreasonable impact on performance, but there is still some impact on perf, so it’s a tradeoff.

* Note that having the accel structures in VRAM during the render kernel does not mean that all geometry must always be in VRAM. You can, for example, build a geometry streaming system that has proxy bounds in the acceleration structure and then, in between render launches, loads more geometry and updates the BVH. That’s a big project and takes a lot of development time; I just wanted to clarify that I’m not ruling it out. When I say the BVH should be in GPU memory, I only mean the BVH that’s accessed for the duration of a single launch.
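
A minimal host-side sketch of the kind of bookkeeping such a streaming system might do between launches, in plain C++. Everything here is hypothetical: `GeometryChunk`, its fields, and `select_chunks_to_load` are made-up names, and the greedy first-fit policy is just one simple choice, not a recommendation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one streamable piece of geometry.
struct GeometryChunk {
    std::size_t bytes;  // VRAM needed once the real geometry is loaded
    bool resident;      // true once real geometry has replaced the proxy bounds
    bool proxy_hit;     // set after the last launch if a ray hit the proxy
};

// Runs between launches: pick non-resident chunks whose proxy was hit,
// in index order, as long as they still fit into the remaining VRAM budget.
std::vector<std::size_t> select_chunks_to_load(
    const std::vector<GeometryChunk>& chunks, std::size_t vram_budget_bytes) {
    std::vector<std::size_t> to_load;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        const GeometryChunk& c = chunks[i];
        if (c.resident || !c.proxy_hit) continue;   // nothing to do for this chunk
        if (c.bytes > vram_budget_bytes) continue;  // does not fit, retry later
        vram_budget_bytes -= c.bytes;
        to_load.push_back(i);
    }
    // A real application would now upload the selected geometry into device
    // memory and update the BVH (e.g. via optixAccelBuild()) before the next
    // launch, so the BVH used by that launch lives entirely in VRAM.
    return to_load;
}
```

For example, with a budget of 200 bytes and hit, non-resident chunks of 100, 300, and 60 bytes (in that order), this picks the first and the third and leaves the 300-byte chunk for a later pass.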


David.

Great, thank you for the info David!
