Unified memory

justin.decell · October 27, 2025, 3:15pm

Is unified memory supported in OptiX applications? I found the 2021 thread "Optix 7.4 cudaMallocManaged", where the recommendation was not to use UVM because its behavior was not known. Is this still accurate? I also searched the documentation and didn’t find any information.

Assuming the recommendation to not use unified memory still holds, does it extend to CUDA kernels within the same application (i.e., buffers accessed only by CUDA kernels)?

Thanks!

dhart · October 27, 2025, 6:03pm

Hi @justin.decell, welcome!

OptiX does allow you to use Unified Virtual Memory (UVM) for your own shader data and textures, but not for acceleration structures: optixAccelBuild() does not accept UVM buffers. I think there might be an RTX traversal hardware requirement, but the main reason is that using unified memory for acceleration structures, even on non-RTX GPUs, is known to cause severe performance degradation. Ray traversal is memory intensive and involves chains of dependent loads per thread, so introducing multiple round trips over PCIe during traversal can be catastrophic. We have some tests that show unified memory traversal running hundreds of times slower than VRAM traversal, so our recommendation is to keep acceleration structures strictly in VRAM(*).
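As a rough illustration of keeping the two kinds of allocations apart, here is a minimal sketch using only the CUDA runtime API. The helper name isPlainDeviceMemory is hypothetical (it is not part of OptiX or CUDA); it assumes a CUDA 10 or newer toolkit, where cudaPointerGetAttributes() reports cudaMemoryTypeManaged for cudaMallocManaged() allocations. One possible use is as a debug check on the buffers you are about to hand to optixAccelBuild().

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not an OptiX or CUDA API): returns true only when
// ptr refers to ordinary device memory, e.g. from cudaMalloc(). Managed
// (cudaMallocManaged) and host pointers return false.
static bool isPlainDeviceMemory(const void* ptr)
{
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        // Older toolkits report an error for unregistered host pointers;
        // clear the sticky error state and treat it as "not device memory".
        cudaGetLastError();
        return false;
    }
    return attr.type == cudaMemoryTypeDevice;
}
```

With this, a buffer from cudaMalloc() passes the check and a buffer from cudaMallocManaged() does not, so an assert on the acceleration structure output and temp buffers would catch an accidental managed allocation early.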

For everything else, the choice to use UVM is yours to make, but I think @droettger’s recommendations from 2021 still apply today. You can use UVM, but you might compromise on performance, and you might compromise on portability somewhat. We have heard reports that some people are getting away with lower frequency coherent loads from unified memory for some kinds of shading data without having an unreasonable impact on performance, but there is still some impact on perf, so it’s a tradeoff.
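To make the "everything else" case concrete, below is a small sketch of a plain CUDA kernel reading a low-frequency shading table from managed memory while the per-pixel output stays in ordinary device memory. The Material struct, shadeKernel, and shadeWithManagedMaterials are all made-up names for illustration, and the table layout is an assumption; a real renderer would look quite different. It relies on on-demand page migration only, so the first access from the GPU pays the migration cost.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-material shading record, small and read infrequently.
struct Material { float albedo; float roughness; };

// Each thread reads one record from the managed table and writes to VRAM.
__global__ void shadeKernel(const Material* materials, int numMaterials,
                            float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = materials[i % numMaterials].albedo;
}

// Fills the managed table on the host, runs the kernel, and copies the
// value for one pixel back into *result. Returns true on success.
static bool shadeWithManagedMaterials(int pixel, float* result)
{
    const int n = 1024, numMaterials = 4;
    if (pixel < 0 || pixel >= n) return false;

    Material* materials = nullptr;  // unified (managed) memory
    float* out = nullptr;           // ordinary device memory
    if (cudaMallocManaged(&materials, numMaterials * sizeof(Material)) != cudaSuccess)
        return false;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(materials);
        return false;
    }

    // The host writes directly into managed memory, no cudaMemcpy needed.
    for (int m = 0; m < numMaterials; ++m)
        materials[m] = {0.25f * (m + 1), 0.5f};

    shadeKernel<<<(n + 255) / 256, 256>>>(materials, numMaterials, out, n);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    ok = ok && cudaMemcpy(result, out + pixel, sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(out);
    cudaFree(materials);
    return ok;
}
```

The convenience is that the host can update the table between launches with ordinary stores; the cost is that the pages migrate on demand. The CUDA runtime also offers hints such as cudaMemPrefetchAsync() and cudaMemAdvise() to reduce that cost; check their signatures against your toolkit version before relying on them.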

* Note that having the accel structures in VRAM during the render kernel does not mean that all geometry must always be in VRAM. You can, for example, build a geometry streaming system that has proxy bounds in the acceleration structure and then, in between render launches, loads more geometry and updates the BVH. That’s a big project and takes a lot of development time; I just wanted to clarify that I’m not ruling it out. When I say the BVH should be in GPU memory, I only mean the BVH that’s accessed for the duration of a single launch.

–
David.

justin.decell · October 27, 2025, 6:33pm

Great, thank you for the info David!

system · November 10, 2025, 6:34pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
