OptiX: memory allocated on the CPU side

Dear all,

It seems that optixAccelBuild() is making use of CPU memory, and it allocates roughly the same amount of host memory as it allocates on the GPU. Is this expected? And if so, how do I tell OptiX not to use CPU memory?

Warm Regards,

Hi Panos,

optixAccelBuild itself does not use very much CPU memory. It might help to understand how you are measuring your CPU memory usage, and what your GPU configuration is. Specifically, are you using your display GPU to run OptiX, and are you using Windows?

What you may be seeing is that sometimes the OS requires enough CPU virtual memory in reserve in order to safely swap or page out any device allocations. This is for display-critical safety, so that your OptiX or CUDA programs can’t cause your OS to be unable to allocate any memory and have your display crash or shut down. This means that it’s probably the cudaMalloc() call that is appearing to consume host memory, and not the optixAccelBuild() call, which you should be able to verify by skipping the build call but running a different kernel over your allocated memory.


Thanks for the reply, dhart.

Yes, I am using my display GPU (a Quadro M2200 notebook GPU) and I am using Windows. I measure peak CPU memory usage with the GetProcessMemoryInfo() Windows function: if I see the peak bump between two consecutive “memoryInfo” calls, then there is a problem in the code in between. I do set CUDA_LAUNCH_BLOCKING to 1, so I expect the CUDA calls to be synchronous.

I do call cudaMalloc() before optixAccelBuild(), which might explain what I see. However, I see no memory bump there; it is only when optixAccelBuild() gets called that I see CPU memory being consumed.

I guess CUDA_LAUNCH_BLOCKING does not make things so synchronous after all. :)

I’m no expert on Windows virtual memory as it relates to GPU allocations, but I would mainly say: be really careful about making assumptions about how it works under the hood, especially when mixing CUDA calls and Windows OS calls. For example, I wouldn’t assume that the OS memory-info reports tell you anything about whether CUDA launches are synchronous, even if the memory changes appear to happen outside your logical range between launches. You can instead verify synchronous launch behavior under CUDA_LAUNCH_BLOCKING with Nsight Systems. (You can cross-check your memory usage metrics with Nsight Systems as well.)
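For what it’s worth, CUDA_LAUNCH_BLOCKING only changes the behavior of kernel launches; allocation calls like cudaMalloc() block regardless. A minimal way to enable it for a debug run:

```shell
# Debugging aid only: forces each CUDA kernel launch to block until the
# kernel has finished. It affects kernel launches, not allocation calls.
export CUDA_LAUNCH_BLOCKING=1   # Windows cmd equivalent: set CUDA_LAUNCH_BLOCKING=1
```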

BTW, which values are you looking at from the results of your GetProcessMemoryInfo() call? Looking at the docs, it seems like this call only returns values related to memory that has been used/referenced, and does not tell you how much memory has been requested; is that correct? PROCESS_MEMORY_COUNTERS_EX (psapi.h) - Win32 apps | Microsoft Docs
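For concreteness, a probe over those counters might look something like the sketch below. PeakWorkingSetSize tracks physical pages the process has actually referenced, while PrivateUsage reflects the commit charge (memory that has been requested/committed); comparing the two is one way to see the distinction being discussed. This is a Windows-only sketch with an illustrative helper name, and it assumes linking against psapi.lib.

```cpp
#include <windows.h>
#include <psapi.h>
#include <cstdio>

// Illustrative helper: print both "referenced" and "requested" style counters.
static void printMemCounters(const char* label)
{
    PROCESS_MEMORY_COUNTERS_EX pmc = {};
    pmc.cb = sizeof(pmc);
    GetProcessMemoryInfo(GetCurrentProcess(),
                         reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                         sizeof(pmc));
    // PeakWorkingSetSize: peak physical RAM actually touched by the process.
    // PrivateUsage:       commit charge, i.e. private memory committed/requested.
    std::printf("%s: peak working set = %zu bytes, private usage = %zu bytes\n",
                label,
                static_cast<size_t>(pmc.PeakWorkingSetSize),
                static_cast<size_t>(pmc.PrivateUsage));
}

// Usage sketch: sample around the suspect call.
//   printMemCounters("before build");
//   optixAccelBuild(...);
//   printMemCounters("after build");
```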

The same caveats are true these days with pure CPU virtual memory as well; you will often see physical RAM backing get reserved lazily, only when the memory is actually referenced, rather than at allocation time.

These reasons are why I suggested using a different kernel of your own making, other than the kernels launched by optixAccelBuild(). A simple test might even be to call cudaMalloc(), and then see what happens when you then run your own kernel that writes a value into every byte of your device buffer, or even compare to cudaMemset().
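That isolation test might be sketched as follows; the buffer size, kernel name, and launch configuration are all illustrative, and the host-memory sampling (e.g. via GetProcessMemoryInfo) would go at the commented checkpoints.

```cpp
#include <cuda_runtime.h>

// Illustrative kernel: write a value into every byte so each page of the
// device allocation is actually referenced, without involving OptiX at all.
__global__ void touchEveryByte(unsigned char* buf, size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n)
        buf[i] = 0xAB;
}

int main()
{
    const size_t n = 256ull << 20;   // 256 MB, illustrative size
    unsigned char* d_buf = nullptr;

    // Checkpoint 1: allocation alone. Sample host memory here.
    cudaMalloc(&d_buf, n);

    // Checkpoint 2: touch the memory with our own kernel, then sample again.
    const unsigned blocks = static_cast<unsigned>((n + 255) / 256);
    touchEveryByte<<<blocks, 256>>>(d_buf, n);
    cudaDeviceSynchronize();

    // Checkpoint 3 (comparison point): cudaMemset over the same buffer.
    cudaMemset(d_buf, 0, n);
    cudaDeviceSynchronize();

    cudaFree(d_buf);
    return 0;
}
```

If host memory jumps at checkpoint 2 or 3 rather than at the allocation or the build call, that points at the page-reservation behavior described above rather than at optixAccelBuild() itself.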