optixAccelBuild of an empty scene takes 1.2 GB of dedicated GPU memory on RTX 5000 ADA

Hello. I have measured and compared the GPU memory allocated by a simple program that performs a basic OptiX initialization and a single IAS build with one instance and no geometry at all. This program allocates 0.6 GB of dedicated GPU memory on an RTX A5000, and 1.2 GB (twice as much) on an RTX 5000 Ada.

Here is the code I used:

CUstream stream{};
OptixDeviceContext optixContext{};

cudaSetDevice(0);
cudaFree(0);
cudaStreamCreate(&stream);

optixInit();
optixDeviceContextCreate(nullptr, 0, &optixContext);

OptixInstance optixInstance{};
optixInstance.visibilityMask = 255;

OptixAccelBuildOptions iasBuildOptions{};
OptixBuildInput iasBuildInput{};

CUdeviceptr instanceBuffer;
cudaMalloc((void**)&instanceBuffer, sizeof(OptixInstance));
cudaMemcpy((void*)instanceBuffer, &optixInstance, sizeof(OptixInstance), cudaMemcpyHostToDevice);

iasBuildInput.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
iasBuildInput.instanceArray.instances = instanceBuffer;
iasBuildInput.instanceArray.numInstances = 1;

iasBuildOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE;
iasBuildOptions.motionOptions.numKeys = 1;
iasBuildOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

OptixAccelBufferSizes iasBufferSizes;
optixAccelComputeMemoryUsage(optixContext, &iasBuildOptions, &iasBuildInput, 1, &iasBufferSizes);

CUdeviceptr iasBuildTempBuffer;
cudaMalloc((void**)&iasBuildTempBuffer, iasBufferSizes.tempSizeInBytes);

CUdeviceptr iasBuffer;
cudaMalloc((void**)&iasBuffer, iasBufferSizes.outputSizeInBytes);

OptixTraversableHandle iasHandle{};
optixAccelBuild(optixContext, stream, &iasBuildOptions, &iasBuildInput, 1, iasBuildTempBuffer, iasBufferSizes.tempSizeInBytes, iasBuffer, iasBufferSizes.outputSizeInBytes, &iasHandle, nullptr, 0u);

Just before the call to optixAccelBuild, the GPU memory used is about 0.3 GB on both cards, but just after the call it is much higher on the Ada GPU. Do you have an explanation for this high memory consumption on the Ada GPU, or do you know whether it could be fixed by more recent versions of OptiX, CUDA, or driver updates? I am using OptiX 7.3, CUDA Toolkit 11.8, and the latest drivers.
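For reference, a minimal way to take such before/after readings in code (a sketch; it assumes cudaMemGetInfo, and a tool like nvidia-smi would show similar numbers):

size_t freeBytes = 0, totalBytes = 0;

// cudaMemGetInfo reports device-wide free/total memory, so "used" includes all
// processes on the GPU; error checking omitted.
cudaMemGetInfo(&freeBytes, &totalBytes);
printf("used before build: %.2f GB\n", (totalBytes - freeBytes) / 1e9);

// ... optixAccelBuild call from the listing above ...
cudaStreamSynchronize(stream);

cudaMemGetInfo(&freeBytes, &totalBytes);
printf("used after build:  %.2f GB\n", (totalBytes - freeBytes) / 1e9);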

Hi @claude.perin, welcome!

There was a thread about this earlier this year: Understanding OptiX internal memory use - #2 by dhart

This is documented CUDA behavior: the allocation you’re seeing is the space needed for a kernel’s local memory, and the allocation is ‘sticky’, meaning it does not go away when the kernel exits. The OptiX BVH builder runs its own kernels, which is why it appears to result in an allocation. BVH builds tend to be the first kernels run in an OptiX application, so they are the most easily and most often implicated. If you ran a different CUDA kernel before building the BVH, you might see more of the memory usage attributed to your own kernel. The OptiX BVH build has some non-trivial stack/local-memory usage that depends on how many CUDA cores your GPU has, not on the size of the BVH itself, which is part of why the usage might appear surprisingly large.
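As an illustration of that attribution (a sketch; the kernel below is hypothetical, and how much memory gets reserved depends on the kernel’s stack/local-memory footprint and on the GPU’s maximum resident thread count rather than on the data size):

__global__ void lmemKernel(int* out, int idx)
{
    // A runtime-indexed local array typically spills to local memory, so the
    // CUDA context reserves lmem for every thread that could be resident at once.
    int scratch[256];
    scratch[0] = threadIdx.x;
    for (int i = 1; i < 256; ++i)
        scratch[i] = scratch[i - 1] * 1103515245 + 12345;
    out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[idx & 255];
}

// Launching this (or any kernel with noticeable lmem usage) before optixAccelBuild
// makes the 'sticky' reservation show up at this launch instead of at the BVH build.
int* devOut = nullptr;
cudaMalloc((void**)&devOut, 1024 * 256 * sizeof(int));
lmemKernel<<<1024, 256>>>(devOut, 42);
cudaDeviceSynchronize();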

The amount of local memory needed in the OptiX BVH builder was reduced starting in the 560 driver, so if you haven’t yet tried 560, you can install it and see if the memory usage appears lower.


David.


Hi @claude.perin, you might also try OPTIX_BUILD_FLAG_ALLOW_COMPACTION in the OptixAccelBuildOptions, assuming you’re not rebuilding often.
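A minimal sketch of how compaction could be requested, reusing the variables from the listing above (error checking omitted):

// Request the compacted size as an emitted property of the build, then copy the
// acceleration structure into a smaller buffer.
iasBuildOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_COMPACTION;

OptixAccelEmitDesc emitDesc{};
emitDesc.type = OPTIX_PROPERTY_TYPE_COMPACTED_SIZE;
cudaMalloc((void**)&emitDesc.result, sizeof(size_t));

optixAccelBuild(optixContext, stream, &iasBuildOptions, &iasBuildInput, 1,
                iasBuildTempBuffer, iasBufferSizes.tempSizeInBytes,
                iasBuffer, iasBufferSizes.outputSizeInBytes,
                &iasHandle, &emitDesc, 1u);
cudaStreamSynchronize(stream);

size_t compactedSize = 0;
cudaMemcpy(&compactedSize, (void*)emitDesc.result, sizeof(size_t), cudaMemcpyDeviceToHost);

CUdeviceptr compactedBuffer = 0;
cudaMalloc((void**)&compactedBuffer, compactedSize);
optixAccelCompact(optixContext, stream, iasHandle, compactedBuffer, compactedSize, &iasHandle);
cudaFree((void*)iasBuffer);  // the larger, uncompacted output buffer can now be freed

Note that compaction only shrinks the acceleration-structure output buffer itself; it does not change the driver-side local-memory reservation described above.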

Leonardo

Thanks @dhart. The latest feature-branch driver (560), as you suggested, fixed the Ada overconsumption issue: my small OptiX sample now takes the same amount of memory (0.6 GB) on both the Ampere and Ada GPUs.