optixAccelBuild of an empty scene takes 1.2 GB of dedicated GPU memory on RTX 5000 ADA

Hello. I have measured and compared the GPU memory allocated by a simple program that performs a basic OptiX initialization and a single IAS build with one instance and no geometry at all. This program allocates 0.6 GB of dedicated GPU memory on an RTX A5000, and 1.2 GB (twice as much) on an RTX 5000 Ada.

Here is the code I used:

CUstream stream{};
OptixDeviceContext optixContext{};

cudaSetDevice(0);
cudaFree(0);
cudaStreamCreate(&stream);

optixInit();
optixDeviceContextCreate(nullptr, 0, &optixContext);

OptixInstance optixInstance{};
optixInstance.visibilityMask = 255;

OptixAccelBuildOptions iasBuildOptions{};
OptixBuildInput iasBuildInput{};

CUdeviceptr instanceBuffer;
cudaMalloc((void**)&instanceBuffer, sizeof(OptixInstance));
cudaMemcpy((void*)instanceBuffer, &optixInstance, sizeof(OptixInstance), cudaMemcpyHostToDevice);

iasBuildInput.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
iasBuildInput.instanceArray.instances = instanceBuffer;
iasBuildInput.instanceArray.numInstances = 1;

iasBuildOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE;
iasBuildOptions.motionOptions.numKeys = 1;
iasBuildOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

OptixAccelBufferSizes iasBufferSizes;
optixAccelComputeMemoryUsage(optixContext, &iasBuildOptions, &iasBuildInput, 1, &iasBufferSizes);

CUdeviceptr iasBuildTempBuffer;
cudaMalloc((void**)&iasBuildTempBuffer, iasBufferSizes.tempSizeInBytes);

CUdeviceptr iasBuffer;
cudaMalloc((void**)&iasBuffer, iasBufferSizes.outputSizeInBytes);

OptixTraversableHandle iasHandle{};
optixAccelBuild(optixContext, stream, &iasBuildOptions, &iasBuildInput, 1, iasBuildTempBuffer, iasBufferSizes.tempSizeInBytes, iasBuffer, iasBufferSizes.outputSizeInBytes, &iasHandle, nullptr, 0u);

Just before the call to optixAccelBuild, the GPU memory used is about 0.3 GB on both cards, but just after the call it is much higher on the Ada GPU. Do you have an explanation for this high memory consumption on the Ada GPU, or do you know whether it could be fixed by more recent versions of OptiX, CUDA, or driver updates? I am using OptiX 7.3, CUDA Toolkit 11.8, and the latest drivers.
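For reference, a minimal way to take such before/after readings in code (a sketch; it assumes cudaMemGetInfo, and a tool like nvidia-smi would show similar numbers):

size_t freeBytes = 0, totalBytes = 0;

// cudaMemGetInfo reports device-wide free/total memory, so "used" includes all
// processes on the GPU; error checking omitted.
cudaMemGetInfo(&freeBytes, &totalBytes);
printf("used before build: %.2f GB\n", (totalBytes - freeBytes) / 1e9);

// ... optixAccelBuild call from the listing above ...
cudaStreamSynchronize(stream);

cudaMemGetInfo(&freeBytes, &totalBytes);
printf("used after build:  %.2f GB\n", (totalBytes - freeBytes) / 1e9);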

Hi @claude.perin, welcome!

There was a thread about this earlier this year: Understanding OptiX internal memory use - #2 by dhart

This is documented CUDA behavior: the allocation you’re seeing is the space needed for a kernel’s local memory, and the allocation is ‘sticky’, meaning it does not go away when the kernel exits. The OptiX BVH builder runs its own kernels, which is why it appears to result in an allocation. BVH builds tend to be the first kernels run in an OptiX application, so they are the most easily and most often implicated. If you ran a different CUDA kernel before building the BVH, you might see more of the memory usage attributed to your own kernel. The OptiX BVH build has some non-trivial stack/local-memory usage that depends on how many CUDA cores your GPU has, not on the size of the BVH itself, which is part of why the usage might appear surprisingly large.
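As an illustration of that attribution (a sketch; the kernel below is hypothetical, and how much memory gets reserved depends on the kernel’s stack/local-memory footprint and on the GPU’s maximum resident thread count rather than on the data size):

__global__ void lmemKernel(int* out, int idx)
{
    // A runtime-indexed local array typically spills to local memory, so the
    // CUDA context reserves lmem for every thread that could be resident at once.
    int scratch[256];
    scratch[0] = threadIdx.x;
    for (int i = 1; i < 256; ++i)
        scratch[i] = scratch[i - 1] * 1103515245 + 12345;
    out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[idx & 255];
}

// Launching this (or any kernel with noticeable lmem usage) before optixAccelBuild
// makes the 'sticky' reservation show up at this launch instead of at the BVH build.
int* devOut = nullptr;
cudaMalloc((void**)&devOut, 1024 * 256 * sizeof(int));
lmemKernel<<<1024, 256>>>(devOut, 42);
cudaDeviceSynchronize();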

The amount of local memory needed in the OptiX BVH builder was reduced starting in the 560 driver, so if you haven’t yet tried 560, you can install it and see if the memory usage appears lower.


David.


Hi @claude.perin, you might also try OPTIX_BUILD_FLAG_ALLOW_COMPACTION in the OptixAccelBuildOptions, assuming you’re not rebuilding often.
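A minimal sketch of how compaction could be requested, reusing the variables from the listing above (error checking omitted):

// Request the compacted size as an emitted property of the build, then copy the
// acceleration structure into a smaller buffer.
iasBuildOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_COMPACTION;

OptixAccelEmitDesc emitDesc{};
emitDesc.type = OPTIX_PROPERTY_TYPE_COMPACTED_SIZE;
cudaMalloc((void**)&emitDesc.result, sizeof(size_t));

optixAccelBuild(optixContext, stream, &iasBuildOptions, &iasBuildInput, 1,
                iasBuildTempBuffer, iasBufferSizes.tempSizeInBytes,
                iasBuffer, iasBufferSizes.outputSizeInBytes,
                &iasHandle, &emitDesc, 1u);
cudaStreamSynchronize(stream);

size_t compactedSize = 0;
cudaMemcpy(&compactedSize, (void*)emitDesc.result, sizeof(size_t), cudaMemcpyDeviceToHost);

CUdeviceptr compactedBuffer = 0;
cudaMalloc((void**)&compactedBuffer, compactedSize);
optixAccelCompact(optixContext, stream, iasHandle, compactedBuffer, compactedSize, &iasHandle);
cudaFree((void*)iasBuffer);  // the larger, uncompacted output buffer can now be freed

Note that compaction only shrinks the acceleration-structure output buffer itself; it does not change the driver-side local-memory reservation described above.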

Leonardo

Thanks @dhart. The latest feature-branch driver (560), as you suggested, fixed the Ada overconsumption issue: my small OptiX sample now takes the same amount of memory (0.6 GB) on both the Ampere and Ada GPUs.