What is the right time to release device memory after building AS structure in Optix7?

Altmice · October 22, 2019, 3:46am

Hello! I’m writing a sample application with 2 GAS following the instruction of official programming guide.
In chapter 5 “Acceleration structures”, it says that

“The AS constructed by optixAccelBuild does not reference any of the device buffers referenced in the build inputs. All relevant data is copied from these buffers into the acceleration output buffer, possibly in a different format.
The application is free to release this memory after the build without invalidating the acceleration structure. However, IAS builds do continue to refer to other IAS and GAS instances and transform nodes.”

It seems like that the user can release the relevant device buffer as long as the OptixAccelBuild for GAS has been called.
My sample IAS is composed of 2 GAS spheres. One sphere is based on triangle meshes and the other is based on AABB. However, I found that the aabb device buffer must not be released before building the IAS. If I release the memory of aabb array soon after building the aabb GAS, the primitive will not be traversed and disappear from the result.
The situation of triangle primitive is a bit different. If I release device memory after building the triangle GAS, the app works well with a small number of triangles. Illegal memory access will be encounted if I increase the amount of triangle to about 7000. Referenced device memory must be reserved unless the application is destroyed.

So, what is the right time to release the device memory referenced in build inputs? Large memory consumption will be encountered to reserve these memory.

dhart · October 22, 2019, 5:02pm

Hi Altmice,

I’m not sure what’s happening in your case, but I can confirm that freeing your AABB buffer immediately after calling optixAccelBuild() is allowed and should work correctly. Our SDK samples, optixWhitted_exp for example, releases the AABB buffer immediately after it’s used.

To make sure I understand completely, there are at least 4 separate device buffers involved in building a GAS for custom primitives…
1: primitive data (like center & radius for spheres. this goes in the SBT, not the GAS)
2: aabb data (bounding boxes)
3: temp GAS buffer
4: output GAS buffer

You are trying to delete only buffer #2, just the boxes, correct?

I’m not sure I understand about the triangles. Are you using built-in triangles? If you’re using built-in triangles, then you don’t allocate an AABB buffer for them, OptiX handles computing bounds for you, right? There shouldn’t be any AABB device buffer to release in that case. Or are you using a custom CUDA triangle intersection program?

In the case of built-in triangles, you are allowed to delete buffer #1, the triangle vertex data. If you’re deleting the sphere primitive data (the buffer with radius & center) when using custom primitives, then that would cause an illegal memory access. Is it possible that’s what is happening?

If you confirm that your program is crashing after freeing only the AABB buffer of your sphere, we would be interested in a minimal reproducer. Would you be able to put together the smallest amount of code it takes to demonstrate the issue and send it to the optix-help list?

–
David.

Altmice · October 23, 2019, 3:15am

Thanks for your clear explanation!

I know what’s controversial here. In fact I’m trying to delete buffer#4 after building GAS or IAS. It seems that the data of buffer#2 are packed and copied to output GAS buffer or IAS buffer.

void buildAabbGas() {
   CUdeviceptr d_aabb_buffer;  //buffer #AABB_2
   OptixBuildInput buildinput;
   // Setup buildinputs
   ...
   buildinput.aabbArray.aabbBuffers=&d_aabb_buffer;
   // Alloc buffer for GAS build
   CUdeviceptr output_buffer;   //Buffer #AABB_4
   CUdeviceptr temp_buffer;     //Buffer #AABB_3
   optixAccelBuild(context,0,&accel_options,&buildInput,1,temp_buffer,temp_buffer_SizeInBytes,
                      output_buffer,output_buffer_SizeInBytes,aabb_handle,nullptr,1);

   cudaFree(reinterpret_cast<void*>(d_aabb_buffer));
   cudaFree(reinterpret_cast<void*>(output_buffer));//Release Buffer #AABB_4
   cudaFree(reinterpret_cast<void*>(temp_buffer));
}
void buildTriangleGas() {
   CUdeviceptr d_tri_buffer;  //buffer #TRI_1
   OptixBuildInput buildinput;
   //Set up the buildinput data
   buildinput.triangleArray.vertexBuffers=&d_tri_buffer;
   //Alloc buffer for GAS build
   CUdeviceptr output_buffer;   //Buffer #TRI_4
   CUdeviceptr temp_buffer;     //Buffer #TRI_3
   optixAccelBuild(context,0,&accel_options,&buildInput,1,temp_buffer,temp_buffer_SizeInBytes,
                      output_buffer,output_buffer_SizeInBytes,tri_handle,nullptr,1);

   cudaFree(reinterpret_cast<void*>(d_tri_buffer));
   cudaFree(reinterpret_cast<void*>(output_buffer));//Release Buffer #TRI_4
   cudaFree(reinterpret_cast<void*>(temp_buffer));
}
void buildIas() {
   //Build GAS
   buildAabbGas();
   buildTriangleGas();
   //Setup host OptixInstance
   OptixInstance optix_instance[2];
   optix_instance[0].traversableHandle = tri_handle;
   optix_instance[1].traversableHandle = aabb_handle;
   CUdeviceptr instances_buffer;   //buffer #IAS_2
   //copy host OptixInstance to deviceptr
   ...
   //Setup buildinputs
   OptixBuildInput instance_input;
   instance_input.instanceArray=instances_buffer;
   //Alloc buffer for IAS build
   CUdeviceptr output_buffer;       //buffer #IAS_4
   CUdeviceptr temp_buffer;         //buffer #IAS_3
   optixAccelBuild(context,0,&accel_options,&instance_input,1,temp_buffer,temp_buffer_SizeInBytes,
                      output_buffer,output_buffer_SizeInBytes,ias_handle,nullptr,1);
   cudaFree(reinterpret_cast<void*>(instances_buffer));
   cudaFree(reinterpret_cast<void*>(output_buffer));//Release Buffer #IAS_4
   cudaFree(reinterpret_cast<void*>(temp_buffer));
}

Now too many buffers are involved to build an IAS. Temp buffers and data buffers can be released after building the corresponding GAS or IAS. But I’m unknown about the time to release these output buffers. Several cases of my codes are presented here:

The output buffer of AABB GAS can be released safely after building IAS in buildIas() function.
The output buffer of IAS can be released after building IAS in buildIas() function.
The output buffer of AABB IAS can not be released immediately after building GAS in buildAABBGas() function. Otherwise the aabb primitive will disapper.
Releasing the output buffer of triangle GAS at the end of buildTriangleGas() or buildIas() is risky. The app can work well with a low number of vertices. Illegal memory access is encountered when I increased the number of vertices.

optixCutouts_exp provides a solution that all the output buffers are released in cleanupState() function. I’m wondering if we can make it earlier to save device memory usage.

droettger · October 23, 2019, 12:23pm

Wait, the outputBuffer device memory in an optixAccellBuild() call receives the acceleration structure data associated with the outputHandle.
You must not release any of these device buffers as long as that AS is in use!
Means in your code above, comment out lines 14, 29, and 52. Instead track the outputBuffer pointers and delete them when the resp. AS is not used inside the scene anymore.

Line 11, 26, 50: Why do you have numEmittedProperties == 1 while emittedProperties == nullptr?

optixAccelBuild() is an asynchronous CUDA launch. While cudaFree() is synchronous, when building AS in parallel with multiple streams you would need to make sure the correct stream is synchronized with cudaSynchronizeStream(your_stream) before deleting the memory.

You should add checks around all OptiX and CUDA calls to verify that they succeeded. Helps debugging tremendously.

Altmice · October 23, 2019, 2:08pm

Thank you,Detlef! Your advice is always precise and timely.

I will keep these output buffers in memory until the IAS will not be used. What puzzles me is that even if I release these buffers by mistake, sometimes the app can still work well. In Optix6, users have less concern about the memory manangement of AS builds and SBT.

numEmittedProperties should be 0 here in line 11,26,50. Mistakes are made when I try to transcribe the code from my notebook. Check marcos are also omitted here for convenience.

Currently I’m running the app with only one stream. I’ll keep in mind to check the stream synchronization.

Altmice · October 23, 2019, 2:12pm

Upgrating my code from Optix 6 to Optix 7 spends much more effort than I expected, but it is worth to make full use of RT cores.

dhart · October 23, 2019, 3:02pm

What puzzles me is that even if I release these buffers by mistake, sometimes the app can still work well.

Yes, this is true on CPU too – using memory that has recently been freed is not guaranteed to always crash or trigger an illegal memory access exception. But it’s always unsafe and always incorrect to use already released memory.

Temp buffers and data buffers can be released after building the corresponding GAS or IAS.

A couple of notes about this:

The best practice for temp buffers is to consolidate them when building many acceleration structures. cudaMalloc() and cudaFree() are expensive calls, so if you can, allocate a temp buffer large enough for all your AS builds, and reuse it for every call to optixAccelBuild(). You can then release your temp buffer after the last AS build has finished but before calling optixLaunch(). For many AS builds, we’d recommend building some kind of ring buffer to use as temp space.

Your data buffer (#1) can only be released after building the corresponding AS when your data is built-in triangles. If your data is custom primitives associated with an intersection program, you will need to keep the data buffer alive until after tracing rays, until after your last optixLaunch() call has completed.

–
David.