In my application scenario, the amount of data is so large that I cannot fit a single BVH tree into the limited GPU memory. So I want to build multiple BVH trees and keep the ones that are temporarily unused on the hard disk. Can a BVH be serialized to an SSD?
You can just copy the AS memory back to the host (it’s just some CUDA memory you allocated yourself, after all) and dump it to disk, then later load it again and fix up any pointers it contains using optixAccelRelocate.
Though you should first check whether it’s compatible with the current device using optixCheckRelocationCompatibility (so also store the struct retrieved with optixAccelGetRelocationInfo together with the AS memory), since, as mentioned, the AS layout is device- and version-dependent!
Loading these from disk is most likely slower than rebuilding them. The only reason to do this is when there is not enough temporary memory to build the original AS and compact it at runtime; loading the already compacted AS from disk then only requires that final, smaller allocation.
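For illustration, here is a minimal sketch of that save/load flow, assuming the OptiX 7.0–7.6 entry point names (optixAccelGetRelocationInfo, optixAccelCheckRelocationCompatibility, optixAccelRelocate; later SDK versions renamed parts of this API). The helper names and file layout are made up, and all error checking is omitted:

```cpp
#include <optix.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Dump a (compacted) GAS plus its relocation info to a file.
void saveGasToDisk( OptixDeviceContext context, OptixTraversableHandle handle,
                    CUdeviceptr d_gas, size_t gasSize, const char* path )
{
    OptixAccelRelocationInfo relocInfo = {};
    optixAccelGetRelocationInfo( context, handle, &relocInfo );

    std::vector<char> h_gas( gasSize );
    cudaMemcpy( h_gas.data(), reinterpret_cast<void*>( d_gas ), gasSize, cudaMemcpyDeviceToHost );

    FILE* f = fopen( path, "wb" );
    fwrite( &relocInfo, sizeof( relocInfo ), 1, f ); // store the info together with the AS!
    fwrite( &gasSize, sizeof( gasSize ), 1, f );
    fwrite( h_gas.data(), 1, gasSize, f );
    fclose( f );
}

// Load it back, verify compatibility, and patch the internal pointers for the new address.
OptixTraversableHandle loadGasFromDisk( OptixDeviceContext context, CUstream stream,
                                        const char* path, CUdeviceptr* d_gasOut )
{
    FILE* f = fopen( path, "rb" );
    OptixAccelRelocationInfo relocInfo = {};
    size_t gasSize = 0;
    fread( &relocInfo, sizeof( relocInfo ), 1, f );
    fread( &gasSize, sizeof( gasSize ), 1, f );
    std::vector<char> h_gas( gasSize );
    fread( h_gas.data(), 1, gasSize, f );
    fclose( f );

    int compatible = 0;
    optixAccelCheckRelocationCompatibility( context, &relocInfo, &compatible );
    if( !compatible )
        return 0; // different device/driver/version: rebuild from the source geometry instead

    CUdeviceptr d_gas = 0;
    cudaMalloc( reinterpret_cast<void**>( &d_gas ), gasSize );
    cudaMemcpy( reinterpret_cast<void*>( d_gas ), h_gas.data(), gasSize, cudaMemcpyHostToDevice );

    // A GAS contains no instance handles to patch, hence the 0/0 arguments.
    OptixTraversableHandle handle = 0;
    optixAccelRelocate( context, stream, &relocInfo, 0, 0, d_gas, gasSize, &handle );

    *d_gasOut = d_gas;
    return handle;
}
```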
What if the BVH is not stored on disk but temporarily kept in host memory instead? That is, copy some BVHs to host memory while they are not needed, then copy them back to device memory and relocate them when they are needed again. The main overhead would then consist of copying the BVHs to host memory, copying them from host memory back to device memory, and relocating. In general, is that less time-consuming than building the BVH directly?
Yes, that should be faster than copying the original vertex data and then building and compacting a GAS, since that would move a lot more data around and launch multiple kernels.
Still, whenever the CUDA allocation address changes from where the AS originally resided in device memory, you must call optixAccelRelocate to patch the absolute pointers stored inside it. I have never used that in an application, but it should be quick.
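A rough sketch of that host-RAM swap path, under the same OptiX 7.x API assumptions as the disk example above (StagedGas, evictGas and restoreGas are hypothetical names; error checks omitted). Pinned host memory is used because it usually gets much closer to the full PCI-E bandwidth than pageable memory:

```cpp
// Evict a GAS to pinned host memory and later restore + relocate it.
struct StagedGas
{
    OptixAccelRelocationInfo relocInfo;
    void*                    h_pinned; // page-locked staging buffer
    size_t                   size;
};

void evictGas( OptixDeviceContext context, OptixTraversableHandle handle,
               CUdeviceptr d_gas, size_t size, StagedGas* out )
{
    optixAccelGetRelocationInfo( context, handle, &out->relocInfo );
    cudaMallocHost( &out->h_pinned, size ); // pinned allocation for fast PCI-E transfers
    cudaMemcpy( out->h_pinned, reinterpret_cast<void*>( d_gas ), size, cudaMemcpyDeviceToHost );
    out->size = size;
    cudaFree( reinterpret_cast<void*>( d_gas ) ); // free the VRAM for other GAS
}

OptixTraversableHandle restoreGas( OptixDeviceContext context, CUstream stream,
                                   const StagedGas& staged, CUdeviceptr* d_gasOut )
{
    cudaMalloc( reinterpret_cast<void**>( d_gasOut ), staged.size );
    cudaMemcpyAsync( reinterpret_cast<void*>( *d_gasOut ), staged.h_pinned,
                     staged.size, cudaMemcpyHostToDevice, stream );

    // The new allocation address almost certainly differs, so patch the pointers.
    OptixTraversableHandle handle = 0;
    optixAccelRelocate( context, stream, &staged.relocInfo, 0, 0,
                        *d_gasOut, staged.size, &handle );
    return handle;
}
```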
Thank you very much for your quick reply! I have read the link carefully, but I still have a question.
What if we don’t consider the cost of copying the original vertex data? That is, assume that all original vertex data already resides in device memory, and that the two methods above only operate on the BVH (“building the BVH directly” then means only building and compacting a GAS, excluding any copy of the original vertex data). In that case, which of the two methods is faster?
With “copying the vertex data” I was referring to the vertex position input to optixAccelBuild that is required to build the initial GAS. That data needs to be copied to the device at least once.
If you build a GAS with the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag, you can also query the vertex position data on the device with optixGetTriangleVertexData. However, that increases the size of the GAS, and it is normally slower than fetching the vertex position from global memory along with all the other vertex attributes, which you have to store yourself anyway. https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#vertex-random-access
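For completeness, a device-side sketch of that query (standard OptiX 7 device functions; __closesthit__ch is just a placeholder program name):

```cpp
// Fetch the three vertex positions of the hit triangle directly from the GAS.
// Requires the GAS to be built with OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS.
extern "C" __global__ void __closesthit__ch()
{
    const OptixTraversableHandle gas     = optixGetGASTraversableHandle();
    const unsigned int           primIdx = optixGetPrimitiveIndex();
    const unsigned int           sbtIdx  = optixGetSbtGASIndex();
    const float                  time    = optixGetRayTime();

    float3 verts[3];
    optixGetTriangleVertexData( gas, primIdx, sbtIdx, time, verts );
    // verts[0..2] now hold the object-space positions of the triangle.
}
```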
If you’re doing this because of VRAM constraints, then it would also make sense to purge the vertex attributes of the unused GAS.
If you swap GAS in and out of host RAM but keep all of your scene’s vertex attribute data in device memory at all times, then the difference comes down to the bandwidth your system reaches for the host-to-device copy over PCI-E versus the performance of the acceleration structure build and optional compaction kernels.
That in turn is highly dependent on your system setup and I cannot say what is faster. I never did that measurement.
Mind that the memory bandwidth of VRAM is in the hundreds of GB/s, while PCI-E Gen 3 x16 tops out at 16 GB/s theoretical, and most systems don’t even reach that.
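As a rough, hypothetical example: uploading a 256 MB compacted GAS at a realistic ~12 GB/s over PCI-E Gen 3 x16 takes about 21 ms, while reading or writing those same 256 MB in VRAM at, say, 400 GB/s accounts for well under a millisecond of memory traffic per pass. Whether the upload or the rebuild wins then depends mostly on how much kernel work the build and compaction themselves cost on your GPU, which only a measurement can tell.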