7.2 create geometry performance question

I am just wondering if I am on the right track or if there is anything major I am missing to improve performance.

I am currently passing large arrays containing vertices and indices for 3k geometries from C# to a DLL that runs OptiX.

Just the process of adding all the geometry takes a bit over one second.

This includes allocating and copying the verts/idx for each geometry, creating the triangle input, computing the memory usage, and finally building the acceleration structure. I do need to know later on which geometry is hit, so each of the 3k needs to be its own entity (very similar to buildGeomAccel() from the cutouts example).

All in all it’s about 15 million verts and 20 million indices.

My question now is: is this in the range you’d expect or am I far enough off to bother diving deeper?

Hey @tjaenichen,

It’s a little hard to say from the description. Are you passing a pointer from C# to OptiX, or do you have to copy these arrays on the host?

Are you using multiple CUDA streams for your optixAccelBuild() calls? If not, you may be able to see some speedups by using 2-4 streams and building accels in parallel.
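As a rough sketch of the multi-stream idea, the builds could be dealt out round-robin over a small pool of streams. The helper name assignBuildsToStreams is hypothetical, and the CUDA/OptiX calls appear only as comments so the snippet compiles without a GPU toolchain; this is one way to split the work, not necessarily the best one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each stream, list the indices of the geometries
// whose accel build would be issued on that stream (round-robin split).
std::vector<std::vector<std::size_t>> assignBuildsToStreams(
    std::size_t geometryCount, std::size_t streamCount)
{
    assert(streamCount > 0);
    std::vector<std::vector<std::size_t>> perStream(streamCount);
    for (std::size_t g = 0; g < geometryCount; ++g) {
        perStream[g % streamCount].push_back(g);
    }
    return perStream;
}

// In the real build loop (not shown, needs CUDA and OptiX):
//   - create one cudaStream_t per group with cudaStreamCreate()
//   - pass that stream as the stream argument of optixAccelBuild()
//   - give every in-flight build its own temp and output buffers
//   - call cudaStreamSynchronize() on each stream before using the handles
```

With 3000 geometries and 4 streams, each stream would receive 750 builds. Whether this actually helps depends on how much of the one second is build overhead rather than transfer, which is what profiling should tell you.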

Are you doing any allocation, compute, or transfer that is per-geom, aside from the optixAccelBuild() call?

I’d recommend profiling it with Nsight Systems so that you can break down the time to copy the data from host to GPU, and separately the time to build accels. Then it’s easier to calculate the transfer in bytes per second and the accel builds in triangles per second, and compare to the known throughput rates. PCIe transfer rates are well known, and GPU accel build rates for large single accel builds have peak rates in the low hundreds of millions of tris per second for Turing & Ampere GPUs. For lots of small builds, the overheads (e.g. kernel launch) may dominate and yield much lower than peak throughput. This is when using several streams can pay off.
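To make that comparison concrete, here is a minimal back-of-the-envelope sketch. It assumes float3 vertices (12 bytes each) and 32-bit indices (4 bytes each), which the thread does not confirm, and the struct and function names are made up for illustration. The rates you plug in are placeholders to be replaced with your own hardware's numbers.

```cpp
#include <cassert>
#include <cstdint>

// Scene totals as quoted in the question.
struct SceneSize {
    std::uint64_t vertexCount;
    std::uint64_t indexCount;
};

// Bytes to upload, ASSUMING float3 vertices (12 bytes) and
// 32-bit indices (4 bytes). Adjust if your layout differs.
inline std::uint64_t uploadBytes(const SceneSize& s)
{
    return s.vertexCount * 12u + s.indexCount * 4u;
}

// Three indices per triangle (integer division drops any remainder).
inline std::uint64_t triangleCount(const SceneSize& s)
{
    return s.indexCount / 3;
}

// Ideal time in milliseconds to process `amount` units at `ratePerSecond`.
inline double idealMilliseconds(double amount, double ratePerSecond)
{
    assert(ratePerSecond > 0.0);
    return amount / ratePerSecond * 1000.0;
}
```

For 15 million verts and 20 million indices this gives 260,000,000 bytes and 6,666,666 triangles. With an assumed 10 GB/s of effective PCIe bandwidth, idealMilliseconds(260e6, 10e9) is about 26 ms, and with an assumed 200 million tris per second (a placeholder inside the "low hundreds of millions" range above), the build would ideally take about 33 ms. Both figures are rough ideals for one large transfer and one large build, so a measured time of over one second would point toward per-geometry overhead rather than raw throughput.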


Hi David,

Thanks a lot! To answer your questions: I am just passing the pointer to the array around, so the only copying should be from host to device. I am, however, only using one stream at a time. I will have a look at using multiple. There are some restrictions when it comes to Unity and multithreading, so I had planned to look at multithreading later on, once the lower-hanging fruit is out of the way.

All the geometry is also in world space at this point, so no more transfers or anything.

Thanks for the answers and the ballpark figures, this will really help guide me.