Extremely poor performance of vkCmdTraceRaysKHR

Hello,
Each frame, I generate some geometry via compute shaders and, alternatively, on the CPU. This results in about 10,000 triangles, though the number varies. For this reason, my vertex and index buffers are sized to fit an (arbitrarily chosen) maximum of 50,000 triangles. Before generating the geometry, I overwrite the vertex buffer with 0xFF bytes (which makes every floating-point value NaN); to my understanding, this causes the acceleration structure build to ignore the unused buffer space. After this, the BLAS is built with update = VK_TRUE. This may be inefficient for my situation, but I don’t know how to completely rebuild the BLAS, as it will always still be in use by frames in flight…
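To illustrate the 0xFF trick described above, here is a minimal CPU-side sketch. The `Vertex` layout and both helper functions are hypothetical, not taken from my renderer. It relies on two things: a 32-bit float whose bytes are all 0xFF is a NaN, and, as far as I understand the Vulkan spec, a triangle is treated as inactive when the X component of any of its vertices is NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical vertex layout: three tightly packed 32-bit floats.
struct Vertex {
    float x, y, z;
};

// Builds a CPU-side copy of the vertex data with every byte set to 0xFF,
// mirroring what the post does to the GPU buffer before generating geometry.
std::vector<Vertex> makeInactiveVertices(std::size_t count) {
    std::vector<Vertex> vertices(count);
    if (!vertices.empty()) {
        std::memset(vertices.data(), 0xFF, vertices.size() * sizeof(Vertex));
    }
    return vertices;
}

// Returns true if the triangle would count as inactive under the rule
// assumed above: the X component of at least one vertex is NaN.
bool isInactiveTriangle(const Vertex& a, const Vertex& b, const Vertex& c) {
    return std::isnan(a.x) || std::isnan(b.x) || std::isnan(c.x);
}
```

Note that which triangles are active changes every frame in my setup, and as far as I can tell an update is not supposed to switch primitives between active and inactive, so this trick fits a full build better than an update.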

The Nsight Graphics frame profiler tells me that vkCmdTraceRaysKHR takes about 100 ms (although the frame time measured by my application is around 90 ms). The top SOL% is TEX at around 17%, and the rest of the stats look similarly pathological; I will gladly provide them if it helps.

I am stumped; I have no idea where I am losing this much performance. I thought it might be some kind of cache thrashing, so I sorted the vertices in a way that should improve spatial coherence, to no effect. I’ve simplified the closest hit shader to payload=vec3(1.0), which of course improves performance, but the pathology remains. I have also stopped generating new geometry (no more vkCmdDispatch calls) after a few frames, waited, and then captured a frame, with the same result.

The strange thing is that when I generate visually identical geometry on the CPU, the profiler shows vkCmdTraceRaysKHR taking only around 3 ms (naturally, generating the geometry then takes 150 ms or so, which is impractical).

Could it be the acceleration structure? How do I properly rebuild it from scratch without getting errors because it is still in use by another command buffer? (My renderer is based on the one from vulkan-tutorial.com, using a command buffer per swapchain image etc…)
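One common way around the "still in use" problem is to keep a separate BLAS (with its own buffers) for each frame in flight and only rebuild the one that belongs to the current frame. The sketch below shows just the bookkeeping, without any Vulkan calls. `BlasSlot`, `BlasRing`, and `kFramesInFlight` are hypothetical names, and the code assumes the caller has already waited on the current frame's fence before touching a slot.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed number of frames the CPU may record ahead of the GPU.
constexpr std::size_t kFramesInFlight = 2;

// Hypothetical stand-in for one BLAS plus its backing and scratch buffers.
struct BlasSlot {
    std::uint64_t builtForFrame = 0;  // frame number of the most recent build
    bool everBuilt = false;           // false until the first build is recorded
};

class BlasRing {
public:
    // Returns the slot owned by this frame. The caller is assumed to have
    // already waited on this frame's fence, so the GPU is done with the slot.
    BlasSlot& slotForFrame(std::uint64_t frameNumber) {
        return slots_[frameNumber % kFramesInFlight];
    }

    // Marks the slot as rebuilt. A real renderer would record the full
    // acceleration structure build into the command buffer at this point.
    void markRebuilt(std::uint64_t frameNumber) {
        BlasSlot& slot = slotForFrame(frameNumber);
        slot.builtForFrame = frameNumber;
        slot.everBuilt = true;
    }

private:
    std::array<BlasSlot, kFramesInFlight> slots_{};
};
```

With a command buffer per swapchain image, as in my renderer, the ring could be indexed by the swapchain image index instead of a frame counter. The TLAS would presumably also need one copy per slot, since it refers to a specific BLAS.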

I had some validation layer errors before, when I set the update flag to VK_FALSE in my acceleration structure updating function. Now that I’ve tried it again, it not only works without errors but also completely solves the described issue. So I suppose that, in my situation, updating the acceleration structure was simply producing extremely inefficient structures. As such, this problem is solved for me.
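For anyone with a similar setup, the decision I ended up with can be sketched as a small policy function. Everything here is hypothetical illustration rather than code from my renderer: `FrameGeometry`, `BuildMode`, and `chooseBuildMode` are made-up names, and the rule "rebuild whenever the geometry was regenerated or the active triangle count changed" is my own assumption about when an update is likely to give a poor structure. In the final KHR extension, the two outcomes correspond to VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR and VK_BUILD_ACCELERATION_STRUCTURE_MODE_UPDATE_KHR in the build info's mode field, which replaced the update boolean I am using.

```cpp
#include <cstdint>

// Mirrors the two build modes: a full build from scratch, or an update
// (refit) of a previously built acceleration structure.
enum class BuildMode { FullBuild, Update };

// Hypothetical per-frame summary of the generated geometry.
struct FrameGeometry {
    std::uint32_t activeTriangleCount;  // triangles actually written this frame
    bool regeneratedFromScratch;        // true if the vertices were rewritten wholesale
};

// Chooses an update only when a previous build exists and the geometry is
// essentially the same mesh with moved vertices; otherwise asks for a full build.
BuildMode chooseBuildMode(bool hasPreviousBuild,
                          const FrameGeometry& previous,
                          const FrameGeometry& current) {
    if (!hasPreviousBuild) {
        return BuildMode::FullBuild;  // nothing to update yet
    }
    if (current.regeneratedFromScratch) {
        return BuildMode::FullBuild;  // old structure no longer matches the data
    }
    if (current.activeTriangleCount != previous.activeTriangleCount) {
        return BuildMode::FullBuild;  // the set of active triangles changed
    }
    return BuildMode::Update;
}
```

In my case the geometry is regenerated every frame, so this would always pick the full build, which matches what fixed the problem.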