The first thing would be to try optimizing the occlusion rays.
The OptiX 7.4.0 optixPathTracer only handles opaque materials (no cutout opacity), so an occlusion ray would not require a closesthit or anyhit program, only a hardcoded miss program; the example code uses a closesthit program instead.
Compare the optixPathTracer occlusion implementation with this code:
https://forums.developer.nvidia.com/t/anyhit-program-as-shadow-ray-with-optix-7-2/181312/2
It’s probably not a big difference, since reaching either the closesthit or the miss program ends the current ray and calls back into the streaming multiprocessors anyway, and both only set a single payload register.
There is no need for that inlined setPayloadOcclusion() function either; it should collapse to the same assembly, it’s just unnecessary code in my opinion.
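A rough sketch of what I mean, along the lines of the linked post: the payload starts as 0 (occluded) and only a trivial miss program sets it to 1 (visible). The variable names surfacePos, dirToLight, distToLight and sceneEpsilon are placeholders for whatever you already have in your closesthit program; RAY_TYPE_OCCLUSION/RAY_TYPE_COUNT are the enums from the optixPathTracer example.

```
// Sketch of a visibility ray that needs no closesthit or anyhit program at all.
extern "C" __global__ void __miss__occlusion()
{
    optixSetPayload_0( 1u ); // nothing was hit -> the source is visible
}

// Inside the radiance closesthit program:
unsigned int isVisible = 0u; // assume occluded unless the miss program says otherwise
optixTrace( params.handle,
            surfacePos, dirToLight,
            sceneEpsilon,                 // tmin
            distToLight - sceneEpsilon,   // tmax
            0.0f,                         // rayTime
            OptixVisibilityMask( 255 ),
            OPTIX_RAY_FLAG_DISABLE_ANYHIT
            | OPTIX_RAY_FLAG_DISABLE_CLOSESTHIT
            | OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT,
            RAY_TYPE_OCCLUSION,           // SBT offset
            RAY_TYPE_COUNT,               // SBT stride
            RAY_TYPE_OCCLUSION,           // miss SBT index
            isVisible );
// isVisible is now 1 if the path to the source is unoccluded.
```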
When using the optixPathTracer as the basis for your code, are you still using the ray generation program with the for-loop over the number of samples?
I would change that to sample only one path per launch index then.
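A rough sketch of that change, assuming the parameter and buffer names of the SDK example (params.subframe_index, params.accum_buffer, params.frame_buffer); tracePath() is just a placeholder for the body of the existing per-sample loop (camera ray generation plus the path tracing loop):

```
extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();

    unsigned int seed = tea<4>( idx.y * dim.x + idx.x, params.subframe_index );

    // One path per launch index instead of the for-loop over samples_per_launch.
    float3 result = tracePath( idx, dim, seed ); // placeholder for the existing per-sample code

    // Progressive accumulation over subframes, same as in the example.
    const unsigned int image_index = idx.y * dim.x + idx.x;
    float3 accum_color = result;
    if( params.subframe_index > 0 )
    {
        const float  a                = 1.0f / static_cast<float>( params.subframe_index + 1 );
        const float3 accum_color_prev = make_float3( params.accum_buffer[ image_index ] );
        accum_color = lerp( accum_color_prev, accum_color, a );
    }
    params.accum_buffer[ image_index ] = make_float4( accum_color, 1.0f );
    params.frame_buffer[ image_index ] = make_color( accum_color );
}
```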
Are you actually tracing a full path of radiance rays through the scene for the lighting, and then doing the same again, sampling all 200+ sources at each closesthit event along any path?
Even if the occlusion/visibility rays described in the link above are the fastest rays you can have, because they stay completely on the hardware RT cores until they terminate, this still sounds like an absurd number of total rays per launch.
Have you counted how many rays you’re actually shooting per launch in your scene?
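Just as an illustration with made-up numbers: a 1024 × 1024 launch with an average path length of 5 and 200 occlusion rays per hit already comes to 1024 × 1024 × 5 × 200 ≈ 1.05 billion occlusion rays per launch, on top of the roughly 5 million radiance rays.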
Is the sound simulation running at the same resolution as the radiance simulation?
The simplest approach to speed this up would obviously be using faster hardware (depending on what you’re currently using) and distributing the workload to multiple GPUs.
If you need final “images” of your sound simulation and you’re using a progressive Monte Carlo algorithm to solve the sound propagation, similar to lighting, there are basically two ways: launch once and simulate every source, or launch more often and pick only a subset of the sources (at least 1) per launch.
In the end, the number of rays shot to accumulate a sufficiently low-variance result won’t change with either approach.
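For the second approach, here is a hedged sketch of picking a single source uniformly at random per closesthit event and compensating with the selection probability. Source, params.sources, params.num_sources, traceOcclusion() and attenuation() are placeholders for your own data and helpers; rnd(seed) is the uniform [0, 1) helper from the SDK’s cuda/random.h, and the float3 operators come from sutil/vec_math.h.

```
// Sample one sound source uniformly per hit instead of looping over all of them.
static __forceinline__ __device__ float3 sampleOneSource( unsigned int& seed, const float3& hitPos )
{
    const unsigned int n     = params.num_sources;                                   // placeholder
    const unsigned int index = min( static_cast<unsigned int>( rnd( seed ) * n ), n - 1 );
    const Source       src   = params.sources[ index ];                              // placeholder struct

    const float pdf = 1.0f / static_cast<float>( n ); // uniform selection probability

    const float  visibility   = traceOcclusion( hitPos, src.position );  // 0 or 1 from the shadow ray
    const float3 contribution = visibility * attenuation( src, hitPos ); // placeholder falloff model

    return contribution / pdf; // divide by the pdf so the estimate stays unbiased over many launches
}
```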
If you say picking only nearer sound sources for the sampling didn’t help either, what does that mean?
Did you not get a low enough variance to be used for your training input, or did it require more iterations to get to a sufficient result?
For lighting calculations there are some ways to reduce this variance, for example by using multiple importance sampling. I’m not sure how the material (BRDF) behavior would change for sound wavelengths though.
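For reference, the usual power heuristic used to weight MIS between, say, light sampling and BRDF sampling looks like this; it’s just the standard formula, nothing specific to the sound case:

```
// Power heuristic with beta = 2 (Veach): MIS weight for a sample drawn from the
// strategy with density pdfA when combined with a strategy of density pdfB.
static __forceinline__ __device__ float powerHeuristic( const float pdfA, const float pdfB )
{
    const float a = pdfA * pdfA;
    const float b = pdfB * pdfB;
    return a / ( a + b );
}
```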
If you’re only using perfectly diffuse reflection, which is all the optixPathTracer example handles, that might just work, but it is theoretically incorrect for sound, which behaves very differently when hitting geometry smaller than its wavelength.
There are also algorithms which help with many-light sampling, like ReSTIR, but I don’t know if that can easily be applied to your sound sources.
There are also different light transport algorithms, like bidirectional path tracing or photon mapping, and the combination of these in the Vertex Connection and Merging (VCM) algorithm, which can handle many problematic light transport cases.