Hi there,
I’m currently developing an OptiX application to compute the flux of particles onto a triangulated surface using ray tracing and Monte Carlo sampling. The core idea is to simulate particles generated randomly on a source plane, trace them to the surface, and assign some weight to the triangle they hit. Afterward, the particle’s weight is reduced, and it is reflected off the surface. This process repeats until the particle’s weight falls below a defined threshold or it misses the surface entirely.
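For context, the driving loop in my __raygen__ program looks roughly like this (simplified; samplePointOnSourcePlane, packPointer, the weightThreshold parameter, and the missed flag are stand-ins for my actual helpers and launch parameters):

extern "C" __global__ void __raygen__particle()
{
    const uint3 idx = optixGetLaunchIndex();

    PerRayData prd;
    prd.rayWeight = 1.f;                 // every particle starts with unit weight
    prd.missed    = false;
    samplePointOnSourcePlane(idx, prd);  // stand-in: fills prd.origin / prd.direction

    unsigned int u0, u1;
    packPointer(&prd, u0, u1);           // stand-in: splits the payload pointer into two registers

    // Bounce until the weight drops below the threshold or the ray misses the surface.
    while (prd.rayWeight > params.weightThreshold && !prd.missed)
    {
        optixTrace(params.traversable,
                   prd.origin, prd.direction,
                   0.f, 1e16f, 0.f,      // tmin, tmax, ray time
                   OptixVisibilityMask(255),
                   OPTIX_RAY_FLAG_NONE,
                   0, 1, 0,              // SBT offset, SBT stride, miss SBT index
                   u0, u1);
    }
}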
So far, I’ve successfully implemented this approach using a global GPU buffer with one entry per surface triangle. In the __closesthit__ program, I use atomicAdd to increment the entry of the triangle that was hit. Here’s a minimal example of the __closesthit__ program:
extern "C" __global__ void __closesthit__particle()
{
    // Fetch the per-ray payload carrying the particle's current weight.
    PerRayData *prd = getPRD<PerRayData>();

    // Accumulate the weight on the triangle that was hit.
    const unsigned int primID = optixGetPrimitiveIndex();
    atomicAdd(&params.resultBuffer[primID], prd->rayWeight);

    // Reduce the weight by the sticking probability and reflect the particle diffusely.
    prd->rayWeight -= prd->rayWeight * params.sticking;
    diffuseReflection(prd);
}
While researching this topic, I came across a similar idea in this forum: accumulating particle hits in a per-particle buffer and performing the reduction in a separate kernel to avoid relying on atomicAdd.
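If I understand that approach correctly, the __closesthit__ program would append a (primID, weight) record to a per-particle slot instead of accumulating atomically, and a separate pass would then sum the records per triangle, e.g. by sorting and a segmented reduction. A rough sketch of the reduction part using Thrust (hitPrimIDs and hitWeights are hypothetical record buffers filled during the launch; the two output vectors must be pre-sized to at least the number of distinct triangles):

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>

// Collapses per-bounce hit records into one flux value per triangle.
void reduceHitRecords(thrust::device_vector<unsigned int> &hitPrimIDs,
                      thrust::device_vector<float>        &hitWeights,
                      thrust::device_vector<unsigned int> &uniquePrimIDs,
                      thrust::device_vector<float>        &fluxPerTriangle)
{
    // Bring records with the same triangle ID next to each other ...
    thrust::sort_by_key(hitPrimIDs.begin(), hitPrimIDs.end(), hitWeights.begin());

    // ... then collapse each run of equal IDs into a single summed weight.
    thrust::reduce_by_key(hitPrimIDs.begin(), hitPrimIDs.end(), hitWeights.begin(),
                          uniquePrimIDs.begin(), fluxPerTriangle.begin());
}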
This leads me to my questions:
- Is using atomicAdd a poor choice for this kind of task, due to performance concerns or potential scalability drawbacks?
- Would a per-particle buffer with a subsequent reduction make more sense in this scenario?
- Are there other efficient methods to avoid or optimize the use of atomicAdd in this context?
Thank you in advance for your insights!
Best regards,
Tobias