Best strategy for splatting image for bidir

I’m planning to add bidir support to my renderer, and I’m wondering what the best strategy is for splatting light paths to the image buffer, given that the splat positions are random and two threads could try to splat to the same pixel concurrently.

Scatter algorithms like that require atomics.
Something like this (from one of my attempts at connecting a hit with a camera raster position):

...
  if (isNotNull(contrib))
  {
    // x, y are the projected raster coordinates on the film area.
    // The outputBuffer is a CUdeviceptr to allow different formats.
    float* buffer = reinterpret_cast<float*>(sysData.outputBuffer) + (y * sysData.resolution.x + x) * 4; // RGBA32F
   
    // This is a scattered write and needs atomics.
    // The buffer needs to be initialized with zero before the first launch, see cuMemsetD32Async().
    // Also means multi-GPU implementations need full resolution local device buffers for final compositing!
    // The composited frame buffer needs to be scaled by the inverse of the number of iterations at the very end.

    atomicAdd(buffer    , contrib.x);
    atomicAdd(buffer + 1, contrib.y);
    atomicAdd(buffer + 2, contrib.z);
  }
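The comments in the snippet mention two steps outside the splat itself: zeroing the buffer before the first launch and scaling by the inverse iteration count at the end. A minimal sketch of both follows, assuming a tightly packed float4 (RGBA32F) accumulation buffer and a separate display buffer. The names clearAccumulation, resolveAccumulation and normalizeSplats are hypothetical, and it uses the runtime API call cudaMemsetAsync() instead of the driver API cuMemsetD32Async() named above.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: scales the accumulated RGB sums by 1 / iterations.
// Writing alpha = 1.0f is an assumption, the splat code above never touches alpha.
__global__ void normalizeSplats(const float4* accum, float4* display,
                                int numPixels, float invIterations)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < numPixels)
  {
    const float4 c = accum[i];
    display[i] = make_float4(c.x * invIterations,
                             c.y * invIterations,
                             c.z * invIterations,
                             1.0f);
  }
}

// Zero the accumulation buffer before the first launch.
// All-zero bytes are 0.0f for IEEE floats, so a byte-wise memset is sufficient.
cudaError_t clearAccumulation(float4* accum, int numPixels, cudaStream_t stream)
{
  return cudaMemsetAsync(accum, 0, size_t(numPixels) * sizeof(float4), stream);
}

// Run once for display, after `iterations` launches have splatted into accum.
// The accumulation buffer itself is left untouched so rendering can continue.
cudaError_t resolveAccumulation(const float4* accum, float4* display,
                                int numPixels, int iterations, cudaStream_t stream)
{
  if (numPixels <= 0 || iterations <= 0)
  {
    return cudaErrorInvalidValue;
  }
  const int blockSize = 256;
  const int gridSize  = (numPixels + blockSize - 1) / blockSize;
  normalizeSplats<<<gridSize, blockSize, 0, stream>>>(accum, display, numPixels,
                                                      1.0f / float(iterations));
  return cudaGetLastError();
}
```

Keeping the raw sums in one buffer and the scaled result in another means the atomics only ever see plain additions, and the image can be resolved at any iteration without disturbing the accumulation.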

You need to be extra careful with atomics on multi-GPU setups, or there could be serious performance cliffs. It is best to use atomics only on local VRAM of the GPU device the instruction is executed on. Avoid doing them on memory shared between devices, like pinned host memory or a single peer-to-peer buffer over NVLINK. (It should be no problem to use peer-to-peer via NVLINK for the final compositing though.)
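One way the final compositing could look, as a rough sketch only: each device splats with atomics into its own full-resolution buffer, and afterwards one device pulls each remote buffer into a local staging buffer and adds it with ordinary, non-atomic writes. The names compositeAdd and compositeFromDevice and the staging buffer are hypothetical. The copy uses cudaMemcpyPeerAsync(), which can go over a peer-to-peer link when peer access is enabled; that, and the assumption that `stream` belongs to the local device, are not taken from the post above.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: adds a staged copy of another device's accumulation
// buffer into the local one. Every thread owns one pixel, so no atomics are needed.
__global__ void compositeAdd(float4* local, const float4* staged, int numPixels)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < numPixels)
  {
    const float4 a = local[i];
    const float4 b = staged[i];
    local[i] = make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
  }
}

// Pulls `remote` (allocated on remoteDevice) into `staging` (allocated on
// localDevice) and accumulates it into `local` (also on localDevice).
// Assumption: `stream` was created on localDevice, or is the default stream.
cudaError_t compositeFromDevice(float4* local, int localDevice,
                                const float4* remote, int remoteDevice,
                                float4* staging, int numPixels, cudaStream_t stream)
{
  cudaError_t err = cudaSetDevice(localDevice);
  if (err != cudaSuccess)
  {
    return err;
  }
  // Stream-ordered copy, so the kernel below sees the complete staged data.
  err = cudaMemcpyPeerAsync(staging, localDevice, remote, remoteDevice,
                            size_t(numPixels) * sizeof(float4), stream);
  if (err != cudaSuccess)
  {
    return err;
  }
  const int blockSize = 256;
  const int gridSize  = (numPixels + blockSize - 1) / blockSize;
  compositeAdd<<<gridSize, blockSize, 0, stream>>>(local, staging, numPixels);
  return cudaGetLastError();
}
```

Calling this once per remote device leaves the sum of all devices in the local buffer, which can then be scaled by the inverse of the total number of iterations.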

Thanks Detlef!