Weird behavior of atomicAdd in OptiX

OptiX is said to use a per-ray programming model, so in the examples the __raygen__rg() function calls are nicely isolated in their memory accesses by pixel position.
I'm trying to implement some global statistics. I changed the image buffer into a stats buffer; the accesses now overlap, so I clear the buffer on creation and use atomicAdd(float*, float) calls.
This leads to weird results, though: I get NaNs, that is, until I try checking the incoming values explicitly using isnan(), at which point it magically starts working fine. It looks like some kind of race condition. Is there some obvious mistake that I'm making?
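Roughly, the pattern in my raygen program looks like this (a sketch with placeholder names, not my exact code; computeStatistic and computeBin stand in for my actual per-ray logic):

```cuda
extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();

    // ... trace the ray, then derive a per-ray contribution ...
    const float        value = computeStatistic(idx); // hypothetical helper
    const unsigned int bin   = computeBin(idx);       // hypothetical helper

    // Many launch indices may map to the same bin, so the overlapping
    // writes go through atomicAdd instead of a plain store.
    atomicAdd(&params.stats[bin], value);
}
```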

How exactly are you clearing that stats buffer on the device?
With an explicit kernel launch, one of the cuMemset calls (recommended), or a host-to-device memcpy?

Are you using an asynchronous call for that?
Is that using the same CUDA stream as the following optixLaunch?

If both are on the same stream, the following optixLaunch should find the initialized values inside the device buffer, and the first atomicAdd should always see the defined cleared value.
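The intended ordering looks like this sketch (illustrative names; CUDA_CHECK and OPTIX_CHECK stand for whatever error-checking macros you use):

```cuda
// Clearing and launching on the SAME stream: stream ordering guarantees the
// launch only starts after the memset completed, so no explicit sync is
// needed between the two calls.
CUDA_CHECK(cudaMemsetAsync(d_stats, 0, statsSizeInBytes, stream));

OPTIX_CHECK(optixLaunch(pipeline, stream,
                        d_params, sizeof(Params),
                        &sbt, width, height, /* depth = */ 1));

// Synchronize only before reading the accumulated results on the host.
CUDA_CHECK(cudaStreamSynchronize(stream));
```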

This should usually work. I’ve done this before for scattered color accumulations.

It will obviously not work if buffer entries are cleared inside the same optixLaunch call that does the atomicAdd, because the single-ray programming model doesn't allow any assumptions about neighboring launch indices.
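That anti-pattern would look like this sketch (placeholder names again; bin() and value() are hypothetical):

```cuda
extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();

    // WRONG: launch indices execute concurrently in no defined order, so
    // this clear can race with (and wipe out) another thread's atomicAdd
    // to the same bin.
    params.stats[bin(idx)] = 0.0f;

    atomicAdd(&params.stats[bin(idx)], value(idx));
}
```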

Some more information would be required to determine what could have gone wrong.

What is your system configuration?
OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

How did you allocate that stats buffer?
Where does it reside (device memory, pinned host memory, etc.)?
Are there multiple GPUs involved?

Maybe just post the exact code excerpts which create and initialize the buffer and all device code accessing the buffer.

I do cudaMemset(m_device_pixels, 0, size) before calling optixLaunch.

Ubuntu 22.04, 2070, driver version 520.56.06, CUDA version 11.8

What looks especially weird is that I tried checking the values, and while the check is active the problem disappears, which makes it look even more like a race.

That's not enough information to be able to help further.
I understand that there might be some race condition; I just can't say yet why that would happen, which is why I'm asking all these questions.

Please answer all my questions (including the OptiX version number), or provide a minimal and complete reproducer in the failing state, ideally by modifying one of the OptiX SDK examples.

Is the optixLaunch using a different stream than the default?
Does the behavior change if you put a synchronization call between the cudaMemset and the optixLaunch?
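As a diagnostic, something like this sketch would rule out stream-ordering issues (illustrative names; CUDA_CHECK/OPTIX_CHECK are assumed error-checking macros):

```cuda
// cudaMemset is ordered on the default stream; if optixLaunch runs on a
// different, non-blocking stream, the clear and the launch could overlap.
CUDA_CHECK(cudaMemset(d_stats, 0, statsSizeInBytes));

// Forcing full completion here separates the clear from the launch.
// If this makes the NaNs disappear, the two were racing across streams.
CUDA_CHECK(cudaDeviceSynchronize());

OPTIX_CHECK(optixLaunch(pipeline, stream,
                        d_params, sizeof(Params),
                        &sbt, width, height, /* depth = */ 1));
```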

My deepest apologies. The strange behavior was caused by a miscalculation on my part that produced a local buffer overrun, which led to the weird results. Thanks again for your swift feedback!