I have a T10P engineering sample and I’ve been trying to use the new atomic functions on shared memory in a histogramming application. I found some discrepancies in my results. Upon further testing, I found that when several threads in different warps all update the same shared memory location via atomicAdd, only some of the adds take effect. If the threads are all in the same warp, there is no problem. Am I doing something wrong? Is this the expected behavior of the atomic functions? Or is this a bug in the current software or the sample hardware? I’ll post sample code that reproduces this result in a few minutes. Thank you for your time.
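For reference, a minimal sketch of the kind of test described above might look like the following. It is not the original reproduction code: the kernel name sharedAtomicCount, the helper runSharedAtomicCount, and the block sizes are all hypothetical choices made for illustration. It assumes a device and toolkit that support atomics on shared memory (as far as I know, compute capability 1.2 or higher). Every thread in one block adds 1 to a single shared counter, so if the adds are truly atomic the final value should equal the number of threads, whether the block holds one warp or several.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: every thread in the block adds 1 to one
// shared-memory counter, then thread 0 writes the total to global memory.
__global__ void sharedAtomicCount(int *result)
{
    __shared__ int counter;

    if (threadIdx.x == 0) counter = 0;
    __syncthreads();            // counter is initialised before any add

    atomicAdd(&counter, 1);     // atomic add on a shared-memory address
    __syncthreads();            // all adds are finished before the read

    if (threadIdx.x == 0) *result = counter;
}

// Hypothetical host helper: launches a single block of threadsPerBlock
// threads and returns the final counter value, or -1 on any CUDA error.
int runSharedAtomicCount(int threadsPerBlock)
{
    int *dResult = nullptr;
    int hResult = -1;

    if (cudaMalloc(&dResult, sizeof(int)) != cudaSuccess) return -1;

    sharedAtomicCount<<<1, threadsPerBlock>>>(dResult);

    // The blocking copy also reports errors from the kernel launch.
    cudaError_t err = cudaMemcpy(&hResult, dResult, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dResult);
    return (err == cudaSuccess) ? hResult : -1;
}

int main()
{
    // 32 threads is a single warp; the larger sizes span several warps.
    const int sizes[] = {32, 64, 128, 256};

    for (int i = 0; i < 4; ++i) {
        int got = runSharedAtomicCount(sizes[i]);
        printf("threads=%3d expected=%3d got=%3d\n", sizes[i], sizes[i], got);
    }
    return 0;
}
```

If the single-warp case matches but the multi-warp cases come up short, that would be consistent with the behavior described in the post. A value of -1 means the launch or copy failed, not that adds were lost.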