Memcheck reports misaligned atomic at an address divisible by 0x1000?

The topic title says it all…
How is this possible? Even a 4 KB object would be perfectly aligned at that address, let alone a measly 32-bit float. I don’t see how an address could possibly be more aligned :-)
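To spell out the arithmetic I'm relying on, here is a small host-side sketch. The helper `is_aligned` is a hypothetical name of my own, not something from CUDA or Nsight, and it assumes the size is a power of two. It just checks the low bits of the reported address.

```cpp
#include <cstdint>

// Hypothetical helper (not part of CUDA or Nsight): returns true when
// `addr` is a multiple of `size`. Assumes `size` is a power of two,
// so the test reduces to checking that the low bits are all zero.
inline bool is_aligned(std::uint64_t addr, std::uint64_t size)
{
    return (addr & (size - 1)) == 0;
}

// The address and access size reported by the memory checker.
const std::uint64_t kReportedAddress = 0x708001000ULL;
const std::uint64_t kAccessSize      = 4;
```

By this check, `is_aligned(kReportedAddress, kAccessSize)` and `is_aligned(kReportedAddress, 0x1000)` are both true, which is why the "misaligned" report makes no sense to me.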

Configuration: GTX 1080 in a machine running Windows 10, with Visual Studio 2013, CUDA 8.0, and Nsight.
The kernel is a debug build.

Here’s the output:

GPU State:
   Address  Size      Type  Mem       Block  Thread         blockIdx  threadIdx                                                                                                PC  Source
 708001000     4  mis atom    g           1       0          {1,0,0}    {0,0,0}  [bla bla function name]_9f58e5bb9atomicAddEPjj+000128  c:\program files\nvidia gpu computing toolkit\cuda\v8.0\include\device_functions.hpp:1564

Summary of access violations:
c:\program files\nvidia gpu computing toolkit\cuda\v8.0\include\device_functions.hpp(1564): error MemoryChecker: #misaligned=1  #invalidAddress=0

Memory Checker detected 1 access violations.
error = misaligned atomic (global memory)
gridid = 59
blockIdx = {1,0,0}
threadIdx = {0,0,0}
address = 0x708001000
accessSize = 4