False Positive on Bounds Check for Warp Illegal Address?

I store my octree in a block of data of type long. Each value is either the address of another node in the tree or a leaf holding two floats. Once I have the location of a leaf, I want to modify its two floats with atomicAdd. This is how I do it:

long * data = ....
long index = ....
float * float_data = (float *) data;
atomicAdd(&float_data[2*index], 1.0f);

However, when I do this, I get a CUDA Exception: Warp Illegal Address that I can't explain:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x5555578c9d88

Thread 1 "raycaster" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 1, grid 2, block (48,131,0), thread (0,0,0), device 0, sm 0, warp 12, lane 0]
0x00005555578c9d90 in __fAtomicAdd ()
(cuda-gdb) bt
#0  0x00005555578c9d90 in __fAtomicAdd ()
#1  0x00005555578ba718 in _INTERNAL_49_tmpxft_0000cb48_00000000_7_cuda_raycaster_cpp1_ii_546d2a9d::atomicAdd (address=<optimized out>, val=<optimized out>) at /opt/cuda/include/sm_20_atomic_functions.hpp:77
#2  0x00005555578fbf10 in octree::map_projection_kernel<<<(180,320,1),(8,8,1)>>> (cfg=0x7fffba800200, tree_cfg=0x7fffba800000, octree_data=0x7fffa8000000, score=0x7fffbe600000)
    at /home/amai/octree/src/cuda_raycaster.cu:59
(cuda-gdb) frame 2
#2  0x00005555578fbf10 in octree::map_projection_kernel<<<(180,320,1),(8,8,1)>>> (cfg=0x7fffba800200, tree_cfg=0x7fffba800000, octree_data=0x7fffa8000000, score=0x7fffbe600000)
    at /home/amai/octree/src/cuda_raycaster.cu:59
59	   atomicAdd(&float_data[2*hit+1], 1.0f);
(cuda-gdb) p hit
$1 = 5916399
(cuda-gdb) p float_data+(2*hit+1)
$2 = (@generic float * @register) 0x7fffaad2377c
(cuda-gdb) p float_data[2*hit+1]
$3 = 1
(cuda-gdb) p float_data[2*hit]
$4 = 1.40129846e-45

The original long data has size 38716640, so this address shouldn't be out of bounds. Printing the values at the address CUDA is complaining about shows that they make sense: the first is a floating-point score and the second is a counter. (I know the counter could have been an int32, but I don't have that many values, so it's not a problem.) Why is CUDA throwing an exception?
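As a sanity check on the numbers in the session above, here is a small host-side sketch that recomputes the address of float_data[2*hit+1] and compares it with the buffer size. It assumes that 38716640 is a count of long elements rather than bytes, which the post does not state. The names kBase, kHit, kNumLongs, byte_offset, in_bounds and print_check are made up for illustration.

```cuda
#include <cstdint>
#include <cstdio>

// Values copied from the cuda-gdb session.
constexpr std::uint64_t kBase = 0x7fffa8000000ULL;  // octree_data
constexpr std::uint64_t kHit = 5916399ULL;          // p hit
// Assumption: 38716640 counts long elements, not bytes.
constexpr std::uint64_t kNumLongs = 38716640ULL;

// Byte offset of float_data[2*hit+1] from the start of the buffer.
constexpr std::uint64_t byte_offset(std::uint64_t hit) {
    return (2 * hit + 1) * sizeof(float);
}

// True when the whole float at that offset lies inside the buffer.
constexpr bool in_bounds(std::uint64_t hit) {
    return byte_offset(hit) + sizeof(float) <= kNumLongs * sizeof(long);
}

// Prints the recomputed offset and address next to the bounds verdict.
void print_check() {
    std::printf("offset  = %llu bytes\n",
                static_cast<unsigned long long>(byte_offset(kHit)));
    std::printf("address = 0x%llx\n",
                static_cast<unsigned long long>(kBase + byte_offset(kHit)));
    std::printf("in bounds: %s\n", in_bounds(kHit) ? "yes" : "no");
}
```

Calling print_check() from main gives an offset of 47331196 bytes and the address 0x7fffaad2377c, which matches $2 in the debugger output, and the access is in bounds under the element-count assumption. If 38716640 were a byte count instead, that offset would fall outside the buffer, so it is worth being sure which unit the size is in.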

Does this have something to do with address spaces?

It turned out that CUDA GDB was not showing the real problem. The index I get back from the lookup can be invalid, and I forgot to check for that before using it. The values CUDA GDB printed looked fine, which is what threw me off.
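For anyone hitting the same thing, here is a minimal sketch of the kind of check that was missing. It is not the original kernel. The names add_to_leaf, count_hits_kernel, run_demo, num_nodes and the kNoHit sentinel are hypothetical, and it assumes a long is 8 bytes so that each node holds exactly two floats. One plausible reading of the debugger session is that the exception is raised for the whole warp while cuda-gdb focuses on a single lane, so the printed hit may belong to a thread whose index was valid. That is an assumption, not something confirmed here.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Assumption: one long holds exactly two floats (true on 64-bit Linux).
static_assert(sizeof(long) == 2 * sizeof(float), "expects 8-byte long");

// Hypothetical sentinel for "no leaf found"; the real lookup may differ.
constexpr long kNoHit = -1;

// Bumps the counter (second float) of leaf `hit`, but only for a valid index.
__device__ bool add_to_leaf(long* data, long num_nodes, long hit) {
    if (hit < 0 || hit >= num_nodes) return false;  // the missing check
    float* float_data = reinterpret_cast<float*>(data);
    atomicAdd(&float_data[2 * hit + 1], 1.0f);
    return true;
}

__global__ void count_hits_kernel(long* data, long num_nodes,
                                  const long* hits, int num_hits,
                                  int* skipped) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_hits) return;
    if (!add_to_leaf(data, num_nodes, hits[i])) atomicAdd(skipped, 1);
}

// Host driver: returns how many indices were skipped and writes the
// counter of node 2 to *count_out. Error checking is omitted for brevity.
int run_demo(float* count_out) {
    const long num_nodes = 4;
    const long h_hits[] = {0, 2, kNoHit, 7, 2};  // -1 and 7 are invalid
    const int num_hits = 5;

    long* d_data = nullptr;
    long* d_hits = nullptr;
    int* d_skipped = nullptr;
    cudaMalloc(&d_data, num_nodes * sizeof(long));
    cudaMalloc(&d_hits, sizeof(h_hits));
    cudaMalloc(&d_skipped, sizeof(int));
    cudaMemset(d_data, 0, num_nodes * sizeof(long));
    cudaMemset(d_skipped, 0, sizeof(int));
    cudaMemcpy(d_hits, h_hits, sizeof(h_hits), cudaMemcpyHostToDevice);

    count_hits_kernel<<<1, 32>>>(d_data, num_nodes, d_hits, num_hits, d_skipped);
    cudaDeviceSynchronize();

    long h_data[4];
    int h_skipped = 0;
    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_skipped, d_skipped, sizeof(int), cudaMemcpyDeviceToHost);

    float pair[2];  // the two floats stored in node 2
    std::memcpy(pair, &h_data[2], sizeof(pair));
    *count_out = pair[1];

    cudaFree(d_data);
    cudaFree(d_hits);
    cudaFree(d_skipped);
    return h_skipped;
}
```

With the five sample indices, run_demo should skip the two invalid ones and leave node 2's counter at 2. Counting the skipped indices instead of silently dropping them makes it easy to see from the host whether the lookup is returning bad values at all.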