atomicXor misaligned memory access?!

I want to apply the atomicXor builtin function to characters (bytes). However, CUDA does not support that at the moment; it only supports ints (32-bit words). My idea was to move the pointer back to 3 bytes before the byte, cast it to int and call atomicXor, i.e. atomicXor((int)b_ptr - 3).

However, that does not work. Here is the message from cuda-memcheck:

======== Invalid global read of size 4
========= at 0x000000a8 in __iAtomicXor
========= by thread (95,0,0) in block (0,0,0)
========= Address 0x7f50c6000b73 is misaligned

What should I do? Why can't I do that? Any ideas? I was thinking of aligning the address to a multiple of 4 instead, i.e. calculating “pointer % sizeof(int)” and moving the pointer back by that amount.

Thank you in advance.

On GPUs, data must be naturally aligned. This means that an N-byte data object needs to be located at an address evenly divisible by N. In your case N=4, and the reported address 0x7f50c6000b73 leaves a remainder of 3 when divided by 4. Access to misaligned data leads to undefined behavior.

The reason for not supporting unaligned data access is to simplify the hardware. The design of GPUs focuses the transistor budget on mechanisms that perform computational work.
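One possible workaround, along the lines of the modulo idea in the question, is to round the byte address down to the enclosing 4-byte boundary and shift the value into the byte's position within that word. XOR with zero leaves the other three bytes unchanged. The following is only a sketch: the names byteXorMask, atomicXorByte and xorBytes are made up for illustration and are not CUDA builtins. It assumes little-endian byte order (which NVIDIA GPUs use) and that the whole enclosing word lies inside the allocation, which may not hold for the last bytes of a buffer whose size is not a multiple of 4.

```cuda
#include <cstdint>

// Hypothetical helper (not a CUDA builtin): shift a byte value into the lane
// it occupies inside its enclosing 4-byte-aligned word.
// Assumes little-endian layout: byte k of a word sits at bits [8k, 8k+7].
__host__ __device__ inline unsigned int byteXorMask(std::uintptr_t addr,
                                                    unsigned char val)
{
    unsigned int byteIndex = static_cast<unsigned int>(addr & 3u);  // addr % 4
    return static_cast<unsigned int>(val) << (8u * byteIndex);
}

// Hypothetical helper: atomically XOR a single byte by issuing the 32-bit
// atomicXor on the aligned word that contains it.
// Returns the value the byte held before the operation.
__device__ inline unsigned char atomicXorByte(unsigned char *bytePtr,
                                              unsigned char val)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(bytePtr);

    // Round the address down to the nearest multiple of 4.
    unsigned int *wordPtr = reinterpret_cast<unsigned int *>(
        addr & ~static_cast<std::uintptr_t>(3));

    // The other three byte lanes are XORed with 0 and stay unchanged.
    unsigned int oldWord = atomicXor(wordPtr, byteXorMask(addr, val));

    unsigned int shift = 8u * static_cast<unsigned int>(addr & 3u);
    return static_cast<unsigned char>(oldWord >> shift);
}

// Example kernel (illustrative): each thread XORs val into its own byte.
// Threads whose bytes share a word contend on that word's atomic.
__global__ void xorBytes(unsigned char *data, int n, unsigned char val)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicXorByte(&data[i], val);
    }
}
```

Taking the address from the report purely as an example byte address: 0x7f50c6000b73 % 4 is 3, so the aligned word starts at 0x7f50c6000b70 and the byte value is shifted left by 24 bits before the atomicXor. The kernel could be launched as, say, xorBytes<<<(n + 255) / 256, 256>>>(d_data, n, 0xFF), where d_data is assumed to come from cudaMalloc and is therefore suitably aligned at its base.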