I want to use the atomicXor builtin function on characters (bytes). However, CUDA does not support that at the moment; it only works on ints (32-bit words). My idea was to move the byte pointer back by 3 bytes, cast it to an int pointer, and call atomicXor on that, i.e. roughly atomicXor((int *)(b_ptr - 3)).
However, that does not work. Here is the message from cuda-memcheck:
======== Invalid global read of size 4
========= at 0x000000a8 in __iAtomicXor
========= by thread (95,0,0) in block (0,0,0)
========= Address 0x7f50c6000b73 is misaligned
What should I do, and why can't I do that? Any ideas? I was thinking of using the address aligned to a multiple of 4 instead, i.e. calculating "pointer % sizeof(int)" and moving the pointer back by that many bytes.
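To make the modulo idea concrete, here is a minimal sketch of what I have in mind. It rests on several assumptions: the names atomicXorByte, xorKernel, and demoMatchesHost are made up for illustration and are not CUDA builtins, the buffer comes from cudaMalloc with a size that is a multiple of 4 (so the enclosing 32-bit word always lies inside the allocation), and the byte order is little-endian.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA builtin): XOR a single byte atomically by
// operating on the aligned 32-bit word that contains it.
__device__ void atomicXorByte(unsigned char *addr, unsigned char val)
{
    // Position of the byte inside its 4-byte word (0..3).
    size_t offset = reinterpret_cast<size_t>(addr) & 3;
    // Round the address down to the enclosing 4-byte boundary.
    unsigned int *word = reinterpret_cast<unsigned int *>(addr - offset);
    // Assumes little-endian layout: byte k occupies bits [8k, 8k+7].
    unsigned int mask = static_cast<unsigned int>(val) << (8 * offset);
    // XOR with zero bits leaves the other three bytes unchanged.
    atomicXor(word, mask);
}

// Each thread XORs its own index (low 8 bits) into one of n bytes.
__global__ void xorKernel(unsigned char *buf, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    atomicXorByte(&buf[t % n], static_cast<unsigned char>(t & 0xFF));
}

// Runs the kernel and compares the result with a sequential host loop.
bool demoMatchesHost()
{
    const int n = 8;          // multiple of 4, see the assumptions above
    const int threads = 256;
    unsigned char host[n] = {0};
    unsigned char expected[n] = {0};
    for (int t = 0; t < threads; ++t)
        expected[t % n] ^= static_cast<unsigned char>(t & 0xFF);

    unsigned char *dev = nullptr;
    if (cudaMalloc(&dev, n) != cudaSuccess) return false;
    cudaMemset(dev, 0, n);
    xorKernel<<<1, threads>>>(dev, n);
    // cudaMemcpy waits for the kernel on the default stream to finish.
    cudaError_t err = cudaMemcpy(host, dev, n, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != expected[i]) return false;
    return true;
}
```

Calling demoMatchesHost() from main should return true if the byte-wise XOR behaves the way I expect. The point of shifting the value into place instead of shifting the pointer is that the address passed to atomicXor is always a multiple of 4.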
Thank you in advance.