Can shared memory be misaligned?
I copy a uint4 from registers to shared memory, but cuda-memcheck tells me the access is misaligned.
I printed the shared memory address, and it is indeed misaligned for a uint4 (it is a multiple of 8 but not of 16): 0x1000308
So, two questions:
Is it possible to make shared memory aligned?
Can uint4 instructions be applied to shared memory? (I printed the PTX code, and it seems to copy 4 values 4 times. From a performance standpoint, is it faster?)
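
For reference, a minimal sketch of the kind of setup in question, not my exact code. The kernel name `store_uint4`, the buffer name `buf`, and the block size of 64 are hypothetical. It assumes the shared buffer is declared as a raw byte array, in which case `__align__(16)` is one way to request the 16-byte alignment that a uint4 access needs:

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores one uint4 into shared memory.
__global__ void store_uint4(uint4 *out)
{
    // __align__(16) requests 16-byte alignment for the byte buffer.
    // A plain `unsigned char buf[...]` has no such guarantee.
    __shared__ __align__(16) unsigned char buf[64 * sizeof(uint4)];

    uint4 *smem = reinterpret_cast<uint4 *>(buf);

    uint4 v = make_uint4(threadIdx.x, 1u, 2u, 3u);  // value held in registers
    smem[threadIdx.x] = v;                          // 16-byte store to shared memory
    __syncthreads();

    out[threadIdx.x] = smem[threadIdx.x];           // copy back so the host can inspect it

    if (threadIdx.x == 0) {
        // Print the address and its remainder modulo 16; 0 means 16-byte aligned.
        uintptr_t addr = reinterpret_cast<uintptr_t>(smem);
        printf("smem = %p, addr %% 16 = %u\n",
               static_cast<void *>(smem), static_cast<unsigned>(addr % 16));
    }
}

int main()
{
    uint4 *d_out = nullptr;
    if (cudaMalloc(&d_out, 64 * sizeof(uint4)) != cudaSuccess) return 1;

    store_uint4<<<1, 64>>>(d_out);                  // one block of 64 threads
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Declaring the buffer directly as `__shared__ uint4 buf[64];` should give the same alignment, since uint4 is defined with 16-byte alignment. The misalignment is more likely to appear when a uint4 pointer is carved out of a byte or int buffer at an offset that is not a multiple of 16.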