I would like to know whether the type int3 is aligned or not.
If it is, shouldn't sizeof(int3) return 16? When I call sizeof(int3) from host code, I get 12 (3 x 4 bytes, I suppose).
Would I get better performance by using my own array with proper padding, as mentioned here:
I am using CUDA 7.5 on Linux x86_64.