sizeof(int3) == 12 on the host, while int3 on the device is supposed to be aligned?


I would like to know whether the type int3 is aligned or not.
If it is, shouldn't sizeof(int3) return 16? When I call sizeof(int3) from host code, I get 12 (3 x 4 bytes, I suppose).

Would I get better performance by using my own array with proper padding, as mentioned here:

I am using CUDA 7.5 on Linux, x86_64.

The size is correct at 12 bytes (3 integers of 4 bytes each). The alignment of int3 is 4 bytes, as mentioned here.
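To make the size and alignment numbers concrete, here is a minimal host-only sketch. It does not include the CUDA headers: `Int3Like` is a stand-in I made up with the same members as int3 (three ints, no alignment specifier), and `PaddedInt3` is a hypothetical padded variant that forces 16-byte alignment, the alignment CUDA gives int4. It assumes a 4-byte, 4-byte-aligned `int`, which holds on Linux x86_64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Stand-in for CUDA's int3 (hypothetical name): three ints, no alignment
// specifier, so it inherits the alignment of int.
struct Int3Like {
    int x, y, z;
};

// Hypothetical padded variant: same members, but forced to 16-byte
// alignment. The compiler adds 4 bytes of tail padding.
struct alignas(16) PaddedInt3 {
    int x, y, z;
};

// Assumption of this sketch: int is 4 bytes with 4-byte alignment.
static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
static_assert(alignof(int) == 4, "sketch assumes a 4-byte-aligned int");

// Unpadded: 3 * 4 = 12 bytes, aligned like int.
static_assert(sizeof(Int3Like) == 12, "three ints, no padding");
static_assert(alignof(Int3Like) == 4, "alignment of int");

// Padded: the size is rounded up to a multiple of the alignment.
static_assert(sizeof(PaddedInt3) == 16, "12 bytes rounded up to 16");
static_assert(alignof(PaddedInt3) == 16, "forced by alignas");

// Print both layouts; call this from main() to see the numbers at run time.
void print_layout() {
    std::printf("Int3Like:   size=%zu align=%zu\n",
                sizeof(Int3Like), alignof(Int3Like));
    std::printf("PaddedInt3: size=%zu align=%zu\n",
                sizeof(PaddedInt3), alignof(PaddedInt3));
}
```

With the real CUDA headers, the same `sizeof` and `alignof` checks can be applied directly to int3 and int4. Whether a padded layout is actually faster depends on the access pattern, so it is worth measuring rather than assuming.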