I’m looking at the same thing with RGB textures as well.
The docs specifically mention INPUT OUTPUT buffers when they talk about the 16 byte alignment. Those buffers get synced back and forth between the device and the host, hopefully 10-30 times a second, so they need the fastest reads and writes possible.
However, would we see a big loss during the first sync, when all of these INPUT only buffers of points and textures are transferred? I tested with a 1 million triangle mesh and the first sync didn’t seem to take much longer overall.
It seems that, where possible, 25% less data (float3 vs float4) would be worth it for the memory savings, at the cost of potentially longer times before the first pixel and during sync.
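
To put rough numbers on that trade-off, here is a quick back-of-the-envelope sketch in Python. It is not tied to any particular API. The vertex count is an assumption on my part (an unindexed mesh with 3 vertices per triangle), and the names `buffer_bytes` and `savings_fraction` are just made up for the illustration.

```python
import struct

# Packed sizes of three and four 32-bit floats.
FLOAT3_BYTES = struct.calcsize("<3f")  # 12 bytes
FLOAT4_BYTES = struct.calcsize("<4f")  # 16 bytes


def buffer_bytes(element_count, element_bytes):
    """Total size of a tightly packed buffer."""
    return element_count * element_bytes


def savings_fraction(small_bytes, large_bytes):
    """Fraction of memory saved by using the smaller element."""
    return 1.0 - small_bytes / large_bytes


# Assumption: 1 million triangles, unindexed, so 3 vertices per triangle.
vertex_count = 3 * 1_000_000

float3_total = buffer_bytes(vertex_count, FLOAT3_BYTES)
float4_total = buffer_bytes(vertex_count, FLOAT4_BYTES)
saved = savings_fraction(float3_total, float4_total)

# In a packed float3 buffer, element i starts at byte i * 12, so only
# every fourth element begins on a 16 byte boundary.
aligned_starts = [i for i in range(4) if (i * FLOAT3_BYTES) % 16 == 0]

print(f"float3 buffer: {float3_total} bytes")
print(f"float4 buffer: {float4_total} bytes")
print(f"saved: {saved:.0%}")
print(f"float3 elements (of the first 4) on a 16 byte boundary: {aligned_starts}")
```

Under those assumptions the positions alone go from 48 MB to 36 MB, but three out of every four float3 elements no longer start on a 16 byte boundary, which is where any extra read or sync cost would come from.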