I’m curious whether the optix::float[n] types (e.g. optix::float3) have any advantage over a plain float array.
The optix::float[n] types impose stricter alignment on us (float2 and float4 require 8- and 16-byte alignment respectively, while a float array requires only 4-byte alignment), so I suspect they may implicitly achieve good performance from a cache perspective.
Is there any other advantage to optix::float[n] even when the values live in registers?
(I don’t count operator overloading and the other built-in functions for optix::float[n] as an advantage here, because we can define those ourselves for a custom class that wraps a float array.)
Yea, float2 and float4 have superior memory access patterns compared to float[4] when read from global memory: the compiler can issue a single vectorized load for an aligned float4, whereas a float[4] may end up as four scalar loads. Fx I’ve improved performance slightly by storing vertex positions as float4 instead of float3 and then using .w to store an encoded normal when applicable.
Once they’ve been loaded into registers things become a bit more muddled, and I’ve seen slight performance drops from trying to align my own structs that essentially wrapped a float4, fx { float3 direction; float pdf; }. Since these were never written back to global memory it was apparently better to leave them unaligned. You can read about memory alignment and access patterns in the CUDA Programming Guide.
I’d rather avoid optix::float[n] if possible, because they are unintuitive (e.g. optix::make_floatn…) and impose strict member offsets when included in a struct.
If their advantage comes only from alignment for global memory access, it is easy to stop using them: instead, adjust the alignment of a struct that contains a float array (or that wraps a few floats).
In that case the float array doesn’t require any troublesome member offsets, and it should have good performance as long as the struct doesn’t straddle a cache-line boundary.