OptiX PathTracer sample: what is the purpose of pad in g_vertices?

In the OptiX optixPathTracer sample, the geometry is defined as:

const static std::array<Vertex, TRIANGLE_COUNT * 3> g_vertices =
{ {
		// Floor  -- white lambert
		{    0.0f,    0.0f,    0.0f, 0.0f },
		{    0.0f,    0.0f,  559.2f, 0.0f },
		{  556.0f,    0.0f,  559.2f, 0.0f },
		{    0.0f,    0.0f,    0.0f, 0.0f },
		{  556.0f,    0.0f,  559.2f, 0.0f },
		{  556.0f,    0.0f,    0.0f, 0.0f },

		// Ceiling -- white lambert
		{    0.0f,  548.8f,    0.0f, 0.0f },
		{  556.0f,  548.8f,    0.0f, 0.0f },
		{  556.0f,  548.8f,  559.2f, 0.0f },

		{    0.0f,  548.8f,    0.0f, 0.0f },
		{  556.0f,  548.8f,  559.2f, 0.0f },
		{    0.0f,  548.8f,  559.2f, 0.0f },
...
...
		// Ceiling light -- emissive
		{  343.0f,  548.6f,  227.0f, 0.0f },
		{  213.0f,  548.6f,  227.0f, 0.0f },
		{  213.0f,  548.6f,  332.0f, 0.0f },

		{  343.0f,  548.6f,  227.0f, 0.0f },
		{  213.0f,  548.6f,  332.0f, 0.0f },
		{  343.0f,  548.6f,  332.0f, 0.0f }
	} };

What is the purpose of pad in the Vertex struct?

struct Vertex
{
	float x, y, z, pad;
};

I suppose (I might be totally wrong) that pad is used to apply the mesh transformation for each instance in the scene. In that case the xform could be a 4x4 matrix, and the pad column is added so that the matrix can be multiplied with g_vertices, turning each vertex into homogeneous coordinates.

If that is the reason for pad, shouldn't it be 1.0f instead of 0.0f?

The pad element is only there to make the Vertex structure 16 bytes in size. The same applies to the IndexedTriangle structure.
In both cases the pad values are never used inside optixPathTracer, so it does not matter what they contain.
But yes, when thinking of homogeneous positions, a 1.0f value for the .w component would have been clearer.

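During the acceleration structure (AS) build, triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3, while triangleArray.vertexStrideInBytes = sizeof( Vertex ), which is 16 here. So the builder reads only x, y and z of each vertex and steps over the pad.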

These structure sizes are required for CUDA vectorized loads and stores, which must be aligned to an address that is a multiple of the vector's size, or you'll get CUDA misaligned address errors.
See this table in the CUDA C++ Programming Guide:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types-alignment-requirements-in-device-code

That vectorized load actually happens in the device code of optixPathTracer!
The HitGroupData structure defines the vertex data as float4* vertices, which are loaded vectorized in the hit program; only the xyz components are then used by this code:

    const float3 v0   = make_float3( rt_data->vertices[ vert_idx_offset+0 ] );
    const float3 v1   = make_float3( rt_data->vertices[ vert_idx_offset+1 ] );
    const float3 v2   = make_float3( rt_data->vertices[ vert_idx_offset+2 ] );

Loading a float4 is faster than loading a float3, because the latter is loaded as three scalar loads.
Vectorized load instructions exist only for 2- and 4-component vectors!

I think the optixPathTracer code is flawed in that respect.
When defining your own structures that must be aligned to specific addresses (which is not guaranteed for the structs given here), there should be an explicit alignment specifier on them, like struct __align__(16) Vertex.

The code works nonetheless because cudaMalloc, which allocates the d_vertices device pointer, is guaranteed to return an address that is aligned to at least 256 bytes.

Personally, I wouldn't have defined these structures at all and would have used the float4 and uint4 vector types directly (though that might make the initialization more convoluted). Either way, I would not recommend copying that example code as is.

If you're more concerned about memory usage than about memory access performance, you could use float3 and uint3 instead and adjust the code accordingly.