OPTIX_EXCEPTION_CODE_TRAVERSAL_INVALID_HIT_SBT root cause?

After implementing your idea for curves, I remembered @dhart's answer to my older post, where he pointed out that keeping the vertex buffer means the data is in memory twice. So using the vertex functions has a real advantage in terms of memory usage.
This matters even more because the thickness data is stored in a separate buffer that also needs to be present, so keeping all those buffers around is costly.
However, when using the temporal denoiser I still need both buffers to calculate the difference between the current curve hit point and the previous one. Only when the temporal denoiser is not used does calling the vertex functions save that memory.

So the only difference would be the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag, about which you said:

Is there a general rule of thumb for how much bigger they are? Would that roughly offset the vertex / thickness / index data?

Not using the vertex functions gives me slightly better speed, but in my test (no temporal denoiser) I get exactly the same acceleration structure sizes:

Not using the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag:
(using the vertex / index / thickness buffers directly for calculating normals)

accel_options.buildFlags= OPTIX_BUILD_FLAG_ALLOW_COMPACTION | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE
accel_options.operation= OPTIX_BUILD_OPERATION_BUILD
curveType =9476   (OPTIX_PRIMITIVE_TYPE_ROUND_CATMULLROM)
g->curve_primitive_count=749446
compacted_gas_size=224422500
gas_buffer_sizes.outputSizeInBytes=256241764
=> select compacted
cudaMemGetInfo => free_gpu_mem=21596800h (~533.6mb)
Frame Time=  70msec - 72msec

Using the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag:
(using the OptiX vertex functions for calculating normals)

accel_options.buildFlags= OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS | OPTIX_BUILD_FLAG_ALLOW_COMPACTION | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE
accel_options.operation= OPTIX_BUILD_OPERATION_BUILD
curveType =9476   (OPTIX_PRIMITIVE_TYPE_ROUND_CATMULLROM)
g->curve_primitive_count=749446
compacted_gas_size=224422500
gas_buffer_sizes.outputSizeInBytes=256241764
=> select compacted
cudaMemGetInfo => free_gpu_mem=22796800h (~551.6mb) 
Frame Time=  72msec - 74msec
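
As a rough sanity check on the two logs above, here is a small Python sketch that decodes the hex free-memory values and works out how much compaction saved. It is not part of the renderer; the helper names are made up for illustration, and the inputs are just the numbers printed above.

```python
MIB = 1024 * 1024

def hex_bytes_to_mib(text):
    """Parse a logged value such as '21596800h' (hex bytes) into MiB."""
    return int(text.rstrip("hH"), 16) / MIB

def compaction_saving(output_size, compacted_size):
    """Return (bytes saved, fraction saved) when using the compacted GAS."""
    saved = output_size - compacted_size
    return saved, saved / output_size

# Free GPU memory reported by cudaMemGetInfo in the two runs.
free_without_flag = hex_bytes_to_mib("21596800h")
free_with_flag = hex_bytes_to_mib("22796800h")
free_difference = free_with_flag - free_without_flag

# GAS sizes are identical in both runs, so one calculation covers both.
saved_bytes, saved_fraction = compaction_saving(256241764, 224422500)

print(f"free without flag: {free_without_flag:.1f} MiB")
print(f"free with flag:    {free_with_flag:.1f} MiB")
print(f"difference:        {free_difference:.1f} MiB")
print(f"compaction saved:  {saved_bytes} bytes ({saved_fraction:.1%})")
```

So the GAS itself is the same size either way, and the difference in free memory between the two runs comes from somewhere other than the acceleration structure.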


When using the temporal denoiser (including flow vector calculation for curves, albedo and normals):

no vertex functions for calculating normals: 152msec - 156msec per frame
  (free_gpu_mem=1e796800h (~487.6mb))
vertex functions for calculating normals: 157msec - 160msec per frame
  (free_gpu_mem=1e796800h (~487.6mb))
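
To put the frame times into perspective, a minimal Python sketch that compares the midpoints of the measured ranges. Taking the midpoint of each range is my own simplification, and the function names are hypothetical; the inputs are the frame times reported in this post.

```python
def midpoint(low_ms, high_ms):
    """Midpoint of a measured frame-time range in milliseconds."""
    return (low_ms + high_ms) / 2

def relative_overhead(baseline_ms, other_ms):
    """Fractional slowdown of other_ms relative to baseline_ms."""
    return (other_ms - baseline_ms) / baseline_ms

# Without temporal denoiser: direct buffer access vs. vertex functions.
overhead_no_denoiser = relative_overhead(midpoint(70, 72), midpoint(72, 74))

# With temporal denoiser: direct buffer access vs. vertex functions.
overhead_with_denoiser = relative_overhead(midpoint(152, 156), midpoint(157, 160))

print(f"no denoiser:   {overhead_no_denoiser:.1%}")
print(f"with denoiser: {overhead_with_denoiser:.1%}")
```

Both cases come out at roughly 3% slower with the vertex functions, which is small enough that noise from other applications could account for part of it.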

Frame times may be affected by other applications running at the same time.

I ran these tests several times and noticed that some early runs had a build flag mismatch: OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS was not set in OptixBuiltinISOptions.buildFlags, but it was set in OptixAccelBuildOptions.buildFlags. No validation error occurred, and the output was no different.
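
Since that mismatch went through silently, a check of one's own before the build may be worthwhile. The following is only a language-neutral sketch of the logic in Python, not OptiX API code: the bit values are assumed to mirror the OptixBuildFlags enum in optix_types.h and should be verified against your SDK header, and both helper functions are hypothetical.

```python
# Bit values ASSUMED to mirror the OptixBuildFlags enum in optix_types.h;
# verify them against the header of the SDK version in use.
OPTIX_BUILD_FLAG_ALLOW_COMPACTION = 1 << 1
OPTIX_BUILD_FLAG_PREFER_FAST_TRACE = 1 << 2
OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS = 1 << 4

def build_flags_match(accel_build_flags, builtin_is_build_flags):
    """True if both structs were given identical build flags."""
    return accel_build_flags == builtin_is_build_flags

def mismatched_bits(accel_build_flags, builtin_is_build_flags):
    """Bit mask of the flags set in one value but not in the other."""
    return accel_build_flags ^ builtin_is_build_flags

# Reproduces the mismatch described above: the random vertex access flag
# is present for the accel build but missing for the built-in IS module.
accel_flags = (OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS
               | OPTIX_BUILD_FLAG_ALLOW_COMPACTION
               | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE)
builtin_is_flags = (OPTIX_BUILD_FLAG_ALLOW_COMPACTION
                    | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE)

if not build_flags_match(accel_flags, builtin_is_flags):
    print(f"build flag mismatch, differing bits: "
          f"{mismatched_bits(accel_flags, builtin_is_flags):#x}")
```

In host code the simplest guard is probably to fill both structs from one shared flags variable, so the two values cannot drift apart in the first place.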

All still on driver 531.79.

Since the temporal denoiser will be in use nearly always, I will simply use direct buffer access without the vertex functions.