That would need some more details about the system configurations you’ve been comparing.
The example2_pipelineAndRayGen program is just filling the output buffer without shooting a single ray.
That’s effectively a two-dimensional CUDA kernel, just more complicated.
It’s more a benchmark of VRAM bandwidth and the number of CUDA cores than anything else.
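To make the comparison with a two-dimensional CUDA kernel concrete, here is a minimal sketch of a plain CUDA program that does the same kind of work: one thread per pixel, each writing one color value and nothing else. The kernel name `fillBuffer`, the resolution, and the color pattern are my own assumptions for illustration, not code taken from the example.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per pixel; each thread writes a single 32-bit RGBA value.
// No rays are traced, so the run time is dominated by memory writes.
__global__ void fillBuffer(uint32_t* colorBuffer, int width, int height)
{
    const int ix = blockIdx.x * blockDim.x + threadIdx.x;
    const int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= width || iy >= height)
        return; // threads outside the image do nothing

    // Arbitrary test pattern (assumption, any per-pixel value would do).
    const uint32_t r = ix % 256;
    const uint32_t g = iy % 256;
    const uint32_t b = (ix + iy) % 256;
    colorBuffer[iy * width + ix] = 0xff000000u | r | (g << 8) | (b << 16);
}

int main()
{
    const int width  = 1200; // hypothetical resolution
    const int height = 800;

    uint32_t* d_buffer = nullptr;
    if (cudaMalloc(&d_buffer, sizeof(uint32_t) * width * height) != cudaSuccess)
        return 1;

    // 2D launch: the grid is rounded up so it covers the whole image.
    const dim3 block(16, 16);
    const dim3 grid((width + block.x - 1) / block.x,
                    (height + block.y - 1) / block.y);
    fillBuffer<<<grid, block>>>(d_buffer, width, height);
    if (cudaDeviceSynchronize() != cudaSuccess)
        return 1;

    // Copy back and print one pixel just to show the kernel ran.
    std::vector<uint32_t> h_buffer(size_t(width) * height);
    cudaMemcpy(h_buffer.data(), d_buffer,
               sizeof(uint32_t) * h_buffer.size(), cudaMemcpyDeviceToHost);
    printf("pixel(0,0) = 0x%08x\n", h_buffer[0]);

    cudaFree(d_buffer);
    return 0;
}
```

Timing something like this tells you how fast the GPU can write a buffer, not how fast it can trace rays.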
These examples are meant to show OptiX 7 concepts in a simple way. They are not optimized for performance.
Remember what I said about the gdt/math/vec.h classes not being suited for vectorized device memory accesses.
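As a rough illustration of the vectorized access point, the sketch below contrasts a plain four-float struct with the built-in CUDA `float4`. `PlainVec4` is a hypothetical stand-in and not the actual gdt class; I am assuming a vector class declared as a plain struct without an alignment specifier.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a vector class declared as a plain struct.
struct PlainVec4 { float x, y, z, w; };

// The plain struct only guarantees 4-byte alignment, while the built-in
// float4 is declared with 16-byte alignment.
static_assert(alignof(PlainVec4) == 4,  "plain struct is 4-byte aligned");
static_assert(alignof(float4)    == 16, "float4 is 16-byte aligned");
static_assert(sizeof(PlainVec4) == sizeof(float4), "same size either way");

// With only 4-byte alignment known, the compiler generally has to split
// this copy into four separate 32-bit loads and stores.
__global__ void copyPlain(const PlainVec4* in, PlainVec4* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// With 16-byte alignment guaranteed, the same copy can be done with a
// single 128-bit load and store per element.
__global__ void copyFloat4(const float4* in, float4* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

int main()
{
    const int n = 1 << 20; // arbitrary element count
    float4* d_in  = nullptr;
    float4* d_out = nullptr;
    if (cudaMalloc(&d_in,  sizeof(float4) * n) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, sizeof(float4) * n) != cudaSuccess) return 1;
    cudaMemset(d_in, 0, sizeof(float4) * n);

    const int block = 256;
    const int grid  = (n + block - 1) / block;
    copyFloat4<<<grid, block>>>(d_in, d_out, n);
    // Same memory viewed through the plain struct (cudaMalloc memory is
    // sufficiently aligned for both types).
    copyPlain<<<grid, block>>>(reinterpret_cast<const PlainVec4*>(d_in),
                               reinterpret_cast<PlainVec4*>(d_out), n);
    const cudaError_t err = cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Comparing the generated code for the two kernels (for example with cuobjdump) should show the difference in load and store widths.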
Also, never use empty device programs in an OptiX pipeline like these examples do. Use a nullptr for both the module and the program name instead. Not assigning a program is faster than assigning an empty program.
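For reference, a minimal host-side sketch of what "nullptr for the module and program name" looks like with the OptiX 7 `OptixProgramGroupDesc` structure. It assumes the OptiX 7 SDK headers are on the include path. The helper names, the `module` parameter, and the entry name `__closesthit__radiance` are hypothetical and only there for illustration.

```cuda
#include <optix.h>

// Hypothetical helper: a miss program group with no program assigned,
// instead of pointing it at an empty __miss__ function.
OptixProgramGroupDesc makeUnassignedMissDesc()
{
    OptixProgramGroupDesc desc = {};
    desc.kind                   = OPTIX_PROGRAM_GROUP_KIND_MISS;
    desc.miss.module            = nullptr; // no module
    desc.miss.entryFunctionName = nullptr; // no program name
    return desc;
}

// Hypothetical helper: a hit group that only uses a closest-hit program.
// The any-hit and intersection slots are left unassigned rather than
// filled with empty programs.
OptixProgramGroupDesc makeClosestHitOnlyDesc(OptixModule module)
{
    OptixProgramGroupDesc desc = {};
    desc.kind                         = OPTIX_PROGRAM_GROUP_KIND_HITGROUP;
    desc.hitgroup.moduleCH            = module;
    desc.hitgroup.entryFunctionNameCH = "__closesthit__radiance"; // assumed name
    desc.hitgroup.moduleAH            = nullptr;
    desc.hitgroup.entryFunctionNameAH = nullptr;
    desc.hitgroup.moduleIS            = nullptr; // built-in triangle intersection
    desc.hitgroup.entryFunctionNameIS = nullptr;
    return desc;
}
```

The resulting descriptions are passed to optixProgramGroupCreate as usual. Only the fields for programs you don't need change.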
If you want to compare ray tracing performance, you should look at the final examples, use the OptiX SDK 7.1.0 optixMeshViewer for a Whitted-style renderer, or use my OptiX 7 applications for path tracers, which have benchmarks built in.
Also, when benchmarking anything which is measured in frames per second with display to the screen, make sure to disable vertical sync inside the NVIDIA display control panel.
Find that here: https://forums.developer.nvidia.com/t/optix-6-5-demo-performance-concern/128404/2