This is ~3.6 GRays/s. That seems pretty reasonable to me for a 3060, and I'd guess it would be hard to make it 2x faster, even if that is theoretically possible. It's not bad: some of our SDK samples run slower than this on higher-end GPUs. Is it a single kernel launch, or many launches over 35 seconds? Single sample per pixel, or multiple? Path tracing, primary rays only, or something else? How big is the scene? Is the scene hierarchy single-level, two-level, or deeper?
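Those questions matter because a GRays/s figure depends on what gets counted as a ray and what gets timed. Here is a minimal sketch of the arithmetic, assuming you count rays as width × height × samples per pixel × rays per sample and time the whole run on the host. The helper names (`estimatedRays`, `gigaraysPerSecond`, `timeSeconds`) are hypothetical, not part of OptiX. Since `optixLaunch` is asynchronous, any GPU work you time this way must synchronize before the lambda returns.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: upper-bound ray count for a launch.
// Assumes every sample traces exactly raysPerSample rays; path tracers that
// terminate paths early trace fewer, so counting actual trace calls is better.
std::uint64_t estimatedRays(std::uint64_t width, std::uint64_t height,
                            std::uint64_t samplesPerPixel,
                            std::uint64_t raysPerSample) {
    return width * height * samplesPerPixel * raysPerSample;
}

// Hypothetical helper: throughput in gigarays per second.
double gigaraysPerSecond(std::uint64_t totalRays, double seconds) {
    assert(seconds > 0.0);  // guard against division by zero
    return static_cast<double>(totalRays) / seconds / 1e9;
}

// Hypothetical helper: wall-clock time of a callable, in seconds.
// For GPU work, the callable must wait for completion (e.g. a stream or
// device synchronize), otherwise only the launch overhead is measured.
template <typename F>
double timeSeconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For scale: if the 3.6 GRays/s figure covers the full 35 seconds, it corresponds to roughly 3.6e9 × 35 ≈ 1.26e11 rays. Whether that includes shadow rays, secondary bounces, or output-buffer copies changes how it compares with other numbers.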
I'd guess the peak ray limit is higher on a 3060, but I don't know exactly what it is. Whether you can get closer to it depends on the details of your application, including the launch size, payload size, required memory bandwidth, OptiX features used, renderer type, etc. To approach the theoretical limit on any GPU, you need big batch sizes, a small payload, coherent rays, simple shading, no any-hit, hardware triangles only, and fast-math options enabled. You also need careful handling of any intermediate data, and of the output buffer if you include any I/O in your timings.