Timing rtTrace via NVAPI

I just came across this blog post:

It shows per-pixel timing capabilities in DXR using NVIDIA's NVAPI.

My question: is this also available in general CUDA code, and in OptiX in particular? It would be a nice debugging and profiling tool. If so, how can it be done? As far as I understand the NVAPI documentation, the functions described in the post exist only as DXR extensions, so they are not readily available in CUDA or OptiX.

Thanks and kind regards

Yes, you can simply use the CUDA clock() function for that: read it before and after the code you want to time, take the difference, and scale it accordingly.
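To illustrate the idea, here is a minimal sketch in plain CUDA. It uses clock64(), the 64-bit variant of clock(), to avoid counter wrap-around. The kernel name timedKernel, the dummyWork() workload, and the normalization by the slowest pixel are assumptions made up for this example; in an OptiX program the timed region would be the trace call instead.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical workload standing in for the code being profiled
// (in an OptiX ray generation program this would be the trace call).
__device__ float dummyWork(int iterations)
{
    float acc = 0.0f;
    for (int i = 0; i < iterations; ++i)
        acc += sinf(acc + static_cast<float>(i));
    return acc;
}

__global__ void timedKernel(float* result, long long* cycles, int width, int height)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;
    const int idx = y * width + x;

    const long long start = clock64();                 // per-SM cycle counter
    const float value = dummyWork(1 + (x + y) % 64);   // cost varies per pixel
    const long long stop = clock64();

    result[idx] = value;         // store the result so the work is not optimized away
    cycles[idx] = stop - start;  // elapsed cycles for this thread ("pixel")
}

int main()
{
    const int width = 64, height = 64, n = width * height;

    float* dResult = nullptr;
    long long* dCycles = nullptr;
    cudaMalloc(&dResult, n * sizeof(float));
    cudaMalloc(&dCycles, n * sizeof(long long));

    const dim3 block(8, 8);
    const dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    timedKernel<<<grid, block>>>(dResult, dCycles, width, height);
    cudaDeviceSynchronize();

    std::vector<long long> hCycles(n);
    cudaMemcpy(hCycles.data(), dCycles, n * sizeof(long long), cudaMemcpyDeviceToHost);

    // Scale: normalize by the slowest pixel so the values map to [0, 1],
    // which could then be written out as a grayscale "time view" image.
    const long long maxCycles = *std::max_element(hCycles.begin(), hCycles.end());
    const long long first = hCycles[0];
    const double scaled = maxCycles > 0 ? static_cast<double>(first) / maxCycles : 0.0;
    std::printf("pixel 0: %lld cycles, scaled %.3f (max %lld cycles)\n",
                first, scaled, maxCycles);

    cudaFree(dResult);
    cudaFree(dCycles);
    return 0;
}
```

Note that the counter measures elapsed cycles on the multiprocessor, including time during which the thread was not actually executing because other warps were scheduled, so the values are best read as a relative heat map rather than as exact per-pixel costs.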

Please find more information and links to example code in this post where I mentioned that blog post as well:

Actually, the OptiX SDKs before 7.0.0 also had that feature in the pinhole_camera.cu example. Search for TIME_VIEW.

Thank you! I assumed there was something similarly easy but could not find it directly.