I’m trying to collect performance counter metrics for an RTcore-accelerated OptiX application, specifically memory operations and L2 metrics.
However, I’m unsure whether the perf counter reports I get from Nsight Compute reflect the memory accesses made by the RTcore hardware, or only those of the CUDA kernels running on the SMs.
Could anyone clarify this, or point me to any relevant documentation?
So far, I have profiled the optixPathTracer example app from the SDK using Nsight Compute and could see two active kernels being profiled: the ray generation kernel (__raygen__rg_...) and something called NVIDIA Internal. According to the Nsight VSE documentation, the latter seems to be one of the OptiX kernels that are invisible to the user.
What I’m looking for specifically is the part where the RTcore hardware does the acceleration structure traversal, namely the optixTrace call. Can I assume that the NVIDIA Internal kernel is where that happens? The NVIDIA Internal kernel seems to run for a much shorter time than the ray generation kernel, which leaves me confused about where the actual traversal is happening.
Thanks in advance for any help.