Is there a way to measure RT Core util?

zzZ9527 · February 10, 2021, 8:46am

Hi,

May I know if there is a way to measure RT Core utilization, for example how many RT Cores or what percentage of them are in use while an OptiX application is running? I searched Google but did not find any useful information…

Thank you in advance!

droettger · February 10, 2021, 9:15am

Not really. You should be able to profile your own device code with Nsight Compute, but the RT core functionality is included in the results and cannot be singled out.

It’s also not possible to not use them: https://forums.developer.nvidia.com/t/leveraging-rtx-hardware-capabilities-with-optix-7-0/107733

zzZ9527 · February 10, 2021, 7:33pm

Thank you for your reply! I tried Nsight Compute to profile the sample code optixTriangle and had some further questions.

Does the SM Utilization include RT Core utilization information?
If not, what kind of information should I expect from the result page, and where can I find it?
Sorry, I am pretty new to Nsight Compute and OptiX, maybe the questions are trivial…

zzZ9527 · February 10, 2021, 7:35pm

The Nsight Compute result page for optixTriangle is here.

droettger · February 11, 2021, 8:33am

Nsight Compute will not report RT core usage. Again, you cannot profile the RT cores themselves because you cannot actually program that part. All the OptiX device code you programmed is running on the streaming multiprocessors and that’s what you can see and optimize using Nsight Compute information.

You should look into the Nsight Compute source code view which will show you where it sampled the compute kernel events.
When your OptiX PTX input source code has been compiled with line information and both OptixModuleCompileOptions and OptixPipelineLinkOptions have the debugLevel set to OPTIX_COMPILE_DEBUG_LEVEL_LINEINFO, as described here:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/Device.cpp#L543
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/Device.cpp#L796
then you can see function names and the connection between CUDA source code lines and PTX instructions generated from that.
Important: Note that the OptixModuleCompileOptions optLevel has been left on full optimization. Never profile in debug mode! That will not generate the same PTX code.
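
For illustration, the settings described above could look roughly like the following fragment. This is a minimal sketch, not the code from the linked Device.cpp: the helper names makeProfilingModuleOptions and makeProfilingLinkOptions are hypothetical, and the field and enum names assume the OptiX 7 headers from around the time of this thread. Later SDK releases changed the debug level names and the link options, so check the optix.h of the SDK you build against. The device code itself is assumed to be compiled to PTX with nvcc's -lineinfo flag.

```cpp
#include <optix.h>

// Hypothetical helper: module compile options for profiling.
// Full optimization stays enabled; only line information is added.
OptixModuleCompileOptions makeProfilingModuleOptions()
{
    OptixModuleCompileOptions options = {};
    options.maxRegisterCount = OPTIX_COMPILE_DEFAULT_MAX_REGISTER_COUNT;
    options.optLevel         = OPTIX_COMPILE_OPTIMIZATION_LEVEL_3;  // never profile a debug build
    options.debugLevel       = OPTIX_COMPILE_DEBUG_LEVEL_LINEINFO;  // map instructions to source lines
    return options;
}

// Hypothetical helper: pipeline link options with the same debug level.
// maxTraceDepth is application-specific and passed in by the caller.
OptixPipelineLinkOptions makeProfilingLinkOptions(unsigned int maxTraceDepth)
{
    OptixPipelineLinkOptions options = {};
    options.maxTraceDepth = maxTraceDepth;
    options.debugLevel    = OPTIX_COMPILE_DEBUG_LEVEL_LINEINFO;
    return options;
}
```

The resulting structs would then be passed to the module creation and pipeline creation calls the application already makes.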

Nsight Compute allows collapsing the source code view by function name, which gives you an overview of how many of the sampled events each of your device functions took. That is the view where you need to start analyzing your own device code for performance.

Again, all of this shows your streaming multiprocessor code only. Nothing that happens on the RT cores is visible.
You might want to look at the sampled events after an optixTrace() call. If that is showing a lot of events, that would be the streaming multiprocessors waiting on the RT cores.

What you really need to concentrate on is the performance of your device code. The RT cores are able to handle >10 GRays/sec on the high-end boards. You normally reach that only for very simple cases. It’s not a theoretical number.

The limiting factor is what you do inside your device code running on streaming multiprocessors, where memory accesses are normally the bottleneck. Mind that for 10 GRays/sec with a memory bandwidth of 670 GBytes/sec you could only read or write 67 bytes per ray, and some of that budget is already consumed by BVH traversal and ray-triangle intersections. Concentrate on that and you will gain speed.
