Profiling memory coherency of OptiX application with Nsight Systems and Nsight Compute

Chuppa · March 30, 2023, 9:47am

Hi,

I am trying to profile my OptiX application with Nsight Systems, and plan to move to Nsight Compute once I have identified potential bottleneck regions. I am mainly interested in comparing the memory coherency of my implementation with that of a reference implementation, so metrics like cache hit ratios and more detail about memory accesses would be great. I am very new to both Nsight Systems and Nsight Compute, so I was wondering if you had any guidance on where to start looking. In Nsight Systems I spotted the DRAM bandwidth and local/non-local resident memory rows under the GPU metrics, but I would like more detailed information on cache hits vs. cache misses.

Thanks in advance,

– Chuppa

droettger · March 30, 2023, 10:06am

You’ll get that information for your OptiX device code inside Nsight Compute.
More information here: https://developer.nvidia.com/nsight-compute
The Inspect Memory Workload section there shows how to read the graph.

Chuppa · March 30, 2023, 4:03pm

Thank you for the quick help, I’ve been able to start exploring the very broad but useful collection of statistics that Nsight Compute offers. However, when I try to use the source code correlation feature, Nsight Compute only shows me the SASS code, and I would like to inspect the CUDA source code. I launched the interactive profiler with Import Source set to Yes, and my CUDA programs are compiled with the --generate-line-info flag. Is there anything else that I am missing? I have seen that you can add CUDA source paths to the CLI profiler via --source-folders, but I can’t find where to add this path in the interactive profiler. Any thoughts on this?

droettger · March 30, 2023, 4:27pm

Did you follow this advice in the OptiX 7.6.0 Programming Guide about Nsight Compute?

https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#7017

To profile your code with Nsight Compute, enable --generate-line-info and set debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE in both the OptixModuleCompileOptions and OptixPipelineLinkOptions in your application host code.

Did you compile to PTX or OptiX IR input code?

Which version of Nsight Compute are you using?

Chuppa · March 30, 2023, 5:01pm

Thank you! Setting debugLevel to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE made it work. I read that this has a slight impact on performance, however. Is it recommended to set this property only when working with tools such as Nsight Compute?

droettger · March 30, 2023, 5:08pm

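Yes. For a full-performance release build, set optLevel to full optimizations (OPTIX_COMPILE_OPTIMIZATION_LEVEL_3) and debugLevel to none or minimal debug information (OPTIX_COMPILE_DEBUG_LEVEL_NONE or OPTIX_COMPILE_DEBUG_LEVEL_MINIMAL).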

Enums here:
OptixCompileOptimizationLevel
OptixCompileDebugLevel
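
For reference, here is a minimal sketch of how the two configurations discussed in this thread might be selected in host code. It assumes the OptiX 7.6 header layout, where both OptixModuleCompileOptions and OptixPipelineLinkOptions have a debugLevel member (other SDK versions may differ). BuildOptions, makeBuildOptions and the maxTraceDepth value are hypothetical and only for illustration, and the fallback declarations exist solely so the snippet compiles without the SDK installed.

```cpp
#include <cassert>

#if __has_include(<optix.h>)
#include <optix.h>  // real OptiX SDK header, if it is on the include path
#else
// Stand-ins so the sketch compiles without the SDK. Only the members used
// below are declared, and the enum values are placeholders, not the SDK's.
enum OptixCompileOptimizationLevel {
    OPTIX_COMPILE_OPTIMIZATION_DEFAULT,
    OPTIX_COMPILE_OPTIMIZATION_LEVEL_3
};
enum OptixCompileDebugLevel {
    OPTIX_COMPILE_DEBUG_LEVEL_DEFAULT,
    OPTIX_COMPILE_DEBUG_LEVEL_NONE,
    OPTIX_COMPILE_DEBUG_LEVEL_MINIMAL,
    OPTIX_COMPILE_DEBUG_LEVEL_MODERATE
};
struct OptixModuleCompileOptions {
    OptixCompileOptimizationLevel optLevel;
    OptixCompileDebugLevel        debugLevel;
};
struct OptixPipelineLinkOptions {
    unsigned int           maxTraceDepth;
    OptixCompileDebugLevel debugLevel;
};
#endif

// Hypothetical helper type: bundles the two option structs that must agree.
struct BuildOptions {
    OptixModuleCompileOptions module;
    OptixPipelineLinkOptions  link;
};

// Hypothetical helper: profiling == true selects the settings needed for
// source correlation in Nsight Compute, false selects release settings.
BuildOptions makeBuildOptions(bool profiling)
{
    BuildOptions o = {};  // zero-initialize every member

    // Full optimizations in both cases.
    o.module.optLevel = OPTIX_COMPILE_OPTIMIZATION_LEVEL_3;

    // Moderate debug info for profiling, none for release.
    const OptixCompileDebugLevel dbg = profiling
        ? OPTIX_COMPILE_DEBUG_LEVEL_MODERATE
        : OPTIX_COMPILE_DEBUG_LEVEL_NONE;

    // The Programming Guide asks for the same level in both structs.
    o.module.debugLevel = dbg;
    o.link.debugLevel   = dbg;

    o.link.maxTraceDepth = 1;  // placeholder value, depends on the application

    assert(o.module.debugLevel == o.link.debugLevel);
    return o;
}
```

A build could then call makeBuildOptions(true) for profiling runs and makeBuildOptions(false) otherwise, passing the two members to module creation and pipeline linking respectively. The device code still needs to be compiled with --generate-line-info for source correlation to work.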

