Profiling memory coherency of OptiX application with Nsight Systems and Nsight Compute

Hi,

I am trying to profile my OptiX application with Nsight Systems, and plan to move to Nsight Compute once I have identified potential bottleneck regions. I am mainly interested in comparing the memory coherency of my implementation with that of a reference implementation, so metrics like cache hit ratios and more detail about memory accesses would be great. I am very new to both Nsight Systems and Nsight Compute, so I was wondering if you had any guidelines on where best to start looking. In Nsight Systems I spotted the DRAM bandwidth and local/non-local resident memory rows under the GPU metrics, but I would like more detailed information on cache hits vs. cache misses.

Thanks in advance,

– Chuppa

You’ll get that information for your OptiX device code inside Nsight Compute.
More information here: https://developer.nvidia.com/nsight-compute
The Inspect Memory Workload section there shows how to read the graph.


Thank you for the quick help. I’ve been able to start exploring the very broad but useful collection of statistics that Nsight Compute offers. However, when I try to use the source code correlation feature, Nsight Compute only shows me the SASS code, and I would like to inspect the CUDA source code. I launched the interactive profiler with Import Source set to Yes, and my CUDA programs are compiled with the --generate-line-info flag. Is there anything else that I am missing? I have seen that you can add CUDA source paths to the CLI profiler via --source-folders, but I can’t find where to add this path in the interactive profiler. Any thoughts on this?

Did you follow this advice in the OptiX 7.6.0 Programming Guide about Nsight Compute?

https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#7017

To profile your code with Nsight Compute, enable --generate-line-info and set debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE in both the OptixModuleCompileOptions and OptixPipelineLinkOptions in your application host code.

Did you compile to PTX or OptiX IR input code?

Which version of Nsight Compute are you using?


Thank you! Setting debugLevel to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE made it work. I read that this has a slight impact on performance, however. Is it recommended to set this property only when profiling or debugging with tools such as Nsight Compute?

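Yes. For a full-performance release build, set optLevel to full optimizations (OPTIX_COMPILE_OPTIMIZATION_LEVEL_3) and debugLevel to none or minimal debug information (OPTIX_COMPILE_DEBUG_LEVEL_NONE or OPTIX_COMPILE_DEBUG_LEVEL_MINIMAL).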

Enums here:
OptixCompileOptimizationLevel
OptixCompileDebugLevel
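
For anyone landing here later, the advice above could be sketched in host code roughly like this. It is only a minimal illustration, not code from the OptiX SDK or the Programming Guide: the helper names makeModuleCompileOptions and makePipelineLinkOptions and the profiling flag are made up for this example, keeping optLevel at full optimizations while profiling is an assumption, and the stand-in declarations in the #else branch exist only so the snippet compiles without the SDK installed (their numeric values are placeholders, not the real enum values).

```cpp
// Sketch: switch OptiX debug info between a profiling build and a release build.
#if __has_include(<optix.h>)
#include <optix.h>
#else
// Stand-ins so the sketch compiles without the OptiX SDK. The names mirror the
// OptiX 7.6 headers; the values and the reduced struct layouts are placeholders.
#define OPTIX_VERSION 70600
enum OptixCompileOptimizationLevel {
    OPTIX_COMPILE_OPTIMIZATION_DEFAULT,
    OPTIX_COMPILE_OPTIMIZATION_LEVEL_3
};
enum OptixCompileDebugLevel {
    OPTIX_COMPILE_DEBUG_LEVEL_DEFAULT,
    OPTIX_COMPILE_DEBUG_LEVEL_NONE,
    OPTIX_COMPILE_DEBUG_LEVEL_MINIMAL,
    OPTIX_COMPILE_DEBUG_LEVEL_MODERATE
};
struct OptixModuleCompileOptions {
    OptixCompileOptimizationLevel optLevel;
    OptixCompileDebugLevel debugLevel;
};
struct OptixPipelineLinkOptions {
    unsigned int maxTraceDepth;
    OptixCompileDebugLevel debugLevel;
};
#endif

// Hypothetical helper: pick the debug level depending on the build type.
inline OptixCompileDebugLevel chooseDebugLevel(bool profiling) {
    // MODERATE keeps the line info Nsight Compute needs for source correlation;
    // NONE drops debug information for release builds.
    return profiling ? OPTIX_COMPILE_DEBUG_LEVEL_MODERATE
                     : OPTIX_COMPILE_DEBUG_LEVEL_NONE;
}

// Hypothetical helper: options passed when creating each OptiX module.
inline OptixModuleCompileOptions makeModuleCompileOptions(bool profiling) {
    OptixModuleCompileOptions options = {};
    // Assumption: profile the fully optimized code, same as in release.
    options.optLevel   = OPTIX_COMPILE_OPTIMIZATION_LEVEL_3;
    options.debugLevel = chooseDebugLevel(profiling);
    return options;
}

// Hypothetical helper: options passed when linking the pipeline.
inline OptixPipelineLinkOptions makePipelineLinkOptions(bool profiling) {
    OptixPipelineLinkOptions options = {};
    options.maxTraceDepth = 1;  // placeholder; use your application's value
#if OPTIX_VERSION < 70700
    // OptiX 7.6 expects the same debugLevel here as in the module options.
    // To my knowledge, later SDK versions no longer have this field.
    options.debugLevel = chooseDebugLevel(profiling);
#else
    (void)profiling;
#endif
    return options;
}
```

The device code still has to be compiled with --generate-line-info, as quoted from the Programming Guide earlier in the thread. Calling makeModuleCompileOptions(true) and makePipelineLinkOptions(true) would give the profiling setup, and passing false the release setup.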

