I was trying to profile my OptiX application with Nsight Systems (planning to move to Nsight Compute later, once I have identified potential bottleneck regions). I am mainly interested in comparing the memory coherency of my implementation with that of a reference implementation, so metrics like cache hit ratios and finer-grained information about memory accesses would be great. I am very new to both Nsight Systems and Nsight Compute, so I was wondering if you had any guidance on where best to start looking. In Nsight Systems I spotted the DRAM bandwidth and local/non-local resident memory rows under the GPU metrics, but I would like more detailed information on cache hits vs. cache misses.
You’ll get that information for your OptiX device code inside Nsight Compute.
More information here: https://developer.nvidia.com/nsight-compute
The Inspect Memory Workload section there shows how to read the graph.
Thank you for the quick help, I've been able to start exploring the very broad but useful collection of statistics that Nsight Compute offers. However, when I try to use the source code correlation feature, Nsight Compute will only show me the SASS code, and I would like to inspect the CUDA source code as well. I have launched the interactive profiler with Import Source set to Yes, and my CUDA programs are compiled with the --generate-line-info flag. Is there anything else that I am missing? I have seen that you can add CUDA source paths to the CLI profiler via source-folders, but I can't seem to find where to add this path in the interactive profiler. Any thoughts on this?
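For reference, my compile and profiling invocations look roughly like this (paths shortened, and the ncu flag spellings are my reading of the CLI docs, so take them as a sketch):

```shell
# Compile device code with line info so SASS can be correlated back to CUDA source
nvcc --generate-line-info -ptx -o kernels.ptx kernels.cu

# CLI profiling run; --source-folders points ncu at the CUDA sources
ncu --import-source yes --source-folders /path/to/cuda/src -o report ./my_optix_app
```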
To get CUDA source correlation when profiling with Nsight Compute, compile with --generate-line-info and set debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE in both the OptixModuleCompileOptions and the OptixPipelineLinkOptions in your application host code.
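In host code that looks roughly like the following sketch (assuming an OptiX 7.x SDK where OptixPipelineLinkOptions still carries a debugLevel field; the other option fields are elided or placeholder values):

```cpp
OptixModuleCompileOptions moduleOptions = {};
moduleOptions.optLevel   = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;   // keep optimizations on
moduleOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE;   // emit line info for source correlation

OptixPipelineLinkOptions linkOptions = {};
linkOptions.maxTraceDepth = 2;                                   // whatever your pipeline needs
linkOptions.debugLevel    = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE;  // match the module setting
```

MODERATE keeps line information without the heavy cost of FULL debug info, which is why it pairs well with --generate-line-info for profiling.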
Thank you! Setting debugLevel to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE made it work. However, I read that this has a slight impact on performance; is it recommended to set this property only when debugging or profiling with tools such as Nsight Compute?
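In case it matters, I was considering gating it behind a build flag, roughly like this (the ENABLE_PROFILING macro is just my own placeholder, not anything from the SDK):

```cpp
#ifdef ENABLE_PROFILING
// Profiling builds: keep line info for Nsight Compute source correlation.
const OptixCompileDebugLevel debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE;
#else
// Release builds: no debug/line info.
const OptixCompileDebugLevel debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_NONE;
#endif

moduleOptions.debugLevel = debugLevel;
linkOptions.debugLevel   = debugLevel;
```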