NVIDIA® Nsight™ Compute 2025.1 is now available for download in the NVIDIA Registered Developer Program.
Updates in 2025.1.1
NVIDIA Nsight Compute
- Added support for OptiX 9.0 functions optixClusterAccelComputeMemoryUsage and optixClusterAccelBuild.
Resolved Issues
- Fixed a possible deadlock condition while handling the launch of child processes on Linux systems.
- Fixed a possible crash of the Nsight Compute UI when switching to the Source Page.
- Fixed the missing roofline ceilings in the Floating Point Operations Roofline for GB20x chips.
Updates in 2025.1.0
General
- All roofline sections are now included in the full section set.
- Range replay and app-range replay now support the collection of instruction-level source metrics.
- Rules are now supported for range replays.
- Improved the set of launch metrics available for ranges.
- Added a new launch__stack_size metric in the Launch Statistics section to report the configured stack size.
- Added a new sass__inst_executed_register_spilling metric, which counts the number of load and store instructions that were created by the compiler due to register spilling.
NVIDIA Nsight Compute
- Nsight Compute host GUI now natively supports macOS arm64.
- Added interactive tooltips to Details and Source pages. An interactive tooltip can be used to compare different baselines. Its content can be copied to the clipboard using the copy icon button.
- CUDA Green Contexts support is improved by showing TPC mask information in the Launch Statistics section, the Resources tool window, and on the Session page.
- Added a heatmap to the Source Comparison document to visualize the source code differences.
- Added a Diff By drop-down menu to the Source Comparison document in the SASS view. It lets you choose whether the diff is based on Opcode or Full Instruction.
- Performance improvements in SASS view.
- The Resources View for CUDA Graphs can now visualize the graph structure directly in a new Chart mode.
- The Memory Chart now supports zoom and pan.
- The Metric Details tool window now shows PM sampling metrics from the timeline as context switched.
- Improved the performance for deploying to target systems over remote connections.
Resolved Issues
- Fixed an issue where, on some systems, not all free GPU memory was considered when saving context memory for multi-pass data collection.
- Fixed an incorrect multiplier in the calculation of non-tensor FP16 rooflines.
- Fixed the metric Avg. Threads Executed for inlined functions with control flow.
- Fixed an issue where, in some situations, no average was shown in the Source Statistics table for Warp Stall sampling metrics.
- Fixed several SASS syntax highlighting issues.
- Fixed an issue where the SM count wasn't shown correctly in the report header when loading older reports.
- Improved interactions between the Metric Details tool window and the memory chart.
Updates in 2024.4 (OEM/ISV only release)
General
- Added support for the Blackwell architecture.
- Added support for several launch__* metrics for CUDA graphs.
- Added support for the cuMemBatchDecompressAsync API in Range Replay.
NVIDIA Nsight Compute
- A new feature overview is now shown the first time a new UI version is opened.
- Switched the default orientation of the Raw page to show metrics in rows and profile results in columns.
- Added support for reporting register spilling compiler annotations on the Source page.
- The Source page has improved search with support for regular expression- and value-based lookups.
- Added support for setting a Source View Profile as the default profile so that it is applied automatically when opening a report.
- Added hyperlinks for the line numbers and inline function addresses in the Inline Table. This enables you to quickly jump to the respective line number in the Source view and address in the SASS view. Added a new Source File column to the Inline Table to show the name of the file the source belongs to.
- The memory chart can indicate or hide inactive elements.
- Chart tooltips on the Details page now show more relevant information when a specific value is hovered.
- Roofline charts now support showing the formula for ridge point calculation in the Metric Details tool window.
- The occupancy calculator now considers the impact of block barriers for Hopper-architecture and newer GPUs. It also has improved controls to adjust input values.
- The remote connections dialog now supports placeholders for deploying files to, for example, user-specific directories on the target system.
NVIDIA Nsight Compute CLI
- Added a new --nvtx-push-pop-scope command line option, which allows setting the push/pop range scope process-wide.
Resolved Issues
- Fixed UI scrolling issues on macOS trackpads.
- Fixed an issue where certain Python script errors were not properly reported when loading rule files.
- On CUDA 12.7 drivers, context switch trace can now filter events more precisely to the profiled CUDA context, even when profiling in containers.
- NVTX filtering now properly supports start/end ranges that start and end in different threads.
- Fixed several issues with Range Replay when capturing CUDA memcpy APIs.
For a complete overview of all NVIDIA® Nsight™ Compute features and access to resources, please visit the main Nsight Compute Overview page.