NVIDIA® Nsight™ Compute 2020.1 is now available

NVIDIA® Nsight™ Compute 2020.1 is now available for download in the NVIDIA Registered Developer Program.

Version 2020.1 supports the CUDA Toolkit 11.0 and NVIDIA’s A100 GPU, including Asynchronous Copy to Shared Memory and Compute Data Compression enhancements to the Memory Workload Analysis. We’ve also added Roofline charts, Hot Spot tables, and cross-linking between result sections to the profiling reports. Platform support has been expanded to include Arm SBSA. And of course, there are numerous workflow and UI improvements, performance improvements, and bug fixes.

The latest NVIDIA® Nsight™ Compute 2020.1 release now offers:

General
  • Added support for the NVIDIA A100/SM 8.x GPU architecture
  • Expanded platform support to include Arm SBSA (Server Base System Architecture)
  • Added support for CUDA Toolkit 11.0
  • Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section
  • Added support for report name placeholders %p, %q, %i and %h
  • The Kernel Profiling Guide was added to the documentation
  • The Special Configurations section was added to the documentation, detailing support for NVIDIA Ampere architecture’s Multi-Instance GPU (MIG)
  • Added support for Visual Studio integration (Windows only)
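
As an illustration of the report name placeholders listed above, here is a minimal Python sketch of how such a template could be expanded. It assumes the semantics described in the CLI documentation (%h is the hostname, %p the process ID, %q{NAME} the value of an environment variable, %i the lowest positive integer that yields an unused file name, and %% a literal percent sign). The function `expand_report_name` and its parameters are hypothetical and are not part of the tool.

```python
import os
import re
import socket

# Matches %q{NAME} or one of the single-character placeholders %h, %p, %i, %%.
_TOKEN = re.compile(r"%(?:q\{(\w+)\}|([hpi%]))")


def expand_report_name(template, pid, exists=os.path.exists):
    """Expand placeholders in a report name template (illustrative only)."""

    def render(counter):
        def repl(match):
            if match.group(1) is not None:
                # %q{NAME}: value of the environment variable, empty if unset.
                return os.environ.get(match.group(1), "")
            return {
                "h": socket.gethostname(),
                "p": str(pid),
                "i": str(counter),
                "%": "%",
            }[match.group(2)]

        return _TOKEN.sub(repl, template)

    uses_counter = any(m.group(2) == "i" for m in _TOKEN.finditer(template))
    if not uses_counter:
        return render(0)

    # %i: pick the lowest positive integer whose resulting name is not taken.
    counter = 1
    while exists(render(counter)):
        counter += 1
    return render(counter)
```

For example, a template such as `report_%h_%p` gives each profiled process its own report name, which is useful when several processes are profiled on the same machine.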
NVIDIA Nsight Compute
  • Added support for roofline analysis charts
  • NVIDIA Ampere architecture enhancements
    • Memory Workload Analysis Report now shows Compute Data Compression ratio and amounts
    • Memory Workload Analysis Report now shows Asynchronous Copy to shared memory
  • Added linked hot spot tables in section bodies to indicate performance problems in the source code
  • Added section navigation links in rule results to quickly jump to the referenced section
  • Added a new option to select how kernel names are shown in the UI
  • Added new memory tables for the L1/TEX cache and the L2 cache. The old tables are still available for backwards compatibility and have been moved to a new section containing deprecated UI elements.
  • Memory tables now show the metric name as a tooltip
  • Source resolution now takes into account file properties when selecting a file from disk
  • Results in the profile report can now be filtered by NVTX range
  • The Source page now supports collapsing views even for single files
  • The UI shows profiler error messages as dismissible banners for increased visibility
  • Improved the baseline name control in the profiler report header
  • The UI command was renamed from nv-nsight-cu to ncu-ui. Old names remain for backwards compatibility.
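
For readers new to the roofline analysis charts mentioned above, the underlying model can be sketched in a few lines of Python. This is the general roofline formula, not the tool's implementation, and the peak numbers below are hypothetical placeholders rather than figures for any specific GPU.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity is in FLOP per byte of memory traffic.
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical device peaks, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 10_000.0
PEAK_BANDWIDTH_GBS = 1_000.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    for ai in (0.5, 10.0, 40.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        limiter = "memory" if ai < ridge else "compute"
        print(f"AI={ai:5.1f} FLOP/B  bound={bound:8.1f} GFLOP/s  ({limiter}-bound)")
```

A kernel plotted left of the ridge point is limited by memory bandwidth, and one plotted right of it is limited by compute throughput.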
NVIDIA Nsight Compute Command Line Interface
  • The CLI command was renamed from nv-nsight-cu-cli to ncu. Old names remain for backwards compatibility.
  • Queried metrics on GV100 and newer chips are sorted alphabetically
  • Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
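
The serialization described in the last item can be pictured with a small Python sketch of a system-wide file lock. This only illustrates the general technique on Unix-like systems (it uses `fcntl.flock`). The lock path and the `profile_kernel` stand-in are hypothetical, and the actual mechanism inside the CLI may differ.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file location, used only for this illustration.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "profiler_demo.lock")


@contextmanager
def system_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` while the block runs."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o666)
    try:
        # Blocks until no other process (or file handle) holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def profile_kernel(rank, kernel_name, log):
    """Stand-in for profiling one kernel; runs in one process at a time."""
    with system_wide_lock():
        log.append((rank, kernel_name))
```

Each process (for example, each MPI rank) would take the lock before profiling a kernel and release it afterwards, so profiled kernels from different processes never overlap.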
Resolved Issues
  • More C++ kernel names can be properly demangled
  • Fixed a free(): invalid pointer error when profiling applications using pytorch > 19.07
  • Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
  • Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode
  • Reduced surplus DRAM write traffic created from flushing caches during kernel replay
  • The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs
  • Profile reports now scroll properly on macOS when using a trackpad
  • Relative output filenames for the Profile activity now use the document directory, instead of the current working directory
  • Fixed path expansion of ~ on Windows
  • Memory access information is now shown properly for RED assembly instructions on the Source page
  • Fixed an issue where user PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues
Drops and Deprecations
  • Removed support for the Pascal SM 6.x GPU architecture
  • Windows 7 is no longer a supported host or target platform

More Information

For more information on Nsight™ Compute, including features, requirements, documentation, and support, please visit the Nsight Compute overview page.

Download this version now, or get it as part of the CUDA Toolkit 11.0.
