CPU core metrics do not match selected options

Hi,

I’m using Nsys (Nsight Systems version 2024.1.1.59-241133802077v) on a Linux target.

When I pass specific CPU metric IDs (obtained from --cpu-core-metrics=help) to nsys, I see a different set of IDs selected than the ones I specified. What could I be missing?

Here’s a sample output:

$ nsys profile \
--cpu-core-metrics=10,14,26,30 \
--cuda-um-cpu-page-faults=true \
--cuda-um-gpu-page-faults=true \
ls
ERROR: Too many CPU core events are selected for sampling.
The maximum number of events that can be sampled concurrently is 6. 8 events are selected (2,78,80,87,88,93,94,95).
Events are selected using the --cpu-core-events and/or --cpu-core-metrics switches.

usage: nsys profile [<args>] [application] [<application args>]
Try 'nsys profile --help' for more information.

The --help output lists all of the metrics available for your CPU; however, the CPU doesn’t have enough registers to actually collect all of them at the same time.

@lmonis please comment if you have specific suggestions.

Hi @rajeshshashikumar,

CPU core metrics are derived from samples of one or more CPU core events. When one selects CPU core metrics using the --cpu-core-metrics option, nsys implicitly selects the corresponding CPU core events that are needed to compute those metrics.

In your case, the selection of 4 CPU core metrics configures nsys to sample 8 CPU core events. As you see in the error message, depending on the target system, there is an upper limit to the number of CPU core events that can be sampled together in a single profiling session. The 8 event IDs that are displayed in the error message can be looked up using the --cpu-core-events=help option.

Hope this helps.

Thank you for the response @lmonis. I understand from the user manual that there are limitations on the number of events that can be concurrently sampled.

But my concern here is that none of the event IDs that nsys reported in the list of 8, match those that I specified in my run command --cpu-core-metrics=10,14,26,30

I was also not able to spot the events corresponding to these identifiers (10, 14, 26, 30) when I loaded the nsys-rep in the GUI.

For context, I am using a Grace Hopper machine.

But my concern here is that none of the event IDs that nsys reported in the list of 8, match those that I specified in my run command --cpu-core-metrics=10,14,26,30

The IDs you passed to --cpu-core-metrics are metric IDs, whereas the 8 IDs in the list are event IDs.

Nsight Systems samples CPU core events, and computes CPU core metrics using one or more of the sampled events.

If you go through the output of nsys with options --cpu-core-metrics=help and --cpu-core-events=help, and look at the formula for each of the 4 metrics you had previously selected, you will notice that:

  • Metric with ID 10 is computed using samples of events with IDs 87 and 88.
  • Metric with ID 14 is computed using samples of events with IDs 87, 93, 94 and 95.
  • Metric with ID 26 is computed using samples of events with IDs 2 and 78.
  • Metric with ID 30 is computed using samples of events with IDs 2 and 80.

I was also not able to spot the events corresponding to these identifiers (10, 14, 26, 30) when I loaded the nsys-rep in the GUI.

As shown above, computing these 4 metrics requires sampling 8 distinct events. Since the upper limit is on the number of events (and not metrics), which on your system is 6, nsys fails to configure the profiling session to collect any of these 4 metrics. This is why you don’t see them in the GUI.

Please consider changing/reducing the number of metrics that you wish to collect such that the corresponding number of events that are required to be sampled is less than or equal to 6.
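To check a candidate selection before profiling, the event count can be worked out by hand or with a small script. The sketch below is only illustrative: the metric-to-event mapping is hard-coded from the four formulas listed above (it is specific to this target and nsys version), and the function names events_for_metric and count_events are made up for this example, not part of nsys.

```shell
# Metric ID -> event IDs, copied from the formulas quoted in this thread.
# Assumption: valid only for this target; check --cpu-core-metrics=help on yours.
events_for_metric() {
  case "$1" in
    10) echo "87 88" ;;
    14) echo "87 93 94 95" ;;
    26) echo "2 78" ;;
    30) echo "2 80" ;;
  esac
}

# Print the number of distinct events needed for the given metric IDs.
count_events() {
  for m in "$@"; do
    events_for_metric "$m"
  done | tr ' ' '\n' | sort -un | wc -l | tr -d ' '
}

count_events 10 14 26 30   # prints 8 (over the limit of 6)
count_events 10 26 30      # prints 5 (within the limit)
```

Shared events are counted once, which is why metrics 10 and 14 together need 5 events rather than 6 (both use event 87).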

Hi @lmonis ,

I should have clarified in my previous response that the issue with viewing specific CPU metrics persists even when the number of events is less than 6.

For instance, for the example report attached, I requested only the Store Percentage metric with nsys profile --cpu-core-metrics=10 ./stream_c.exe

However, in the report attached I am unable to see “Store Percentage” in the timeline view upon expansion of all fields. What could I be missing?

Hi @rajeshshashikumar,

Thanks for clarifying.

Please try expanding the timeline row labeled CPU (72); you should find what you’re looking for nested under that.

@lmonis, I only see CPU utilization when I expand the “CPU (72)” row, which does not correspond to the metric (10) I specified.

@rajeshshashikumar This looks like a problem. Can you show us the output of nsys status --environment?

@lmonis, here’s the output:

$ nsys status --environment
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 0
Linux Distribution = rocky
Linux Kernel Version = 5.14.0-362.24.1.el9_3.aarch64+64k: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
Kernel module: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK

See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.

@rajeshshashikumar Thank you for the output.

A few questions to help us resolve your issue:

  • Are you running nsys inside a container? If yes, does the container have the necessary permissions to read performance counters?
  • Can you share the output of perf list in a .txt file?
  • Can you share the .nsys-rep file?

Thanks!

@lmonis

  • Are you running nsys inside a container? If yes, does the container have the necessary permissions to read performance counters?
    • No, these are not inside a container
  • Can you share the output of perf list in a .txt file?
  • Can you share the .nsys-rep file?

Hi @rajeshshashikumar,

Thank you for sharing the requested information.

Please try adding the option --event-sample=system-wide to your nsys command.

The 2024.1.1 version of nsys does not warn the user about a missing --event-sample option, which is required for the collection of CPU core events. This has been fixed in newer versions of nsys.

This should fix the issue you’ve been having. Please let us know if the problem persists.
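Applied to the single-metric run from earlier in the thread, the suggestion would look roughly like this. The binary ./stream_c.exe and metric ID 10 are taken from the earlier post; whether the Store Percentage row then appears in the timeline was not confirmed in this thread.

```shell
# Enable system-wide event sampling so the requested CPU core metric is collected.
nsys profile \
  --event-sample=system-wide \
  --cpu-core-metrics=10 \
  ./stream_c.exe
```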
