How to measure L3 cache misses using the uncore PMU on NVIDIA Jetson AGX Xavier

According to the ‘Xavier Series SoC Technical Reference Manual’, the number of L3 cache misses can be measured by using the MSR/MRS instructions with the uncore PMU system registers. I tried several approaches, but all of them failed. Could you tell me how to measure it?

I’ve tried three ways.

  1. Direct assembly programming
    A compilation error occurs because the assembler does not recognize the system register names of the uncore PMU, such as NV_PMSELR_EL0, NV_PMCNTENSET_EL0, NV_PMCR_EL0, …

The code below fails with the assembler error that follows it.

static inline void uncore_pmu_select_group_unit(uint8_t group, uint8_t unit) {
    /* Selector value: group in bits [15:8], unit in bits [7:0] */
    uint32_t tmp = 0;
    tmp = group;
    tmp = tmp << 8;
    tmp = tmp | unit;
    asm volatile("msr NV_PMSELR_EL0, %0" : : "r" (tmp));
}

/tmp/ccpz0I4c.s: Assembler messages:
/tmp/ccpz0I4c.s:6722: Error: unknown or missing system register name at operand 1 -- `msr NV_PMSELR_EL0,x0'

  2. The existing uncore PMU code didn’t work.
    I found two related source files.
    1. kernel/nvidia/drivers/platform/tegra/tegra18_perf_uncore.c
      I tested this code on Xavier, but it didn’t work because the hardware validity check ‘tegra18_is_cpu_denver(cpu)’ returns false on Xavier.
      This code seems to work only on TX2.

    2. kernel/nvidia/drivers/platform/tegra/tegra19x-mce.c
      Can I use the uncore PMU with the function below?

/* Issue a NVG request with data */
static noinline notrace void nvg_send_req_data(uint64_t req, uint64_t data) {
    asm volatile (
        "msr s3_0_c15_c1_2, %0 \n"
        "msr s3_0_c15_c1_3, %1 \n"
        :: "r" (req), "r" (data));
}

What are the system registers named ‘s3_0_c15_c1_1’, ‘s3_0_c15_c1_2’, and ‘s3_0_c15_c1_3’?
Is there a name of this form for each uncore PMU system register, such as NV_PMSELR_EL0, NV_PMCNTENSET_EL0, NV_PMCR_EL0, …?

  3. NVIDIA Nsight Systems 2018.1.3 tool
    This tool can collect various PMU events.
    However, the list of available PMU events contains no event related to the L3 cache.
    Is this not yet implemented on Xavier?

(1) Direct assembly programming compilation error:
Please use “s3_3_c15_c5_1” to access NV_PMSELR_EL0. The same applies to the other implementation-defined system registers.
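
As a minimal sketch of this suggestion, the selector write from the question can be rebuilt around the encoded register name. The helper uncore_pmu_sel_value is a hypothetical name introduced only for illustration, and the group/unit packing is copied from the original snippet. Whether the write succeeds at a given exception level depends on how the platform configures access to these registers, which is assumed here rather than confirmed.

```c
#include <stdint.h>

/* Hypothetical helper: pack the IDs the way the original snippet does,
 * group in bits [15:8] and unit in bits [7:0]. */
static inline uint32_t uncore_pmu_sel_value(uint8_t group, uint8_t unit)
{
    return ((uint32_t)group << 8) | unit;
}

#if defined(__aarch64__)
/* Write the selector through the encoded name s3_3_c15_c5_1, which the
 * reply above gives for NV_PMSELR_EL0. The value is widened to 64 bits
 * so the operand is emitted as an x register, as MSR requires. */
static inline void uncore_pmu_select_group_unit(uint8_t group, uint8_t unit)
{
    uint64_t val = uncore_pmu_sel_value(group, unit);
    __asm__ volatile("msr s3_3_c15_c5_1, %0" : : "r" (val));
}
#endif
```

The register write is guarded so the packing logic can be compiled and checked on any host, while the MSR itself is only built for AArch64 and is only meaningful on the target hardware.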

(2) & (3) This is currently under development and will be available in a future software release.

(1) Thank you sincerely for your answer.
Could you let me know the ‘dedicated register name’ for each system register of uncore PMU?

NV_PMSELR_EL0: s3_3_c15_c5_1
ID_AFR0_EL1: ?
ID_AA64AFR0_EL1: ?

(2) & (3) Could you tell me when it will be possible?


I am also unable to find a reference in the Xavier TRM that identifies the system register names for the uncore PMU. Perhaps there is some other document that does this?

Please refer to the section “Uncore Perfmon Registers” and “Table 5.30 Uncore Perfmon Registers”.
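
For a register the assembler has no name for, the AArch64 generic form is s<op0>_<op1>_c<CRn>_c<CRm>_<op2>, which is the pattern behind s3_3_c15_c5_1. The sketch below builds that string from the five encoding fields, assuming those fields are read from the register table. The function name sysreg_generic_name is hypothetical and is not part of any NVIDIA or kernel API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format the generic system register name
 * "s<op0>_<op1>_c<CRn>_c<CRm>_<op2>" from its encoding fields.
 * Returns 0 on success, -1 if a field exceeds its bit width
 * (op0: 2 bits, op1/op2: 3 bits, CRn/CRm: 4 bits) or if the
 * buffer is too small. */
static int sysreg_generic_name(char *buf, size_t len,
                               unsigned op0, unsigned op1,
                               unsigned crn, unsigned crm, unsigned op2)
{
    if (op0 > 3 || op1 > 7 || crn > 15 || crm > 15 || op2 > 7)
        return -1;

    int n = snprintf(buf, len, "s%u_%u_c%u_c%u_%u",
                     op0, op1, crn, crm, op2);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the operand of an inline asm MSR/MRS must be a string literal at compile time, a helper like this is useful for checking or printing names, not for constructing instructions at run time.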

For more information about using the counters, see the kernel documentation within source code at this path: Documentation/devicetree/bindings/platform/tegra/nvidia,carmel-pmu.txt