[Fixed] perf_event_open() with cpu-wide accounting fails

Hi,

I’m bearing down on a perf issue that I hope to avoid debugging kernel code to solve.

cat /etc/nv_tegra_release | head -n 1
# R24 (release), REVISION: 2.1, GCID: 8028265, BOARD: t210ref, EABI: aarch64, DATE: Thu Nov 10 03:51:59 UTC 2016
uname -a
Linux tegra-ubuntu 3.10.96-tegra #1 SMP PREEMPT Wed Nov 9 19:42:57 PST 2016 aarch64 aarch64 aarch64 GNU/Linux

perf_event_open doc: http://man7.org/linux/man-pages/man2/perf_event_open.2.html

To summarise: events opened (perf_event_open) with cpu=0, pid=-1 (to measure a performance counter for all PIDs on a CPU), return bogus data:

[INFO]	[PERF]	Parsing event id INSTR
[INFO]	[PERF]	ID 0x1
[INFO]	[PERF]	Opened for CPU 0 fd 43
[INFO]	[PERF]	Read 8 chars 0x7fffffff (2147483647) fd 43
[INFO]	[PERF]	Read 8 chars 0x0 (0) fd 43
[INFO]	[PERF]	Read 8 chars 0x0 (0) fd 43

while doing exactly the same, but for the calling process on all cpus (cpu=-1, pid=0), produces correct results:

[INFO]	[PERF]	Parsing event id INSTR
[INFO]	[PERF]	ID 0x1
[INFO]	[PERF]	Opened for CPU 0 fd 43
[INFO]	[PERF]	Read 8 chars 0xb67 (2919) fd 43
[INFO]	[PERF]	Read 8 chars 0x6f4 (1780) fd 43
[INFO]	[PERF]	Read 8 chars 0x7c3 (1987) fd 43
[INFO]	[PERF]	Read 8 chars 0x7f9 (2041) fd 43

I am already aware of some (possibly related) issues with perf and idle state power management on the CPU:

  • https://devtalk.nvidia.com/default/topic/955554/jetson-tx1/performance-counters-reset-itself/post/4963456/#4963456
  • https://devtalk.nvidia.com/default/topic/975769/collecting-cpu-performance-counters-for-tx1/

I also checked that the perf_event_paranoid value is -1 and also tried CAP_SYS_ADMIN.

Now stuck compiling kernels wildly looking for clues… any ideas?

Some excerpts from my code:

Opening perf counters work fine:

pattr = data->pattr[data->num_fds];

            pattr->type = PERF_TYPE_RAW;
            pattr->size = sizeof(struct perf_event_attr);

            pattr->config = (uint64_t) id; // Set to cpu specifi event

            pattr->sample_type = PERF_SAMPLE_READ;
            pattr->sample_type = 0;
            pattr->read_format = 0;
            pattr->disabled = 0; // event count starts at once
            pattr->inherit = 1;
            pattr->enable_on_exec = 1;
            pattr->pinned = 1; // always count; reads return eof when not running (read return 0)



            ret = perf_event_open(pattr, -1, 0, data->group_fds[cpu], 0); // (pattr, pid, cpu, group, flags)
            if(-1 == ret)
            {
                ERROR("event open %d (%s) group %d", cpu, strerror(errno), data->group_fds[cpu]);
            }

…reading them also fine without any issues (no error flags, everything looks fine programatically).

Can supply some simplified sources if required…

Hi,

Thanks for your feedback.

We have new OS now, could you check if this issue also occurs in Rel-28.1 branch?

Thanks.

Hi AastaLLL, :)

Yes it’s exactly the same.

cat /etc/nv_tegra_release 
# R28 (release), REVISION: 1.0, GCID: 9379712, BOARD: t210ref, EABI: aarch64, DATE: Thu Jul 20 07:45:59 UTC 2017
uname -a
Linux tegra-ubuntu 4.4.38-tegra #1 SMP PREEMPT Thu Jul 20 00:41:06 PDT 2017 aarch64 aarch64 aarch64 GNU/Linux

Hi again AaastaLLL,

I just put this code up on my server… http://krisrst.no/bug_report.tar.gz

You’ll have to excuse the messy code, but it compiles as is on TX1 and illustrates the issue

Thanks.

We are checking this issue. Will update more information later.

Hi,

We tried to reproduce this issue on our side but find the results look correctly.
Could you also check it?

nvidia@tegra-ubuntu:~/bug_report$ sudo ./main 
[sudo] password for nvidia: 
cfg perf_poll:CYCLES:0x11
Parsing event id CYCLES
ID 0x11
Opened for CPU 0 fd 3
poll test
Read 8 chars 0x9531 (38193) fd 3
poll test
Read 8 chars 0x111350 (1119056) fd 3
poll test
Read 8 chars 0x8634d (549709) fd 3
poll test
Read 8 chars 0x901bd (590269) fd 3
poll test
Read 8 chars 0xda219 (893465) fd 3
poll test
Read 8 chars 0x82f87 (536455) fd 3
poll test
Read 8 chars 0x5f0f9 (389369) fd 3
poll test
Read 8 chars 0xaf6f8 (718584) fd 3
poll test
Read 8 chars 0xbbdbc (769468) fd 3
poll test
Read 8 chars 0x4b547 (308551) fd 3

Weird. It’s not working like that at all when running on my own devkit.

Maybe something with a bootup script I’m running is causing the issue. I’m setting number of cores, frequencies etc.

Will check tonight.

I’m getting the same results as before, even after a fresh reboot.

Parsing event id CYCLES
ID 0x11
Opened for CPU 0 fd 3
poll test
Read 8 chars 0x12ea9 (77481) fd 3
poll test
Read 8 chars 0x7ffe8c8b (2147388555) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3

Could there be some hardware differences here?

Have now also tested this on a different Tegra X1; with the same results. The kits I have access to were quite early revisions; for one fact I know that the Tegra X1 module itself does not have the INA rail power measurement sensors found in more recent versions.

Hi,

You are correct.

Looks like perf work correct on TX2 but fail on TX1. Although both are rel-28.1.
We will check this further and update information to you.

Thanks.

Hi,

We also tried this sample on the x86 machine, and also got abnormal data.
It looks like this issue is not relevant to Jetson platform but certain hardware situation.

Could you help us checking this?

Results of our x86 machine

vyu@vyu-server:~/bug_report$ sudo ./main 
cfg perf_poll:CYCLES:0x11
Parsing event id CYCLES
ID 0x11
Opened for CPU 0 fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3
poll test
Read 8 chars 0x0 (0) fd 3

That occurs because the HPC in use is a raw counter (0x11). It exists only for ARM Cortex-A57. Intel expose different raw hardware counters, with different hardware IDs.

If you apply this diff

114c114
<             pattr->type = PERF_TYPE_RAW;
---
>             pattr->type = PERF_TYPE_HARDWARE;
117c117
<             pattr->config = (uint64_t) id; // Set to cpu specifi event
---
>             pattr->config = (uint64_t) PERF_COUNT_HW_CPU_CYCLES; // Set to cpu specifi event

You get

sudo ./main 
cfg perf_poll:CYCLES:0x11
Parsing event id CYCLES
ID 0x11
Opened for CPU 0 fd 3
poll test
Read 8 chars 0x9bc0 (39872) fd 3
poll test
Read 8 chars 0x63fa41 (6552129) fd 3
poll test
Read 8 chars 0x6198f4 (6396148) fd 3
poll test
Read 8 chars 0x3948b9 (3754169) fd 3
poll test
Read 8 chars 0x4bd9b7 (4970935) fd 3
poll test
Read 8 chars 0x3e9841 (4102209) fd 3
poll test
Read 8 chars 0x8f95b3 (9409971) fd 3
poll test
Read 8 chars 0x3bef59 (3927897) fd 3
poll test
Read 8 chars 0x59208b (5841035) fd 3
poll test
Read 8 chars 0x5277bd (5404605) fd 3

which is working as intended (on intel architecture).

Something is definitely up with the Linux perf subsystem on the X1.

Hi,

Sorry about the false alarm. We are really not familar with perf. : (
We are discussing this issue internally and will update information to you.

Hi,

Sorry for the late reply.

Here is some information about building kernel:

Source: https://developer.nvidia.com/embedded/dlc/l4t-sources-28-1
Document: https://developer.nvidia.com/embedded/dlc/l4t-documentation-28-1
Video: http://www.jetsonhacks.com/2017/03/25/build-kernel-and-modules-nvidia-jetson-tx2/

Thanks and sorry for the inconvenience.

Sorry; I don’t really understand. Did you get a kernel quick-fix going in the latest L4T for the Tegra K1?

Hi,

Sorry for the confusion.
We have feedbacked this issue and will prioritize internally.

Since our kernel source is public, it’s also recommended to take a look.
Please find comment #14 for the Information of kernel.

Thanks and sorry for the inconvenience.

Hi,

T210 has CPUIDLE enabled, as a result, idling cpu does not give right result.Please check this thread

You can disable CPUIDLE and rerun the test.
Also in present test, if you remove or reduce the usleep to lesser value, you will get right results.

thanks
Bibek

Aha, so that was the problem. Will try rebuilding kernel without cpuidle drivers as per the post in that thread you refer to.

Just an update; works like a charm. Thanks a lot, this helps me out a great bit.