Performance problem with Linux 4.16.11 / Ubuntu 18.04 on Tegra TK1

We’re trying to use Linux 4.16.11 and Ubuntu 18.04 on Tegra TK1s and running into a performance problem.

I compiled 4.16.11 from kernel.org sources using arch/arm/configs/tegra_defconfig + additional options. I’m using device tree tegra124-jetson-tk1.dts from the kernel sources.

I used a Raspberry Pi Ubuntu 18.04 base image.

Here is the performance problem. The following loop takes twice as long with 4.16.11 as it does with the 3.10.40 kernel.

    float a = 0.0f;      /* accumulator must be initialized */
    float *b;            /* points to a buffer of actsz floats, allocated elsewhere */
    int actsz = 700000;
    for (int i = 0; i < actsz; i++) {
        a += b[i];
    }

I tried a number of kernels from 4.1.1 to 4.16.11 and they all exhibit the same problem.

Is there a way to verify what clock speed is being used for DRAM and to verify that the L2 cache is enabled and configured properly?
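One way to inspect this from userspace is via the common-clock-framework debugfs tree. This is a sketch: the clock name "emc" for the external memory controller and the presence of cacheinfo sysfs nodes are assumptions and vary by kernel version.

```shell
# DRAM (EMC) clock rate in Hz -- requires debugfs mounted and root
sudo cat /sys/kernel/debug/clk/emc/clk_rate

# L2 cache geometry as reported by the kernel, if cacheinfo is
# populated on this 32-bit ARM kernel (Cortex-A15 has a unified L2)
cat /sys/devices/system/cpu/cpu0/cache/index2/size 2>/dev/null

# otherwise, check the boot log for cache-related messages
dmesg | grep -i cache
```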

This is out of the scope of our support for the TK1.
As you know, there are many changes between k4.1 and k3.10. If this issue can be reproduced on k3.10, we can help.

You need to

modprobe tegra124-cpufreq

and add that module to /etc/modules.

Otherwise it’ll run at 1/3 speed.
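Concretely, the two steps above look like this (a sketch; it assumes only the module name given in this post):

```shell
# load the Tegra124 cpufreq driver now
sudo modprobe tegra124-cpufreq

# and make it load automatically at every boot
echo tegra124-cpufreq | sudo tee -a /etc/modules
```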

(reference: https://devtalk.nvidia.com/default/topic/952215/jetson-tk1/debian-on-jetson-tk1/)

I built the tegra124 cpufreq driver into the kernel and used cpufreq-set to select the performance governor, so the CPU runs at its maximum speed.
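For reference, this is how the governor was set and verified (a sketch assuming the cpufrequtils tools; the sysfs path is the standard cpufreq interface):

```shell
# pin CPU0 to the performance governor
sudo cpufreq-set -c 0 -g performance

# confirm the governor and the current frequency
cpufreq-info -c 0
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
```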

cpufreq reports the CPU is running at the right speed, but my performance test indicates it is only getting half the memory bandwidth that we get with the 3.10.40 kernel.