Memory performance has declined severely with the new kernel.
Could you please advise on kernel settings that restore performance to the JetPack 4.4.1 level (4.9.140-tegra)?
Performance comparison:
sudo mount tmpfs -ttmpfs -osize=4G /mnt/
sudo dd if=/dev/zero of=/mnt/zero.bin bs=1M count=4000
- JP441: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 1.55166 s, 2.7 GB/s
- JP502: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 3.9952 s, 1.0 GB/s
sudo dd if=/mnt/zero.bin of=/dev/null bs=1M
- JP441: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 0.626467 s, 6.7 GB/s
- JP502: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 0.76411 s, 5.5 GB/s
for OPER in write read; do sysbench --threads=$(nproc) --memory-oper=${OPER} memory run; done
- JP441: 2271.58 MiB/sec / 20623.23 MiB/sec
- JP502: 1869.11 MiB/sec / 12256.02 MiB/sec
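To make the runs comparable, the kernel version and clock state can be recorded alongside each result (a quick sketch using the standard JetPack tools):
uname -r
sudo jetson_clocks --show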
Could you boost the clocks and try again?
sudo nvpmodel -m 0
sudo jetson_clocks
sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
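If the lock took effect, rate should now report the same value as max_rate:
cat /sys/kernel/debug/bpmp/debug/clk/emc/rate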
No change at all, unfortunately.
Could you check the EMC clock on JetPack 4.x?
cat /sys/kernel/debug/bpmp/debug/clk/emc/rate
Is it lower than on JP5?
JP441
$ sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate
1600000000
$ sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/rate
1600000000
Could you check /etc/nvpmodel.conf to confirm that the max power mode is selected, like
POWER_MODEL ID=8 NAME=MODE_20W_6CORE
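The currently active mode can also be queried directly to double-check:
sudo nvpmodel -q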
Yes, this is already verified to not be the culprit.
I’m running the same mode on both 4.4.1 and 5.0.2.
However, the 5.0.2 default configuration does not contain all modes, so I needed to add them manually.
Do you have any other suggestions?
I’ve tried running both kernels on the same rootfs and still get the poor performance with the JP5 kernel.
Everything points to a kernel problem, not a configuration problem.
Do you have any kernel settings to improve performance?
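For reference, a quick way to confirm which kernel is actually booted on the shared rootfs (4.9.140-tegra for JP441, a 5.10-based tegra kernel for JP502):
uname -r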
Is anyone paying attention to this, or is it being ignored?
We tried this on a different device and got better performance on JetPack 5.0.2 than on JetPack 4.x.
We may need to get results on the same Xavier NX to check.
Could you please send me your results, and information about how you obtained them?
I’ve tried running the exact same module, and also tried across a set of four modules, getting much lower performance on JP5 on all of them.
Could you please accept this as an issue and help work toward a solution?
I tried again, this time on a brand-new Xavier NX devkit.
Same issue: about half the memory performance on JP502 compared to JP441.
If your experience differs, please show some proof!
Any luck with improving memory performance? I ran benchmarks like luxas/benchmark on GitHub (Dockerized C benchmarks for both ARM and amd64 hardware) and observed worse performance on JetPack 5.
The performance drop could come from the change in the page cache implementation. However, it is hard to roll back the change since the two kernels differ substantially. Also, the page cache implementation is mostly optimized for physical storage, so we cannot conclude that the k5.10 implementation is problematic for this use case.
A quick workaround is to use huge pages, which reduces the chance of taking the page cache lock. This gives more consistent performance between the two kernel versions.
$ sudo mount tmpfs -ttmpfs -osize=4G -ohuge=always /mnt/
sudo dd if=/dev/zero of=/mnt/zero.bin bs=1M count=4000
- JP441: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 1.74741 s, 2.4 GB/s
- JP502: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 1.80514 s, 2.3 GB/s
sudo dd if=/mnt/zero.bin of=/dev/null bs=1M
- JP441: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 0.828048 s, 5.1 GB/s
- JP502: 4194304000 bytes (4.2 GB, 3.9 GiB) copied, 0.763015 s, 5.5 GB/s
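Whether the tmpfs mount actually uses huge pages can be verified after writing the file; ShmemHugePages in /proc/meminfo should be non-zero (assuming the kernel enables transparent huge pages for shmem):
grep ShmemHugePages /proc/meminfo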