Bad JetPack 5 NVMe write performance compared with JetPack 4

Hi folks,
I migrated to JetPack 5 (kernel 5.10.104), and NVMe write performance is very bad compared with the previous JetPack 4 (kernel 4.9.253). Writing to the NVMe drive is specifically very slow.

JP4 Linux 4.9.253
# dd if=/dev/zero of=/media/nvme/output bs=8k count=10k
10240+0 records in
10240+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 0.0733814 s, 1.1 GB/s

JP5 Linux 5.10.104
# dd if=/dev/zero of=/media/nvme/output bs=8k count=10k
10240+0 records in
10240+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 0.268189 s, 313 MB/s

Does anybody have any idea?
I found this forum thread, Re: NVME performance regression in Linux 5.x due to lack of block level IO queueing - Michael Marod, which says the problem was solved in kernel 5.17, but according to the NVIDIA roadmap even JetPack 6 will not ship that kernel.

I observed the same thing. Is there any solution for that?

Please try the suggestion in:
New SSD , slow write rate - #8 by DaneLLL

You can enable the O_DIRECT flag in your application.

Hi @DaneLLL , thank you for your suggestion.
I did the test with the program you mentioned and these are the results:

#JP4 Linux 4.9.254
# /tmp/nvme 
ret = 0
Direct read: total_bytes_read=2147483648 time=869 ms throughput=2356.731876
ret = 0
Buffered read: total_bytes_read=2147483648 time=1387 ms throughput=1476.568133
Direct write: total_bytes_writen=64424509440.000000  **time=37842 ms** throughput=1623.592833
Buffered write: total_bytes_writen=64424509440.000000 **time=45639 ms** throughput=1346.217051

#JP5 Linux 5.10.104
# /tmp/nvme 
ret = 0
Direct read: total_bytes_read=2147483648 time=1380 ms throughput=1484.057971
ret = 0
Buffered read: total_bytes_read=2147483648 time=1440 ms throughput=1422.222222
Direct write: total_bytes_writen=64424509440.000000  **time=37916 ms** throughput=1620.424095
Buffered write: total_bytes_writen=64424509440.000000 time=**100047 ms** throughput=614.111368

OK, I agree direct access performs better, but it creates a risk of data loss in case of a sudden power loss.
Also, O_DIRECT can "only" be used in my own applications. For example, when using tcpdump to record a PCAP file, there is also a wild difference in CPU usage and NVMe write speed compared with JP4 (kernel 4.9).
One more piece of information: with the latest JetPack 5.1.2 (kernel 5.10.120), the performance is even worse. I'm running on a Jetson AGX Xavier.

Direct access seems to help only with big chunks of data:

dd if=/dev/urandom of=/media/nvme/output **bs=1024k count=1k** oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.17123 s, **257 MB/s**


dd if=/dev/urandom of=/media/nvme/output **bs=8k count=128k** oflag=direct
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.012 s, **89.4 MB/s**

Do you (or anybody else) have any other insight?
Best regards, Diogo

Do you know if the fix can be applied to K5.10 for a try? Our hardware can achieve the expected performance if the upstream kernel behaves well.

Or please use the JetPack 4 release. K4.9 should not have the issue.

The point is that I don't have the fix yet. I'm assuming, as described in the thread above, that this issue is gone in kernel 5.17. I also can't simply use the mainline kernel, as NVIDIA applies a lot of changes on top of it. Is the work-in-progress Linux kernel for JetPack 6 already available? I can't find it here

As for downgrading to JetPack 4: I can't, as it is reaching end of life (JetPack 4 Reaches End of Life) and I need the latest libraries, such as VPI 2.2, for the perception stack.

As mentioned in the NVIDIA roadmap, JetPack 6 is coming with kernel 5.15. What about a simple comparison between JP4 and JP6 running a basic dd command against the NVMe? How does it perform on JetPack 6? Does NVIDIA have any workaround to fix this?

Could you help check K5.17 and share which commits may be the potential fix for the issue? It would be great if you could share more information with us so that we can discuss it with our teams. Ideally we would like to keep the upstream kernel as is.

Hi @DaneLLL , I have more test results to share.
I built multiple kernel versions from kernel/git/stable/linux.git - Linux kernel stable tree, flashed them, and ran them on the board.

Kernels 5.7, 5.8, 5.9, 5.17, 5.18, 5.19: all show the bad NVMe write performance. Kernels below 5.7 do not boot on my target (AGX Xavier).
Kernel 6.0: bad NVMe write performance as well.
I used tegra194-p2972-0000.dtb as the device tree and the standard arch/arm64/configs/defconfig.

I'm also trying to build and run a mainline 4.x kernel, but I'm still not able to boot the target: sometimes I get a kernel panic, sometimes it just freezes.

I double-checked the performance of kernel 6.0 vs. 4.9.253, using the same NVMe drive, the same partition table, etc.:

# 6.0.0
dd if=/dev/zero of=/media/nvme/output bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 9.26979 s, 232 MB/s

# 4.9.253
dd if=/dev/zero of=/media/nvme/output bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.665 s, 1.3 GB/s

Thanks for sharing. The results look similar to what we observe while exploring new kernel versions. It seems the issue is claimed to be fixed in 5.17, but the test results don't show it.

Hi @DaneLLL , thank you again for your reply.

I did more tests focused on PCIe, because I also noticed bad performance under high network traffic, and my Ethernet interface sits behind PCIe. My test was to send UDP packets with iperf3 --udp --client --bitrate 1000M.
When running kernel 4.9 (JP4), iperf -s uses around 50% of a single core.
When running kernel 5.10.104 (JP5), iperf -s uses around 82% of a single core.
These results made me suspect a possible PCIe driver issue, as both NVMe and Ethernet run over PCIe.

Checking the code provided in the Driver Package (BSP) Sources, I could see a PCIe driver in kernel/nvidia/drivers/pci/dwc/pcie-tegra.c that is pretty much the same one used in JetPack 4 (kernel 4.9). The problem is that this driver is not compatible with kernel 5.10. Instead, the kernel builds its PCIe driver from kernel/kernel-5.10/drivers/pci/controller/dwc/pcie-tegra194.c, which is the "generic" driver coming from mainline.

Both kernels are running with power mode MAXN and clocks set to maximum via `jetson_clocks`.

Basically, network and NVMe performance are both worse on JP5, and both run over PCIe.
Are you sure this "generic" driver performs as well as the old one from the nvidia/drivers/ folder? Do you have PCIe benchmarks comparing JP4 and JP5?
Since you/NVIDIA are observing the same results, how is NVIDIA handling this performance issue? Are the new JetPacks being released with this worse performance?

We don't have different handling for direct IO and buffered IO, but throughput is very different in the two cases. As of now, we think it is due to a security mechanism for buffered IO in the upstream kernel. We tried removing the mechanism from K5.10, but the system misbehaves, so it looks like the mechanism is a must-have for K5.10.

We would suggest using direct IO to achieve optimal throughput. If you have further findings, please share them with us.

Removing the cgroup configuration and tracing options from the kernel config improved things a bit, but not enough: the dd command now writes at about 500 MB/s to the NVMe.
It didn't solve the problem. In the end, no solution was found and we temporarily downgraded to JP4.
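For reference, disabling cgroups and tracing in a kernel config amounts to lines along these in .config; this is only an illustration of the kind of change meant above, not the exact diff that was tested, and the available symbols vary by kernel version:

```
# CONFIG_CGROUPS is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_FTRACE is not set
```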