PCIe x4 Bandwidth on Nano

Dear Community,
I have a custom carrier board built and am now testing the performance of the PCIe x4 interface. For storage we have an M.2 PCIe socket. When I tried a PCIe x1 SSD module, a 1 GB data dump ran at 113 MB/s. The lspci -vv command reports the Lane Width as x1.

Now when I connect a PCIe x4 SSD, I get essentially the same data dump rate, ~120 MB/s, even though lspci -vv reports the Lane Width as x4 and the Link Speed as 5 GT/s.

Can somebody help me understand why I don’t get a significant data rate improvement here?

You should get around 12 Gbps (i.e. around 1500 MB/s) in a PCIe x4, Gen-2 configuration: Gen-2 runs at 5 GT/s per lane, so with 8b/10b encoding four lanes give 16 Gbps (2 GB/s) of raw bandwidth, and roughly 1.2 to 1.5 GB/s remains after protocol overhead.
Could you please tell us how you are verifying the performance?
Also, what is the advertised bandwidth of the SSD? (I am wondering whether it is AHCI-based, i.e. the SATA protocol, or NVMe-based.)
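
In the meantime, you can double-check the negotiated link width and speed with lspci. The 01:00.0 address below is just a placeholder; use whatever address your SSD shows up at:

sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'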

Hi,

Maybe the data rate limitation is caused by the SSD and not by PCIe? Can you provide the details below:

  1. What is the data rate specification of the SSD?
  2. How are you measuring the SSD performance, is it iozone? What record size are you using? Can you try with a 16M record size? (A sample invocation is sketched below.)
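
For the iozone run, something along these lines should work (the mount point and file name are just examples, and -I requests O_DIRECT if your build supports it):

iozone -I -i 0 -i 1 -r 16m -s 1g -f /mnt/nvme/iozone.tmp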

Thanks,
Manikanta

  1. I am using the ‘dd’ command from the terminal and dumping 1 GB for the x1 lane and 4 GB for the x4 lane.
  2. The advertised bandwidth of the PCIe x4 SSD is 1000 MB/s.
  3. It is an NVMe-based SSD, specifically the Kingston A2000 250GB.

Team,
Is there an update regarding this?

Could you please share the exact ‘dd’ command being used here?
I would use the ‘dd’ command something like the one below:
dd if=/dev/zero of=test bs=16M count=64 oflag=direct conv=fdatasync
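
For the read side, you could read the same file back with O_DIRECT so the page cache does not inflate the number, e.g.:

dd if=test of=/dev/null bs=16M iflag=direct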

dd if=/dev/zero of= bs=1024 count=1073741824 status=progress

The path to the SSD drive was obtained by issuing the lsblk command.

A block size of 1024 seems too small. We typically use a 4K size for random reads/writes and 16M for sequential reads/writes. Could you please use the command I mentioned above and update the result?

Same. Please see attached.

Could you please tell us the JetPack release you are using and also the BSP version?

Also, given that there is an error when writing a file larger than 4 GB, I assume you are using the FAT32 file system. Could you please try formatting the drive with the EXT4 file system instead?
The mkfs.ext4 command can be used.
FWIW, I tried with an Intel 750 NVMe drive and saw a four-fold increase in performance with the EXT4 file system compared to FAT32.
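
For example, assuming the drive enumerates as /dev/nvme0n1 (check with lsblk) and keeping in mind that this erases whatever is on it:

sudo mkfs.ext4 /dev/nvme0n1
sudo mount /dev/nvme0n1 /mnt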

root@tegra-ubuntu:/home/ubuntu/temp# 
root@tegra-ubuntu:/home/ubuntu/temp# dd if=/dev/zero of=test bs=16M count=64 oflag=direct conv=fdatasync
64+0 records in
64+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.28011 s, 839 MB/s
root@tegra-ubuntu:/home/ubuntu/temp# 
root@tegra-ubuntu:/home/ubuntu/temp# cd ..
root@tegra-ubuntu:/home/ubuntu# 
root@tegra-ubuntu:/home/ubuntu# umount temp
root@tegra-ubuntu:/home/ubuntu# 
root@tegra-ubuntu:/home/ubuntu# busybox mkfs.vfat /dev/nvme0n1 

root@tegra-ubuntu:/home/ubuntu# 
[  712.671104]  nvme0n1:
root@tegra-ubuntu:/home/ubuntu# 
root@tegra-ubuntu:/home/ubuntu# mount /dev/nvme0n1 temp/
root@tegra-ubuntu:/home/ubuntu# 
root@tegra-ubuntu:/home/ubuntu# cd temp/
root@tegra-ubuntu:/home/ubuntu/temp# 
root@tegra-ubuntu:/home/ubuntu/temp# dd if=/dev/zero of=test bs=16M count=64 oflag=direct conv=fdatasync
64+0 records in
64+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.2717 s, 204 MB/s
root@tegra-ubuntu:/home/ubuntu/temp#

Apologies vidyas, I don’t recollect the file system. I believe it was FAT32, but I’ll provide more details when I have access to the hardware.

This bandwidth is impressive. We would like to see the same on our hardware.

What hardware are you using since Nano EVM doesn’t have PCIe x4?

One thing you should know is that for benchmarks you should not use a regular file at all. The reason is that you are going through the driver for ext4 (or VFAT or something else), and there is a lot going on with seeking, security checks, and setup. In the case where your source is “/dev/zero”, this works well because it is a device special file and not a real file…the output limitations of “/dev/zero” come only from its driver.

Consider whether you have a partition you can test against whose content you don’t mind destroying. I’ll call it “/dev/sdaX”. When you use dd to write to “/dev/sdaX”, once security allows this, all of the throughput limitation comes from the driver and the disk. The raw hardware driver is probably not much of a limitation. If you instead write to a file on that same “/dev/sdaX”, then performance would be expected to drop due to all of the extra software in the path.

So then how can I benchmark this?

Also, why isn’t the software limiting the results posted by vidyas?

Functions which provide data to a device special file are not an issue since this operates on a raw device driver and does not require filesystem overhead other than when determining if the user is allowed access. Any time your source data is from “/dev/zero” this is fulfilled. Any time your destination is to a partition instead of a file the requirement is fulfilled. Use something like “gparted” to non-destructively slice part of your rootfs partition into a new partition. Then use dd with the output to that partition (device special file) instead of to a file.
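
A sketch of that, with /dev/sdaX standing in for a scratch partition whose contents you do not mind destroying (double-check the name with lsblk first):

sudo dd if=/dev/zero of=/dev/sdaX bs=16M count=64 oflag=direct conv=fdatasync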

Going through a file is not an error. I’ll compare it, though, to racing an automobile through the middle of a large city and concluding it is a bad race car because it doesn’t run as fast as it would on a track designed for racing. Ext4 and some other filesystems may be relatively efficient, but this is far from streamlined. You could end up with the same results, but until you avoid the filesystem you are not testing the bandwidth of the raw device…you are testing the throughput of a series of devices chained together.

In the case of an error with a non-ext4 filesystem, this is quite possibly due to exceeding the maximum file size for that filesystem type. Ext4 can work with enormous files, whereas some of the 32-bit Windows filesystem types stop at 2 GB or 4 GB…if your filesystem is VFAT or FAT32 or one of those, then this will be an outright error forcing dd to stop. Even if you do not fail under one of these other filesystem types, I would expect ext4 to be better optimized and faster. Better yet is to not use a filesystem at all: if you directly access a partition via the device special file, then no filesystem is in the chain (the filesystem is only used to find the partition, and then stays out of the way).
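
If you are not sure which filesystem a given drive is formatted with, something like this will show it (look at the FSTYPE column):

lsblk -f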

You might end up with the same results, but if you pull the filesystem type out of the chain of limitations then you’ll actually know what you are profiling.

What hardware are you using since Nano EVM doesn’t have PCIe x4?

I’m using a Jetson TX1 system, which has a PCIe x4 slot.
BTW, I tried writing to /dev/nvme0n1 directly by setting of=/dev/nvme0n1 and I still got around the same number. As @linuxdev mentioned, the ext4 file system is very efficient, so it gives performance on par with the raw device. But I agree with @linuxdev that we should avoid any kind of file system in between and do reads/writes directly against the device file.
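
In other words, something along the lines of the following, which overwrites the start of the drive and destroys any filesystem on it:

dd if=/dev/zero of=/dev/nvme0n1 bs=16M count=64 oflag=direct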

@vidyas,
The test system doesn’t show Nano performance directly, but it can be treated as an approximation: the PCIe core is the same, so the test results should be comparable.

I tested the same on the Nano and we were able to achieve 790 MB/s.

Thanks for letting us know