DevKit NVMe performance

Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)

608 MB/s sync write
1.2 GB/s read

Not too shabby!

jetson2 /mnt# mount /dev/nvme0n1p2 ./nvmep2
jetson2 /mnt# dd if=/dev/zero of=./nvmep2/root/.ddtest bs=1M count=1000 conv=fsync
1000+0 records in 
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.72349 s, 608 MB/s
jetson2 /mnt# umount nvmep2
jetson2 /mnt# sync
jetson2 /mnt# mount /dev/nvme0n1p2 ./nvmep2
jetson2 /mnt# dd if=./nvmep2/root/.ddtest of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.867345 s, 1.2 GB/s
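
If you'd rather not umount/mount to drop the cache, dd's direct-I/O flags should give comparable numbers. An illustrative variant (not part of the run above):

dd if=/dev/zero of=./nvmep2/root/.ddtest bs=1M count=1000 oflag=direct   # write, bypassing the page cache
dd if=./nvmep2/root/.ddtest of=/dev/null bs=1M count=1000 iflag=direct   # read, so cached pages can't inflate the rate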

Hi gtj,

Please try the “IOzone” utility to test NVMe read/write performance:

sudo apt-get update
sudo apt-get install iozone3
sudo iozone -ecI -+n -L64 -S32 -s64m -r4k -i0 -i2 -l8 -u8 -o -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
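
For reference, here is what those switches select (my annotation; the run log further down echoes the same settings):

# -e         include flush (fsync) in write timing
# -c         include close() in write timing
# -I         use O_DIRECT, bypassing the page cache
# -+n        no retests
# -L64 -S32  processor cache line / cache size hints (bytes / kB)
# -s64m      64 MB file per process
# -r4k       4 kB record size
# -i0 -i2    test 0 (write) and test 2 (random read/write)
# -l8 -u8    minimum / maximum number of processes
# -o         synchronous writes (O_SYNC)
# -m         multiple internal buffers
# -t8        throughput mode with 8 processes
# -F ...     one file per process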

Before running, please set max performance mode:

sudo nvpmodel -m 0
sudo jetson_clocks
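
You can confirm the mode took effect with:

sudo nvpmodel -q    # prints the currently active power mode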

I wasn’t complaining. I was impressed!

Anyway, here are the iozone results, which are not very impressive. :)

jetson2 /~# nvpmodel -m 0
NVPM WARN: patching tpc_pg_mask: (0x1:0x2)
NVPM WARN: patched tpc_pg_mask: 0x2
jetson2 /~# jetson_clocks
jetson2 /~# iozone -ecI -+n -L64 -S32 -s64m -r4k -i0 -i2 -l8 -u8 -o -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.429 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
	             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
	             Vangel Bojaxhi, Ben England, Vikentsi Lapa.

	Run began: Sun May 17 21:10:01 2020

	Include fsync in write timing
	Include close in write timing
	O_DIRECT feature enabled
	No retest option selected
	File size set to 65536 kB
	Record Size 4 kB
	SYNC Mode. 
	Multi_buffer. Work area 16777216 bytes
	Command line used: iozone -ecI -+n -L64 -S32 -s64m -r4k -i0 -i2 -l8 -u8 -o -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
	Output is in kBytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 32 kBytes.
	Processor cache line size set to 64 bytes.
	File stride size set to 17 * record size.
	Min process = 8 
	Max process = 8 
	Throughput test with 8 processes
	Each process writes a 65536 kByte file in 4 kByte records

	Children see throughput for  8 initial writers 	=    4251.39 kB/sec
	Parent sees throughput for  8 initial writers 	=    4208.91 kB/sec
	Min throughput per process 			=     530.02 kB/sec 
	Max throughput per process 			=     535.52 kB/sec
	Avg throughput per process 			=     531.42 kB/sec
	Min xfer 					=   64864.00 kB

	Children see throughput for 8 random readers 	=  163663.69 kB/sec
	Parent sees throughput for 8 random readers 	=  163434.76 kB/sec
	Min throughput per process 			=   20050.63 kB/sec 
	Max throughput per process 			=   21585.89 kB/sec
	Avg throughput per process 			=   20457.96 kB/sec
	Min xfer 					=   60848.00 kB

	Children see throughput for 8 random writers 	=    4073.73 kB/sec
	Parent sees throughput for 8 random writers 	=    4057.25 kB/sec
	Min throughput per process 			=     508.44 kB/sec 
	Max throughput per process 			=     510.97 kB/sec
	Avg throughput per process 			=     509.22 kB/sec
	Min xfer 					=   65212.00 kB



iozone test complete.
jetson2 /~#

I’ve just ordered an NX dev kit and I’m wondering which M.2 Key-M 2280 NVMe SSDs are compatible. The Design Guide specifies that the 3.3V rail supplies a maximum of 2.6 W. Was the drive using a low-power profile when benchmarked? What model did you use for testing?

“Samsung 950 PRO 256GB SSD (MZ-V5P256BW) V-NAND, M.2 NVM Express”, which is the same one I have in my desktop/development machine. It’s got to be about 5 years old.

The spec says “Average: 5.1 Watts, Idle: 70 mW”, but I don’t know how to tell what it’s currently drawing. The speed results were the same on my desktop as on the NX, though.
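
If nvme-cli is available, it can at least show the power states the drive advertises and the one it is currently in. A sketch on my side, not something from the test above:

sudo apt-get install nvme-cli
sudo nvme id-ctrl /dev/nvme0 | grep '^ps '     # advertised power states with their max power draw
sudo nvme get-feature /dev/nvme0 -f 0x02 -H    # feature 0x02 (Power Management) reports the current power state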

Hi gtj,

Below are our test results for NVMe read/write performance:

Seq Write: 126 MB/s
Seq Read: 237 MB/s

Tested with an Intel 256GB NVMe drive.

Just for the record, here are the results for a Corsair MP510 240GB (NVMe PCIe Gen3 x4 M.2 SSD):
sudo iozone -ecI -+n -L64 -S32 -s64m -r4k -i0 -i2 -l8 -u8 -o -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Iozone: Performance Test of File I/O
Version Revision: 3.429
Compiled for 64 bit mode.
Build: linux

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
             Vangel Bojaxhi, Ben England, Vikentsi Lapa.

Run began: Thu May 21 20:06:55 2020

Include fsync in write timing
Include close in write timing
O_DIRECT feature enabled
No retest option selected
File size set to 65536 kB
Record Size 4 kB
SYNC Mode. 
Multi_buffer. Work area 16777216 bytes
Command line used: iozone -ecI -+n -L64 -S32 -s64m -r4k -i0 -i2 -l8 -u8 -o -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 32 kBytes.
Processor cache line size set to 64 bytes.
File stride size set to 17 * record size.
Min process = 8 
Max process = 8 
Throughput test with 8 processes
Each process writes a 65536 kByte file in 4 kByte records

Children see throughput for  8 initial writers 	=    9140.81 kB/sec
Parent sees throughput for  8 initial writers 	=    9136.90 kB/sec
Min throughput per process 			=    1142.09 kB/sec 
Max throughput per process 			=    1142.85 kB/sec
Avg throughput per process 			=    1142.60 kB/sec
Min xfer 					=   65496.00 kB

Children see throughput for 8 random readers 	=  190896.49 kB/sec
Parent sees throughput for 8 random readers 	=  190497.06 kB/sec
Min throughput per process 			=   22496.33 kB/sec 
Max throughput per process 			=   26319.76 kB/sec
Avg throughput per process 			=   23862.06 kB/sec
Min xfer 					=   55516.00 kB

Children see throughput for 8 random writers 	=    9693.90 kB/sec
Parent sees throughput for 8 random writers 	=    9598.86 kB/sec
Min throughput per process 			=    1201.26 kB/sec 
Max throughput per process 			=    1218.60 kB/sec
Avg throughput per process 			=    1211.74 kB/sec
Min xfer 					=   64608.00 kB

I’ve been messing with iozone “forever” and I still can’t make heads or tails of what the results mean in real life. :)

I’m sticking with dd :)

jetson2 /~# dd if=/dev/zero of=/dev/nvme0n1p3 bs=4K count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB, 3.8 GiB) copied, 6.10829 s, 671 MB/s
jetson2 /~# 
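
One caveat with that raw-device run: without oflag=direct or conv=fsync, some of the 4 GB may still be sitting in the page cache when dd reports its rate. A variant that only counts data that actually reached the drive (illustrative, same device):

dd if=/dev/zero of=/dev/nvme0n1p3 bs=4K count=1000000 oflag=direct
# or keep buffered writes but flush before dd stops the clock:
dd if=/dev/zero of=/dev/nvme0n1p3 bs=4K count=1000000 conv=fsync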

Here are the results for a Samsung 981 on the NX devkit (I think the dd results are skewed):

root@nx-tegra194:/# dd if=/dev/zero of=/dev/nvme0n1p2 bs=4K count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB, 3.8 GiB) copied, 53.9752 s, 75.9 MB/s

and iozone3:

	Command line used: iozone -ecI -+n -L64 -S32 -s64m -r4k -i0 -i2 -l8 -u8 -o -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 32 kBytes.
Processor cache line size set to 64 bytes.
File stride size set to 17 * record size.
Min process = 8 
Max process = 8 
Throughput test with 8 processes
Each process writes a 65536 kByte file in 4 kByte records

Children see throughput for  8 initial writers 	=    5003.57 kB/sec
Parent sees throughput for  8 initial writers 	=    5001.63 kB/sec
Min throughput per process 			=     625.35 kB/sec 
Max throughput per process 			=     625.63 kB/sec
Avg throughput per process 			=     625.45 kB/sec
Min xfer 					=   65508.00 kB

Children see throughput for 8 random readers 	=  184847.29 kB/sec
Parent sees throughput for 8 random readers 	=  184562.06 kB/sec
Min throughput per process 			=   21840.28 kB/sec 
Max throughput per process 			=   27732.65 kB/sec
Avg throughput per process 			=   23105.91 kB/sec
Min xfer 					=   51588.00 kB

Children see throughput for 8 random writers 	=    5305.45 kB/sec
Parent sees throughput for 8 random writers 	=    5292.98 kB/sec
Min throughput per process 			=     662.01 kB/sec 
Max throughput per process 			=     664.01 kB/sec
Avg throughput per process 			=     663.18 kB/sec
Min xfer 					=   65340.00 kB

-albertr

Hi,

We noticed that the application (iozone in this case) has to use the preadv2/pwritev2 calls with the RWF_HIPRI flag; details at https://lwn.net/Articles/670231/. Otherwise there is heavy context switching on the CPU.

We are not sure whether iozone has implemented preadv2 and pwritev2 yet. You could try this patch, which should improve performance.

diff --git a/fs/direct-io.c b/fs/direct-io.c
index c19155f..4b2abf3 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -457,8 +457,7 @@
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		dio->waiter = current;
 		spin_unlock_irqrestore(&dio->bio_lock, flags);
-		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
+		if (!blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
 			io_schedule();
 		/* wake up sets us TASK_RUNNING */
 		spin_lock_irqsave(&dio->bio_lock, flags);
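
Separately from the patch, blk_poll() only takes effect when polling is enabled on the block queue, so it may also be worth checking the sysfs knobs (a side note from me; exact behaviour depends on the kernel build):

cat /sys/block/nvme0n1/queue/io_poll                  # 1 = completion polling enabled for this queue
echo 1 | sudo tee /sys/block/nvme0n1/queue/io_poll    # enable it if it reads 0
cat /sys/block/nvme0n1/queue/io_poll_delay            # -1 = classic polling, 0 = adaptive hybrid polling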

The patch didn’t really improve things for iozone…

Before:

Children see throughput for  8 initial writers 	=    4315.54 kB/sec
Parent sees throughput for  8 initial writers 	=    4308.09 kB/sec
Min throughput per process 			=     538.49 kB/sec 
Max throughput per process 			=     540.10 kB/sec
Avg throughput per process 			=     539.44 kB/sec
Min xfer 					=   65340.00 kB

Children see throughput for 8 random readers 	=  237274.24 kB/sec
Parent sees throughput for 8 random readers 	=  236959.83 kB/sec
Min throughput per process 			=   27950.16 kB/sec 
Max throughput per process 			=   31743.83 kB/sec
Avg throughput per process 			=   29659.28 kB/sec
Min xfer 					=   57848.00 kB

Children see throughput for 8 random writers 	=    4425.41 kB/sec
Parent sees throughput for 8 random writers 	=    4414.20 kB/sec
Min throughput per process 			=     552.43 kB/sec 
Max throughput per process 			=     554.19 kB/sec
Avg throughput per process 			=     553.18 kB/sec
Min xfer 					=   65332.00 kB

After:

Children see throughput for  8 initial writers 	=    4661.47 kB/sec
Parent sees throughput for  8 initial writers 	=    4652.70 kB/sec
Min throughput per process 			=     582.07 kB/sec 
Max throughput per process 			=     583.53 kB/sec
Avg throughput per process 			=     582.68 kB/sec
Min xfer 					=   65372.00 kB

Children see throughput for 8 random readers 	=  237757.85 kB/sec
Parent sees throughput for 8 random readers 	=  237275.72 kB/sec
Min throughput per process 			=   25240.70 kB/sec 
Max throughput per process 			=   36438.22 kB/sec
Avg throughput per process 			=   29719.73 kB/sec
Min xfer 					=   45488.00 kB

Children see throughput for 8 random writers 	=    4781.74 kB/sec
Parent sees throughput for 8 random writers 	=    4761.44 kB/sec
Min throughput per process 			=     595.89 kB/sec 
Max throughput per process 			=     599.42 kB/sec
Avg throughput per process 			=     597.72 kB/sec
Min xfer 					=   65152.00 kB

It DID improve 4K block writes with dd though:

Before:

jetson2 /mnt# dd if=/dev/zero of=.ddtest bs=4K count=40000 oflag=direct
40000+0 records in
40000+0 records out
163840000 bytes (164 MB, 156 MiB) copied, 2.21959 s, 73.8 MB/s

After:

jetson2 /mnt# dd if=/dev/zero of=.ddtest bs=4K count=100000 oflag=direct
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 2.96477 s, 138 MB/s

Hi gtj,

It sounds like the test application may be a factor. Could you use the command below along with the patch and try again?

iozone -ecI -+n -L64 -S32 -r4k -i0 -i1 -i2 -s500m -f <path/to/output/file>

So, it appears the results from a good NVMe SSD (970 EVO Plus) on the NX are identical to x86 (using gnome-disks, 100 × 10 MiB samples) and to the devkit.

Compared to an SD card, it’s a pretty significant difference.

Note that these are just raw reads/writes and not a filesystem benchmark. Individual filesystem results will vary.

In my case it’s weird: I have a Pioneer 500GB Gen3 and I only match PC speeds on reads; people on PCs get twice my write speed, about 1 GB/s.
Another weird thing is that on a PC the disk gives double its announced performance, but in my case only on reads; writes match the announced figure.
Anyway, it’s fast enough.