Tx2 read/write performance drop

Hi HuiW,

Thanks for sharing this info. I guess we could reproduce this issue too.
Still under investigation.

Hi HuiW,

Sorry, but we need your input again. In the description above, do you mean

  1. The rel-32.4.2 performance is equal to rel-28.3

or

  2. rel-32.4.2 does not have the problem that mode 3 is better than mode 0?

Also, could you share your iozone results for both releases with us?

Hi WayneWWW,

Sorry for the late reply.
I have attached the test results for your reference.

The main issue we reported is the TX2 read/write performance drop.

From the test results, the read/write performance of rel-32.4.2 does not differ much between mode 0 and mode 3.

Could you tell us how to run iozone, and which values you would like to see?

Thank you,

Hi HuiW,

Please run the command below:

iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
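
For readers unfamiliar with the flags, the command above can be rebuilt one option at a time. This is only a sketch: the flag descriptions follow what iozone itself prints in its banner (fsync and close timing, O_DIRECT, record size, and so on), while the `build_iozone_cmd` helper and its two arguments are hypothetical names introduced for illustration. It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: rebuild the iozone command from this thread one flag at a time.
# build_iozone_cmd is a hypothetical helper; it only prints the command.
build_iozone_cmd() {
    dir=$1      # directory holding the test files, e.g. /mnt
    count=$2    # number of processes (one file per process)

    cmd="iozone"
    cmd="$cmd -e"                 # include fsync in write timing
    cmd="$cmd -c"                 # include close in write timing
    cmd="$cmd -I"                 # use O_DIRECT
    cmd="$cmd -+n"                # no retest
    cmd="$cmd -L64"               # processor cache line size: 64 bytes
    cmd="$cmd -S32"               # processor cache size: 32 kBytes
    cmd="$cmd -s64m"              # file size per process: 64 MB (65536 kB)
    cmd="$cmd -r512k"             # record size: 512 kB
    cmd="$cmd -i0 -i1"            # test 0 (write) and test 1 (read)
    cmd="$cmd -l$count -u$count"  # min and max process count
    cmd="$cmd -m"                 # multi-buffer mode
    cmd="$cmd -t$count"           # throughput test with this many processes
    cmd="$cmd -F"                 # the per-process file names follow

    i=1
    while [ "$i" -le "$count" ]; do
        cmd="$cmd $dir/file$i"
        i=$((i + 1))
    done
    echo "$cmd"
}

build_iozone_cmd /mnt 8
```

The printed command writes `-e -c -I` separately instead of the combined `-ecI`; the two spellings are equivalent.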

Hi WayneWWW,

Results from L4T R32.4.2 on TX2:

nvidia44@nvidia44-desktop:~$ sudo nvpmodel -m 0
nvidia44@nvidia44-desktop:~$ sudo iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Iozone: Performance Test of File I/O
Version $Revision: 3.429 $
Compiled for 64 bit mode.
Build: linux

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
             Vangel Bojaxhi, Ben England, Vikentsi Lapa.

Run began: Tue Apr 28 14:25:51 2020

Include fsync in write timing
Include close in write timing
O_DIRECT feature enabled
No retest option selected
File size set to 65536 kB
Record Size 512 kB
Multi_buffer. Work area 16777216 bytes
Command line used: iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 32 kBytes.
Processor cache line size set to 64 bytes.
File stride size set to 17 * record size.
Min process = 8 
Max process = 8 
Throughput test with 8 processes
Each process writes a 65536 kByte file in 512 kByte records

Children see throughput for  8 initial writers 	=  105959.18 kB/sec
Parent sees throughput for  8 initial writers 	=  104542.20 kB/sec
Min throughput per process 			=   13137.44 kB/sec 
Max throughput per process 			=   13457.31 kB/sec
Avg throughput per process 			=   13244.90 kB/sec
Min xfer 					=   64000.00 kB

Children see throughput for  8 readers 		=  271184.37 kB/sec
Parent sees throughput for  8 readers 		=  270832.19 kB/sec
Min throughput per process 			=   33881.86 kB/sec 
Max throughput per process 			=   33917.39 kB/sec
Avg throughput per process 			=   33898.05 kB/sec
Min xfer 					=   65536.00 kB

iozone test complete.
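
Since the question in this thread is whether mode 0 and mode 3 differ, the same run can be repeated once per power mode. The sketch below is an illustration only: `run` and `compare_modes` are hypothetical helpers, and `DRY_RUN` is an assumed variable that defaults to printing the commands instead of executing them, so nothing is changed on the board unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: repeat the iozone run from this thread under several nvpmodel modes.
# run and compare_modes are hypothetical helpers, not part of any NVIDIA tool.

# Print the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

compare_modes() {
    for mode in "$@"; do
        run sudo nvpmodel -m "$mode"   # switch power mode
        run sudo nvpmodel -q           # confirm the active mode
        run sudo iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 \
            -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 \
               /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
    done
}

compare_modes 0 3
```

Redirecting each run to its own log file makes the per-mode results easier to compare afterwards.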

Hi HuiW,

What is the result on rel-28.2.1?

Hi WayneWWW,

Results from L4T R28.2.1 on TX2:
nvidia@tegra-ubuntu:~$ sudo iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Iozone: Performance Test of File I/O
Version $Revision: 3.429 $
Compiled for 64 bit mode.
Build: linux

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
             Vangel Bojaxhi, Ben England, Vikentsi Lapa.

Run began: Tue May 19 19:46:20 2020

Include fsync in write timing
Include close in write timing
O_DIRECT feature enabled
No retest option selected
File size set to 65536 kB
Record Size 512 kB
Multi_buffer. Work area 16777216 bytes
Command line used: iozone -ecI -+n -L64 -S32 -s64m -r512k -i0 -i1 -l8 -u8 -m -t8 -F /mnt/file1 /mnt/file2 /mnt/file3 /mnt/file4 /mnt/file5 /mnt/file6 /mnt/file7 /mnt/file8
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 32 kBytes.
Processor cache line size set to 64 bytes.
File stride size set to 17 * record size.
Min process = 8 
Max process = 8 
Throughput test with 8 processes
Each process writes a 65536 kByte file in 512 kByte records

Children see throughput for  8 initial writers 	=   78009.33 kB/sec
Parent sees throughput for  8 initial writers 	=   33484.65 kB/sec
Min throughput per process 			=    3029.03 kB/sec 
Max throughput per process 			=   22996.80 kB/sec
Avg throughput per process 			=    9751.17 kB/sec
Min xfer 					=    8704.00 kB

Children see throughput for  8 readers 		=  279792.20 kB/sec
Parent sees throughput for  8 readers 		=  264273.93 kB/sec
Min throughput per process 			=       0.00 kB/sec 
Max throughput per process 			=   94538.00 kB/sec
Avg throughput per process 			=   34974.03 kB/sec
Min xfer 					=       0.00 kB

iozone test complete.
nvidia@tegra-ubuntu:~$ sudo nvpmodel -q
NV Power Mode: MAXN
0
nvidia@tegra-ubuntu:~$

Hi HuiW,

Please refer to the "Children see throughput" lines in your results.

Power mode 0, throughput in MB/s:

Release                     Seq Write   Seq Read
JetPack 3.3 (rel-28.2.1)    78          279
JetPack 4.4 (rel-32.4.2)    105         271

It looks like JetPack 4.4 does not have a significant regression from JetPack 3.3.
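
The write/read pair for each release can be pulled out of a saved iozone log instead of being read off by eye. The following is a minimal sketch, assuming the log is piped in on stdin and that MB/s means kB/sec divided by 1000 and truncated; `children_throughput` is a hypothetical helper name. The sample lines are copied from the R28.2.1 log above.

```shell
#!/bin/sh
# Sketch: extract the "Children see throughput" figures from iozone output
# on stdin and print them as "<write MB/s>/<read MB/s>".
# children_throughput is a hypothetical helper name.
children_throughput() {
    awk '
        # The value is the second-to-last field; the last field is "kB/sec".
        /Children see throughput/ && /initial writers/ { w = $(NF - 1) }
        /Children see throughput/ && /readers/         { r = $(NF - 1) }
        # Convert kB/sec to MB/s (divide by 1000) and truncate.
        END { printf "%d/%d\n", w / 1000, r / 1000 }
    '
}

# Sample lines taken from the R28.2.1 run in this thread.
children_throughput <<'EOF'
Children see throughput for  8 initial writers  =   78009.33 kB/sec
Parent sees throughput for  8 initial writers   =   33484.65 kB/sec
Children see throughput for  8 readers          =  279792.20 kB/sec
Parent sees throughput for  8 readers           =  264273.93 kB/sec
EOF
```

On a real log the same function can be used as `children_throughput < saved.log`, where `saved.log` is whatever file the iozone output was redirected to.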

Hi HuiW,

For this issue, please apply this patch to rel-28.

diff --git a/arch/arm64/configs/tegra18_defconfig b/arch/arm64/configs/tegra18_defconfig
index 2c09bed..690e266 100644
--- a/arch/arm64/configs/tegra18_defconfig
+++ b/arch/arm64/configs/tegra18_defconfig
@@ -44,7 +44,6 @@
 CONFIG_BLK_DEV_THROTTLING=y
 CONFIG_PARTITION_ADVANCED=y
 # CONFIG_IOSCHED_DEADLINE is not set
-CONFIG_CFQ_GROUP_IOSCHED=y
 CONFIG_ARCH_TEGRA=y
 CONFIG_PCI=y
 CONFIG_PCI_STUB=m
@@ -721,6 +720,7 @@
 CONFIG_MMC_SDHCI_OF_ARASAN=m
 CONFIG_MMC_SDHCI_OF_AT91=m
 CONFIG_MMC_SDHCI_TEGRA=y
+CONFIG_MMC_CQ_HCI=y
 CONFIG_MMC_SDHCI_F_SDH30=m
 CONFIG_MMC_TIFM_SD=m
 CONFIG_MMC_SPI=m
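
After rebuilding and flashing the kernel, it may be worth confirming that the two options touched by the patch ended up as intended. The sketch below reads kernel config text on stdin and reports both options; `check_config` is a hypothetical helper, and reading `/proc/config.gz` assumes the running kernel exposes its config there, which may not be true on every build.

```shell
#!/bin/sh
# Sketch: report the state of the two options changed by the patch above.
# check_config is a hypothetical helper; it reads kernel config text on stdin.
check_config() {
    cfg="$(cat)"
    for opt in CONFIG_CFQ_GROUP_IOSCHED CONFIG_MMC_CQ_HCI; do
        if printf '%s\n' "$cfg" | grep -q "^$opt=y"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
        fi
    done
}

# Assumption: the running kernel provides /proc/config.gz.
# If it does not, this step is skipped silently.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | check_config
fi
```

With the patch applied, the expected report is `CONFIG_CFQ_GROUP_IOSCHED is not set` followed by `CONFIG_MMC_CQ_HCI=y`.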

Hi WayneWWW,

Thanks for your support.

May I confirm whether the patch you just provided is for rel-28 or rel-32.3?
We had thought the read performance of rel-32.3 was too low.
But since you provided the patch for rel-28, do you think the read performance of rel-28 is too high?

Thank you,

Hi,

Yes, the issue is actually in rel-28, not rel-32.