TX2 R32.3.1 How to improve memory bandwidth

Hi everyone,

I am using a TX2 with BSP R32.3.1. I tested memory bandwidth using the command below:

dd if=/dev/zero of=/dev/null bs=1024576 count=10000

When ‘NV Power Mode’ is 0 the result is 1.4 GB/s, and when ‘NV Power Mode’ is 3 the result is 5.0 GB/s.

I also ran the same test with the R28.2 BSP, and the results were:
When ‘NV Power Mode’ is 0 the result is 8.8 GB/s, and when ‘NV Power Mode’ is 3 the result is 7.3 GB/s.
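To make the comparison between power modes and BSP releases easier to repeat, the dd run can be wrapped in a small helper that prints only the throughput figure. This is a minimal sketch, assuming GNU coreutils dd (which writes its summary as the last line on stderr); the function name dd_rate is made up for illustration, and the block size in the example is simply the one used above. Note that dd runs as a single process, so the figure reflects what one core can push rather than a multi-core total.

```shell
#!/bin/sh
# dd_rate BS COUNT: copy COUNT blocks of BS bytes from /dev/zero to /dev/null
# and print only the throughput field reported by dd (e.g. "1.4 GB/s").
# Assumes GNU dd, whose summary line ends with ", <rate>".
dd_rate() {
    LC_ALL=C dd if=/dev/zero of=/dev/null bs="$1" count="$2" 2>&1 |
        tail -n 1 |
        awk -F', ' '{ print $NF }'
}

# Example, with the same parameters as the test above:
#   dd_rate 1024576 10000
```

Running it a few times per mode and noting the spread gives a better idea of whether 1.4 GB/s versus 5.0 GB/s is stable or varies from run to run.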

How can I improve memory bandwidth with the R32.3.1 BSP? Thanks.

Does it help to increase the “dd” priority?

sudo -s
nice -n -3 dd if=/dev/zero of=/dev/null bs=1024576 count=10000
exit

(A nice value of “-3” is a significantly higher priority, but not so high that it would interfere with something else.)

I am just curious if the limitation is due to something else competing with the memory controller.

Hi linuxdev,

I ran the command you provided, and the result is still 1.4 GB/s.

In that case it isn’t due to competition with another thread/process. Have you maxed out clocks for each mode (set the nvpmodel you are looking at, then “sudo jetson_clocks”)?
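The check suggested here could be scripted roughly as follows. This is only a sketch: it assumes the R32.x tools nvpmodel (with -m to set a mode and -q to query it) and jetson_clocks are on the PATH, and that it is run from a root shell (for example via “sudo -s”, as earlier in the thread). The helper name run_mode and the COUNT variable are hypothetical.

```shell
#!/bin/sh
# run_mode MODE: select an nvpmodel mode, raise clocks to that mode's maximum,
# then repeat the dd test from the first post. Run as root.
run_mode() {
    mode="$1"
    nvpmodel -m "$mode" || return 1   # switch to the requested power mode
    nvpmodel -q                       # print the mode that is now active
    jetson_clocks || return 1         # max out clocks within this mode
    echo "mode $mode:"
    dd if=/dev/zero of=/dev/null bs=1024576 count="${COUNT:-10000}"
}

# Example: compare the two modes discussed in this thread.
#   for m in 0 3; do run_mode "$m"; done
```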

Hi linuxdev,

With “sudo nvpmodel -m 0” the measured bandwidth is 1.4 GB/s, and with “sudo nvpmodel -m 3” it is 5 GB/s. I also ran “sudo jetson_clocks”, but the result did not improve.

It is quite odd that “-m 3” has higher bandwidth than does “-m 0”. It is only speculation, but perhaps some mutex or spinlock is taking longer when there are more cores.

Just speculating… “-m 0” will use all 6 cores, including the 2 Denver cores, while “-m 3” will only use the ARM Cortex-A57 cores. Perhaps the 2 Denver cores, which emulate the ARM instruction set, are not as efficient as the Cortex-A57s when accessing the memory controller, caches, etc. If so, more cores would be less efficient than fewer cores, because the extra cores added to the mix are the 2 Denver cores.
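One way to probe this speculation is to stay in mode 0 and pin dd to a single core of each type with taskset, then compare the rates. A rough sketch, assuming the commonly reported TX2 numbering in which CPUs 1 and 2 are the Denver cores and CPUs 0, 3, 4 and 5 are the Cortex-A57 cores (worth confirming with lscpu or /proc/cpuinfo on the board); the helper name pinned_dd and the COUNT variable are hypothetical.

```shell
#!/bin/sh
# pinned_dd CPU: run the dd test restricted to one CPU using taskset (util-linux).
pinned_dd() {
    cpu="$1"
    echo "cpu $cpu:"
    taskset -c "$cpu" dd if=/dev/zero of=/dev/null bs=1024576 count="${COUNT:-10000}"
}

# Example, in mode 0 with all six cores online:
#   pinned_dd 0   # assumed to be a Cortex-A57 core
#   pinned_dd 1   # assumed to be a Denver core
```

If the run pinned to a Denver core is consistently much slower, that would support the idea that the scheduler placing dd on a Denver core explains the lower figure in mode 0.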