Slow memory copy performance - how to set EMC clock?

I’d like to know: does setting MAX-N with nvpmodel also set the EMC clock to its max rate?

I’m porting a custom kernel driver from TX1 to TX2. One of its operations is to copy a 2.5MB frame from dma_alloc_coherent memory to user memory.

On the TX1 with the EMC clock set to max frequency, this copy takes 3.5ms. On the TX2 in MAX-N mode, the copy takes almost 11ms.

On the TX1, we set the EMC clock to max with these statements:

cat /sys/kernel/debug/clock/emc/max > /sys/kernel/debug/clock/override.emc/rate
echo 1 > /sys/kernel/debug/clock/override.emc/state

But this file path no longer exists on the TX2, so I’m not sure if I have its EMC clock at max. What is the correct way to max the EMC clock?

You may try the jetson_clocks.sh script, found in the ubuntu or nvidia user’s home directory:

sudo /home/ubuntu/jetson_clocks.sh --show         # show current clocks
sudo /home/ubuntu/jetson_clocks.sh                # boost clocks
sudo /home/ubuntu/jetson_clocks.sh --show         # show new clocks and check the changes
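
If you only want to pin the EMC clock, similar to the TX1 override.emc interface, the equivalent debugfs nodes on the TX2 live under the BPMP. These are the paths jetson_clocks.sh uses on L4T 28.x, so verify them on your release:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate > /sys/kernel/debug/bpmp/debug/clk/emc/rate

Run these from a root shell (sudo -i), since the redirections won’t pass through plain sudo.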

Thanks, not sure how I missed that… Now the 2.5MB copy_to_user() from coherent memory executes in 3.5ms, the same as on the TX1.

I had expected/hoped this copy time would improve with the TX2, though, as the specs say it has higher memory bandwidth:

Memory
TX2 = 8 GB, 128-bit LPDDR4, 59.7 GB/s
TX1 = 4 GB, 64-bit LPDDR4, 25.6 GB/s
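
(For what it’s worth, if my arithmetic is right, 2.5 MB in 3.5 ms is only about 0.7 GB/s, or roughly 1.4 GB/s of combined read-plus-write traffic, which is a small fraction of either module’s theoretical peak.)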

Is it right to expect a memory copy to improve from TX1 to TX2?

Hi,

I used to have the same issue, and it was fixed after applying this change:
https://devtalk.nvidia.com/default/topic/1009011/jetson-tx2/kernel-4-4-drivers-platform-tegra-mc-isomgr-c-isomgr_init-fails-to-initialize/post/5151445/#5151445

Regards,

I have a version of the board with the UARTs severed, so the clocks are now all at max. Still, the memory copy performance is identical to the TX1’s. Is that to be expected? I thought the TX2 would be about twice as fast, based on the specs.

The operation is copy_to_user() in a kernel driver copying from coherent memory to user space.
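
For context, here is a minimal sketch of the pattern being timed (the names and frame size are illustrative, not the actual driver):

#include <linux/dma-mapping.h>
#include <linux/uaccess.h>

#define FRAME_BYTES (2560 * 1024)   /* ~2.5 MB frame (illustrative size) */

struct frame_dev {
        struct device *dev;
        void *vaddr;                /* CPU address from dma_alloc_coherent() */
        dma_addr_t bus;             /* device-visible address */
};

static int frame_dev_alloc(struct frame_dev *fd)
{
        fd->vaddr = dma_alloc_coherent(fd->dev, FRAME_BYTES, &fd->bus,
                                       GFP_KERNEL);
        return fd->vaddr ? 0 : -ENOMEM;
}

/* The operation being benchmarked: ~3.5 ms for the 2.5 MB frame. */
static long frame_dev_read(struct frame_dev *fd, void __user *ubuf)
{
        if (copy_to_user(ubuf, fd->vaddr, FRAME_BYTES))
                return -EFAULT;
        return 0;
}

Note that dma_alloc_coherent() memory is typically mapped uncached (or write-combined) on ARM SoCs without hardware coherency, which matters for the copy speed.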

I may be way off base here, but it looks as if the compiled memory operations are not taking advantage of the wider memory bus, or there is a choke point somewhere. The clock rates also look about the same between the TX1 and TX2.

Indeed, the reported max EMC clocks are similar:

TX1 = 1600000000 Hz (1.6 GHz)
TX2 = 1866000000 Hz (1.866 GHz)

It makes sense that the performance advantage would come from the 128-bit bus, if there is a way to take advantage of it.
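
As a sanity check, assuming the quoted values are the EMC clock rates and LPDDR4 transfers on both clock edges, the spec-sheet numbers fall straight out of clock × width:

TX1: 1.6 GHz × 2 (DDR) × 8 bytes (64-bit) = 25.6 GB/s
TX2: 1.866 GHz × 2 (DDR) × 16 bytes (128-bit) = 59.7 GB/s

So similar EMC clocks are expected, and essentially all of the TX2’s ~2.3× bandwidth headroom is in the bus width.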

Assuming you’re using cached memory, the transfer rate into and out of cache should be double with double the memory width.

If you are using non-cached memory, then each memory transaction takes a fixed amount of time, so using wider instructions (NEON vector instructions, for example) lets you make the most of what you have.
For example, NEON intrinsics let you issue 128-bit memory operations at a time:
https://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/ARM-NEON-Intrinsics.html
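
As a rough userspace sketch (not your driver code), a NEON copy loop that issues one 128-bit load and one 128-bit store per iteration might look like this; it assumes the size is a multiple of 16 bytes and leaves out head/tail handling:

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

static void copy_neon_128(uint8_t *dst, const uint8_t *src, size_t n)
{
        for (size_t i = 0; i < n; i += 16) {
                uint8x16_t v = vld1q_u8(src + i);  /* one 128-bit load  */
                vst1q_u8(dst + i, v);              /* one 128-bit store */
        }
}

Keep in mind that using NEON inside a kernel driver on arm64 requires bracketing the code with kernel_neon_begin()/kernel_neon_end(), since the kernel does not preserve FP/SIMD state by default.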