How to improve Orin module memory bandwidth

Hi everyone,
I am using an Orin module with BSP R35.5.0. I tested memory bandwidth with the mbw tool, and the result is below. The ‘Method: DUMB’ test reaches only about 2490 MiB/s. How can I improve the bandwidth? Thanks.

ubuntu@ubuntu:~/Downloads$ ./mbw -n 4 256 
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0       Method: MEMCPY  Elapsed: 0.03674        MiB: 256.00000  Copy: 6967.882 MiB/s
1       Method: MEMCPY  Elapsed: 0.03679        MiB: 256.00000  Copy: 6958.980 MiB/s
2       Method: MEMCPY  Elapsed: 0.03673        MiB: 256.00000  Copy: 6969.210 MiB/s
3       Method: MEMCPY  Elapsed: 0.03675        MiB: 256.00000  Copy: 6966.366 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.03675        MiB: 256.00000  Copy: 6965.607 MiB/s
0       Method: DUMB    Elapsed: 0.10273        MiB: 256.00000  Copy: 2492.042 MiB/s
1       Method: DUMB    Elapsed: 0.10278        MiB: 256.00000  Copy: 2490.830 MiB/s
2       Method: DUMB    Elapsed: 0.10280        MiB: 256.00000  Copy: 2490.151 MiB/s
3       Method: DUMB    Elapsed: 0.10280        MiB: 256.00000  Copy: 2490.345 MiB/s
AVG     Method: DUMB    Elapsed: 0.10278        MiB: 256.00000  Copy: 2490.842 MiB/s
0       Method: MCBLOCK Elapsed: 0.03780        MiB: 256.00000  Copy: 6772.666 MiB/s
1       Method: MCBLOCK Elapsed: 0.03784        MiB: 256.00000  Copy: 6765.685 MiB/s
2       Method: MCBLOCK Elapsed: 0.03776        MiB: 256.00000  Copy: 6778.763 MiB/s
3       Method: MCBLOCK Elapsed: 0.03775        MiB: 256.00000  Copy: 6780.559 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03779        MiB: 256.00000  Copy: 6774.413 MiB/s

Hi,
The topic is in the Jetson Nano category, but the description mentions Orin. Do you use AGX Orin or Orin Nano? We would like to confirm which platform you are using.

I use the AGX Orin platform. Thanks.

Hi,
Please run sudo jetson_clocks and see if there is any improvement. Also, please share what [Method: DUMB] is testing. We don’t have experience with this tool, so please share more information.

Hi DaneLL,
I ran sudo jetson_clocks, but the bandwidth did not improve. You can download the mbw source code from GitHub (raas/mbw: Memory Bandwidth Benchmark).
DUMB is a test that copies the array one element at a time in a plain loop. The code is below:

    if(type==TEST_DUMB) { /* dumb test */
        gettimeofday(&starttime, NULL);
        for(t=0; t<asize; t++) {
            b[t]=a[t];
        }
        gettimeofday(&endtime, NULL);
    }
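
For readers who want to see how a figure like "Copy: 2490 MiB/s" comes out of a loop like the one above, here is a minimal standalone sketch. It is not the mbw source: the function names `dumb_copy`, `mib_per_sec` and `time_dumb_copy` are made up for illustration, and the pre-touch of both arrays and the clamp on very short timings are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/time.h>

/* Element-by-element copy, mirroring the DUMB loop quoted above. */
static void dumb_copy(long *b, const long *a, size_t asize)
{
    for (size_t t = 0; t < asize; t++) {
        b[t] = a[t];
    }
}

/* Convert bytes copied and elapsed seconds into MiB/s. */
static double mib_per_sec(size_t bytes, double seconds)
{
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/* Time one copy pass over asize longs (hypothetical helper).
 * Returns MiB/s, or -1.0 if asize is 0 or allocation fails. */
static double time_dumb_copy(size_t asize)
{
    double result = -1.0;
    if (asize == 0) {
        return result;
    }

    long *a = malloc(asize * sizeof *a);
    long *b = malloc(asize * sizeof *b);
    if (a != NULL && b != NULL) {
        struct timeval start, end;

        /* Touch both arrays first so page faults are not part of the timing. */
        for (size_t t = 0; t < asize; t++) {
            a[t] = (long)t;
            b[t] = 0;
        }

        gettimeofday(&start, NULL);
        dumb_copy(b, a, asize);
        gettimeofday(&end, NULL);

        /* Sanity check that the copy really happened. */
        assert(b[asize - 1] == a[asize - 1]);

        double elapsed = (double)(end.tv_sec - start.tv_sec)
                       + (double)(end.tv_usec - start.tv_usec) / 1e6;
        if (elapsed <= 0.0) {
            elapsed = 1e-6; /* clamp to the microsecond timer resolution */
        }
        result = mib_per_sec(asize * sizeof *a, elapsed);
    }
    free(a);
    free(b);
    return result;
}
```

To try it, add a `main` that prints `time_dumb_copy(32u * 1024 * 1024)`, which copies 256 MiB of longs as in the runs above. As a check on the arithmetic, `mib_per_sec` applied to 256 MiB and the 0.10278 s elapsed time from the first log gives roughly 2490.8 MiB/s, in line with the reported figures.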

Hi,
We tested it on an AGX Orin developer kit with r35.5.0:

$ mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0	Method: MEMCPY	Elapsed: 0.02727	MiB: 256.00000	Copy: 9388.983 MiB/s
1	Method: MEMCPY	Elapsed: 0.02720	MiB: 256.00000	Copy: 9412.111 MiB/s
2	Method: MEMCPY	Elapsed: 0.02723	MiB: 256.00000	Copy: 9402.431 MiB/s
3	Method: MEMCPY	Elapsed: 0.02723	MiB: 256.00000	Copy: 9401.741 MiB/s
AVG	Method: MEMCPY	Elapsed: 0.02723	MiB: 256.00000	Copy: 9401.309 MiB/s
0	Method: DUMB	Elapsed: 0.02721	MiB: 256.00000	Copy: 9408.306 MiB/s
1	Method: DUMB	Elapsed: 0.02717	MiB: 256.00000	Copy: 9423.197 MiB/s
2	Method: DUMB	Elapsed: 0.02715	MiB: 256.00000	Copy: 9428.750 MiB/s
3	Method: DUMB	Elapsed: 0.02716	MiB: 256.00000	Copy: 9425.279 MiB/s
AVG	Method: DUMB	Elapsed: 0.02717	MiB: 256.00000	Copy: 9421.377 MiB/s
0	Method: MCBLOCK	Elapsed: 0.01807	MiB: 256.00000	Copy: 14163.992 MiB/s
1	Method: MCBLOCK	Elapsed: 0.01826	MiB: 256.00000	Copy: 14021.251 MiB/s
2	Method: MCBLOCK	Elapsed: 0.01783	MiB: 256.00000	Copy: 14360.240 MiB/s
3	Method: MCBLOCK	Elapsed: 0.01838	MiB: 256.00000	Copy: 13927.425 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.01813	MiB: 256.00000	Copy: 14116.350 MiB/s

We don’t observe the issue. Do you use the developer kit or your custom board?

Hi DaneLLL,
I tested on both our carrier board and the developer kit board. With the downloaded mbw binary, the test result is:

$ mbw -t 4 256 
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0       Method: MEMCPY  Elapsed: 0.03726        MiB: 256.00000  Copy: 6871.008 MiB/s
1       Method: MEMCPY  Elapsed: 0.03732        MiB: 256.00000  Copy: 6859.409 MiB/s
2       Method: MEMCPY  Elapsed: 0.03712        MiB: 256.00000  Copy: 6895.623 MiB/s
3       Method: MEMCPY  Elapsed: 0.03724        MiB: 256.00000  Copy: 6873.960 MiB/s
4       Method: MEMCPY  Elapsed: 0.03728        MiB: 256.00000  Copy: 6867.874 MiB/s
5       Method: MEMCPY  Elapsed: 0.03713        MiB: 256.00000  Copy: 6894.880 MiB/s
6       Method: MEMCPY  Elapsed: 0.03718        MiB: 256.00000  Copy: 6885.422 MiB/s
7       Method: MEMCPY  Elapsed: 0.03724        MiB: 256.00000  Copy: 6874.513 MiB/s
8       Method: MEMCPY  Elapsed: 0.03710        MiB: 256.00000  Copy: 6900.828 MiB/s
9       Method: MEMCPY  Elapsed: 0.03715        MiB: 256.00000  Copy: 6890.797 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.03720        MiB: 256.00000  Copy: 6881.406 MiB/s
0       Method: DUMB    Elapsed: 0.03716        MiB: 256.00000  Copy: 6889.684 MiB/s
1       Method: DUMB    Elapsed: 0.03725        MiB: 256.00000  Copy: 6872.299 MiB/s
2       Method: DUMB    Elapsed: 0.03716        MiB: 256.00000  Copy: 6888.757 MiB/s
3       Method: DUMB    Elapsed: 0.03720        MiB: 256.00000  Copy: 6881.720 MiB/s
4       Method: DUMB    Elapsed: 0.03711        MiB: 256.00000  Copy: 6898.038 MiB/s
5       Method: DUMB    Elapsed: 0.03712        MiB: 256.00000  Copy: 6896.923 MiB/s
6       Method: DUMB    Elapsed: 0.03715        MiB: 256.00000  Copy: 6891.539 MiB/s
7       Method: DUMB    Elapsed: 0.03716        MiB: 256.00000  Copy: 6889.684 MiB/s
8       Method: DUMB    Elapsed: 0.03741        MiB: 256.00000  Copy: 6843.639 MiB/s
9       Method: DUMB    Elapsed: 0.03712        MiB: 256.00000  Copy: 6896.923 MiB/s
AVG     Method: DUMB    Elapsed: 0.03718        MiB: 256.00000  Copy: 6884.885 MiB/s
0       Method: MCBLOCK Elapsed: 0.02421        MiB: 256.00000  Copy: 10574.143 MiB/s
1       Method: MCBLOCK Elapsed: 0.02355        MiB: 256.00000  Copy: 10868.642 MiB/s
2       Method: MCBLOCK Elapsed: 0.02356        MiB: 256.00000  Copy: 10864.030 MiB/s
3       Method: MCBLOCK Elapsed: 0.02338        MiB: 256.00000  Copy: 10950.935 MiB/s
4       Method: MCBLOCK Elapsed: 0.02381        MiB: 256.00000  Copy: 10753.140 MiB/s
5       Method: MCBLOCK Elapsed: 0.02368        MiB: 256.00000  Copy: 10809.898 MiB/s
6       Method: MCBLOCK Elapsed: 0.02389        MiB: 256.00000  Copy: 10716.678 MiB/s
7       Method: MCBLOCK Elapsed: 0.02379        MiB: 256.00000  Copy: 10761.729 MiB/s
8       Method: MCBLOCK Elapsed: 0.02380        MiB: 256.00000  Copy: 10756.303 MiB/s
9       Method: MCBLOCK Elapsed: 0.02391        MiB: 256.00000  Copy: 10706.817 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.02376        MiB: 256.00000  Copy: 10775.318 MiB/s
*** stack smashing detected ***: terminated
Aborted (core dumped)

With mbw compiled from source, the test result is:

$ ./mbw -n 4 256    
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0       Method: MEMCPY  Elapsed: 0.03710        MiB: 256.00000  Copy: 6900.270 MiB/s
1       Method: MEMCPY  Elapsed: 0.03719        MiB: 256.00000  Copy: 6884.496 MiB/s
2       Method: MEMCPY  Elapsed: 0.03709        MiB: 256.00000  Copy: 6901.572 MiB/s
3       Method: MEMCPY  Elapsed: 0.03715        MiB: 256.00000  Copy: 6890.983 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.03713        MiB: 256.00000  Copy: 6894.323 MiB/s
0       Method: DUMB    Elapsed: 0.10219        MiB: 256.00000  Copy: 2505.113 MiB/s
1       Method: DUMB    Elapsed: 0.10184        MiB: 256.00000  Copy: 2513.698 MiB/s
2       Method: DUMB    Elapsed: 0.10196        MiB: 256.00000  Copy: 2510.862 MiB/s
3       Method: DUMB    Elapsed: 0.10183        MiB: 256.00000  Copy: 2513.870 MiB/s
AVG     Method: DUMB    Elapsed: 0.10196        MiB: 256.00000  Copy: 2510.881 MiB/s
0       Method: MCBLOCK Elapsed: 0.03798        MiB: 256.00000  Copy: 6740.745 MiB/s
1       Method: MCBLOCK Elapsed: 0.03797        MiB: 256.00000  Copy: 6742.875 MiB/s
2       Method: MCBLOCK Elapsed: 0.03790        MiB: 256.00000  Copy: 6754.261 MiB/s
3       Method: MCBLOCK Elapsed: 0.03803        MiB: 256.00000  Copy: 6731.882 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03797        MiB: 256.00000  Copy: 6742.431 MiB/s

I use a production Orin module in MAXN power mode.
Which power mode do you use? And which Orin module do you use: the Dev-Kit Module (P3701-0000) or the Production 32GB-DRAM module (P3701-0004)?

Hi DaneLLL,
Is there any progress on this issue?

Hi,
We have not reproduced the issue yet. So you think the result from the downloaded mbw binary is not correct?

Hi DaneLLL,
Could you test with mbw compiled from source?

Hi,
Please enable these flags in the Makefile and give it a try:

CFLAGS=-O2 -Wall -g
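
A plausible reason these flags matter (this is an assumption, not something confirmed by the mbw authors): the DUMB test is a plain C loop, so its speed depends on how the compiler translates it, while memcpy comes from the prebuilt C library and is the same code regardless of CFLAGS. The sketch below times both kinds of copy through one helper so the same file can be built at different optimization levels and compared. The names `copy_fn`, `loop_copy`, `libc_copy`, `now_sec` and `time_copy` are hypothetical and not part of mbw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Common signature so both copy variants can share one timing helper. */
typedef void (*copy_fn)(long *dst, const long *src, size_t n);

/* Plain loop: the generated code depends on the optimization level. */
static void loop_copy(long *dst, const long *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Library memcpy: already compiled inside libc, independent of our CFLAGS. */
static void libc_copy(long *dst, const long *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* Wall-clock time in seconds, using gettimeofday as mbw does. */
static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Time one call of fn over n longs.
 * Returns elapsed seconds, or -1.0 if n is 0 or allocation fails. */
static double time_copy(copy_fn fn, size_t n)
{
    double elapsed = -1.0;
    if (n == 0) {
        return elapsed;
    }

    long *src = malloc(n * sizeof *src);
    long *dst = malloc(n * sizeof *dst);
    if (src != NULL && dst != NULL) {
        /* Touch both buffers so page faults are not part of the timing. */
        for (size_t i = 0; i < n; i++) {
            src[i] = (long)i;
            dst[i] = 0;
        }

        double start = now_sec();
        fn(dst, src, n);
        elapsed = now_sec() - start;

        /* Sanity check that the data was really copied. */
        assert(memcmp(dst, src, n * sizeof *src) == 0);
    }
    free(src);
    free(dst);
    return elapsed;
}
```

To compare, add a `main` that prints `time_copy(loop_copy, n)` and `time_copy(libc_copy, n)` for the same `n`, then build the file once with -O0 and once with -O2. How large the gap is will vary with the compiler version and the board, so treat any single run as indicative only.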

Hi DaneLLL,
I enabled “CFLAGS=-O2 -Wall -g” in the Makefile, and the test result is:

ubuntu@ubuntu:~/Downloads/test/mbw-master$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0       Method: MEMCPY  Elapsed: 0.03683        MiB: 256.00000  Copy: 6951.422 MiB/s
1       Method: MEMCPY  Elapsed: 0.03690        MiB: 256.00000  Copy: 6937.481 MiB/s
2       Method: MEMCPY  Elapsed: 0.03681        MiB: 256.00000  Copy: 6955.199 MiB/s
3       Method: MEMCPY  Elapsed: 0.03688        MiB: 256.00000  Copy: 6941.432 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.03685        MiB: 256.00000  Copy: 6946.376 MiB/s
0       Method: DUMB    Elapsed: 0.03678        MiB: 256.00000  Copy: 6959.926 MiB/s
1       Method: DUMB    Elapsed: 0.03682        MiB: 256.00000  Copy: 6951.799 MiB/s
2       Method: DUMB    Elapsed: 0.03680        MiB: 256.00000  Copy: 6957.278 MiB/s
3       Method: DUMB    Elapsed: 0.03678        MiB: 256.00000  Copy: 6960.494 MiB/s
AVG     Method: DUMB    Elapsed: 0.03680        MiB: 256.00000  Copy: 6957.373 MiB/s
0       Method: MCBLOCK Elapsed: 0.03691        MiB: 256.00000  Copy: 6935.790 MiB/s
1       Method: MCBLOCK Elapsed: 0.03701        MiB: 256.00000  Copy: 6916.676 MiB/s
2       Method: MCBLOCK Elapsed: 0.03693        MiB: 256.00000  Copy: 6932.597 MiB/s
3       Method: MCBLOCK Elapsed: 0.03691        MiB: 256.00000  Copy: 6936.729 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03694        MiB: 256.00000  Copy: 6930.438 MiB/s

The "DUMB" result improved from 2513.698 MiB/s to 6957.373 MiB/s, but the bandwidth is still lower than what you measured (9425.279 MiB/s). I suppose there is a difference between the Dev-Kit Module (P3701-0000) and the Production 32GB-DRAM module (P3701-0004).

Could you test with mbw compiled from source?

Hi,
We did compile mbw with CFLAGS=-O2 -Wall -g and see ~9400 MiB/s. If the prebuilt and self-built mbw give identical results on your side, that is the throughput of the production module.

Hi DaneLLL,
Which power mode do you use? In 30W mode, my test results are:

ubuntu@ubuntu:~/Downloads/mbw-master$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0       Method: MEMCPY  Elapsed: 0.06479        MiB: 256.00000  Copy: 3951.471 MiB/s
1       Method: MEMCPY  Elapsed: 0.06387        MiB: 256.00000  Copy: 4008.393 MiB/s
2       Method: MEMCPY  Elapsed: 0.06342        MiB: 256.00000  Copy: 4036.454 MiB/s
3       Method: MEMCPY  Elapsed: 0.06336        MiB: 256.00000  Copy: 4040.595 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.06386        MiB: 256.00000  Copy: 4008.910 MiB/s
0       Method: DUMB    Elapsed: 0.06330        MiB: 256.00000  Copy: 4044.042 MiB/s
1       Method: DUMB    Elapsed: 0.06359        MiB: 256.00000  Copy: 4025.664 MiB/s
2       Method: DUMB    Elapsed: 0.06458        MiB: 256.00000  Copy: 3964.260 MiB/s
3       Method: DUMB    Elapsed: 0.06395        MiB: 256.00000  Copy: 4003.190 MiB/s
AVG     Method: DUMB    Elapsed: 0.06386        MiB: 256.00000  Copy: 4009.067 MiB/s
0       Method: MCBLOCK Elapsed: 0.06429        MiB: 256.00000  Copy: 3982.019 MiB/s
1       Method: MCBLOCK Elapsed: 0.04615        MiB: 256.00000  Copy: 5547.009 MiB/s
2       Method: MCBLOCK Elapsed: 0.03815        MiB: 256.00000  Copy: 6710.706 MiB/s
3       Method: MCBLOCK Elapsed: 0.03824        MiB: 256.00000  Copy: 6694.036 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04671        MiB: 256.00000  Copy: 5480.889 MiB/s

ubuntu@ubuntu:~/Downloads/mbw-master$ mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0       Method: MEMCPY  Elapsed: 0.06290        MiB: 256.00000  Copy: 4069.952 MiB/s
1       Method: MEMCPY  Elapsed: 0.06284        MiB: 256.00000  Copy: 4073.579 MiB/s
2       Method: MEMCPY  Elapsed: 0.06278        MiB: 256.00000  Copy: 4077.862 MiB/s
3       Method: MEMCPY  Elapsed: 0.06272        MiB: 256.00000  Copy: 4081.763 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.06281        MiB: 256.00000  Copy: 4075.784 MiB/s
0       Method: DUMB    Elapsed: 0.06268        MiB: 256.00000  Copy: 4083.977 MiB/s
1       Method: DUMB    Elapsed: 0.06271        MiB: 256.00000  Copy: 4082.284 MiB/s
2       Method: DUMB    Elapsed: 0.06262        MiB: 256.00000  Copy: 4088.412 MiB/s
3       Method: DUMB    Elapsed: 0.06264        MiB: 256.00000  Copy: 4086.585 MiB/s
AVG     Method: DUMB    Elapsed: 0.06266        MiB: 256.00000  Copy: 4085.313 MiB/s
0       Method: MCBLOCK Elapsed: 0.03730        MiB: 256.00000  Copy: 6863.087 MiB/s
1       Method: MCBLOCK Elapsed: 0.03672        MiB: 256.00000  Copy: 6970.918 MiB/s
2       Method: MCBLOCK Elapsed: 0.03655        MiB: 256.00000  Copy: 7004.871 MiB/s
3       Method: MCBLOCK Elapsed: 0.03707        MiB: 256.00000  Copy: 6905.481 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03691        MiB: 256.00000  Copy: 6935.649 MiB/s