Hi everyone,
I am using an Orin module with BSP R35.5.0. I tested memory bandwidth with the mbw tool and the result is below. ‘Method: DUMB’ reaches only about 2490 MiB/s. How can I improve the bandwidth? Thanks.
ubuntu@ubuntu:~/Downloads$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.03674 MiB: 256.00000 Copy: 6967.882 MiB/s
1 Method: MEMCPY Elapsed: 0.03679 MiB: 256.00000 Copy: 6958.980 MiB/s
2 Method: MEMCPY Elapsed: 0.03673 MiB: 256.00000 Copy: 6969.210 MiB/s
3 Method: MEMCPY Elapsed: 0.03675 MiB: 256.00000 Copy: 6966.366 MiB/s
AVG Method: MEMCPY Elapsed: 0.03675 MiB: 256.00000 Copy: 6965.607 MiB/s
0 Method: DUMB Elapsed: 0.10273 MiB: 256.00000 Copy: 2492.042 MiB/s
1 Method: DUMB Elapsed: 0.10278 MiB: 256.00000 Copy: 2490.830 MiB/s
2 Method: DUMB Elapsed: 0.10280 MiB: 256.00000 Copy: 2490.151 MiB/s
3 Method: DUMB Elapsed: 0.10280 MiB: 256.00000 Copy: 2490.345 MiB/s
AVG Method: DUMB Elapsed: 0.10278 MiB: 256.00000 Copy: 2490.842 MiB/s
0 Method: MCBLOCK Elapsed: 0.03780 MiB: 256.00000 Copy: 6772.666 MiB/s
1 Method: MCBLOCK Elapsed: 0.03784 MiB: 256.00000 Copy: 6765.685 MiB/s
2 Method: MCBLOCK Elapsed: 0.03776 MiB: 256.00000 Copy: 6778.763 MiB/s
3 Method: MCBLOCK Elapsed: 0.03775 MiB: 256.00000 Copy: 6780.559 MiB/s
AVG Method: MCBLOCK Elapsed: 0.03779 MiB: 256.00000 Copy: 6774.413 MiB/s
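For anyone reading the log above: the Copy column appears to be just the MiB column divided by the Elapsed column. A minimal C sketch of that arithmetic follows. The function name copy_rate_mib_s is made up for illustration and is not part of mbw.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput in MiB/s from the two columns mbw prints:
 * "MiB" (amount copied) and "Elapsed" (seconds).
 * Hypothetical helper, not taken from the mbw source. */
double copy_rate_mib_s(double mib, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return mib / elapsed_s;
}

/* Hypothetical helper: print one recomputed row for a given method. */
void print_rate(const char *method, double mib, double elapsed_s)
{
    printf("%-8s %.1f MiB/s\n", method, copy_rate_mib_s(mib, elapsed_s));
}
```

Calling print_rate("DUMB", 256.0, 0.10278) gives roughly 2490.8 MiB/s and print_rate("MEMCPY", 256.0, 0.03675) gives roughly 6966 MiB/s. The small differences from the logged averages are presumably because the Elapsed column is rounded to five decimals.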
Hi,
The topic is in Jetson Nano category but description mentions Orin. Do you use AGX Orin or Orin Nano? Would like to confirm which platform you are using.
I use the AGX Orin platform. Thanks.
Hi,
Please run sudo jetson_clocks and see if there is any improvement. Please also share what [Method: DUMB] is testing. We don’t have experience with this tool, so please share more information.
Hi DaneLLL,
I ran sudo jetson_clocks, but the bandwidth did not improve. You can download the mbw source code from GitHub (raas/mbw: Memory Bandwidth Benchmark).
DUMB is an element-by-element copy test (one array element per loop iteration). The code is below:
if(type==TEST_DUMB) { /* dumb test */
    gettimeofday(&starttime, NULL);
    for(t=0; t<asize; t++) {
        b[t]=a[t];
    }
    gettimeofday(&endtime, NULL);
}
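To try the loop outside of mbw, here is a minimal standalone C sketch of the same element-by-element copy, timed with clock_gettime. It is an illustration only: the function name dumb_copy_mib_s is made up, and it uses clock_gettime instead of the gettimeofday call in the snippet above. Building it once without optimization and once with -O2 should show whether the compiler flags alone explain a slow DUMB number.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Copy n longs one element at a time (same shape as the DUMB loop)
 * and return the measured rate in MiB/s.
 * Hypothetical helper, not taken from the mbw source. */
double dumb_copy_mib_s(const long *a, long *b, size_t n)
{
    struct timespec t0, t1;
    assert(a != NULL && b != NULL && n > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t t = 0; t < n; t++) {
        b[t] = a[t];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Elapsed wall time in seconds. */
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* Amount copied, in MiB. */
    double mib = (double)n * (double)sizeof(long) / (1024.0 * 1024.0);

    /* Guard against a zero reading on very small buffers. */
    return secs > 0.0 ? mib / secs : 0.0;
}
```

Allocate two buffers of the same size with malloc (for example 33554432 longs each, to match the 256 MiB run above), fill the source, and call the function a few times. An unoptimized build typically performs one load and one store per iteration, while an optimized build is free to vectorize the loop, so the two builds can differ a lot on the same hardware.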
Hi,
We test it on AGX Orin developer kit + r35.5.0:
$ mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.02727 MiB: 256.00000 Copy: 9388.983 MiB/s
1 Method: MEMCPY Elapsed: 0.02720 MiB: 256.00000 Copy: 9412.111 MiB/s
2 Method: MEMCPY Elapsed: 0.02723 MiB: 256.00000 Copy: 9402.431 MiB/s
3 Method: MEMCPY Elapsed: 0.02723 MiB: 256.00000 Copy: 9401.741 MiB/s
AVG Method: MEMCPY Elapsed: 0.02723 MiB: 256.00000 Copy: 9401.309 MiB/s
0 Method: DUMB Elapsed: 0.02721 MiB: 256.00000 Copy: 9408.306 MiB/s
1 Method: DUMB Elapsed: 0.02717 MiB: 256.00000 Copy: 9423.197 MiB/s
2 Method: DUMB Elapsed: 0.02715 MiB: 256.00000 Copy: 9428.750 MiB/s
3 Method: DUMB Elapsed: 0.02716 MiB: 256.00000 Copy: 9425.279 MiB/s
AVG Method: DUMB Elapsed: 0.02717 MiB: 256.00000 Copy: 9421.377 MiB/s
0 Method: MCBLOCK Elapsed: 0.01807 MiB: 256.00000 Copy: 14163.992 MiB/s
1 Method: MCBLOCK Elapsed: 0.01826 MiB: 256.00000 Copy: 14021.251 MiB/s
2 Method: MCBLOCK Elapsed: 0.01783 MiB: 256.00000 Copy: 14360.240 MiB/s
3 Method: MCBLOCK Elapsed: 0.01838 MiB: 256.00000 Copy: 13927.425 MiB/s
AVG Method: MCBLOCK Elapsed: 0.01813 MiB: 256.00000 Copy: 14116.350 MiB/s
We don’t observe the issue. Do you use the developer kit or your custom board?
Hi DaneLLL,
I tested on both our carrier board and the developer kit board. With the downloaded (prebuilt) mbw, the test result is:
$ mbw -t 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0 Method: MEMCPY Elapsed: 0.03726 MiB: 256.00000 Copy: 6871.008 MiB/s
1 Method: MEMCPY Elapsed: 0.03732 MiB: 256.00000 Copy: 6859.409 MiB/s
2 Method: MEMCPY Elapsed: 0.03712 MiB: 256.00000 Copy: 6895.623 MiB/s
3 Method: MEMCPY Elapsed: 0.03724 MiB: 256.00000 Copy: 6873.960 MiB/s
4 Method: MEMCPY Elapsed: 0.03728 MiB: 256.00000 Copy: 6867.874 MiB/s
5 Method: MEMCPY Elapsed: 0.03713 MiB: 256.00000 Copy: 6894.880 MiB/s
6 Method: MEMCPY Elapsed: 0.03718 MiB: 256.00000 Copy: 6885.422 MiB/s
7 Method: MEMCPY Elapsed: 0.03724 MiB: 256.00000 Copy: 6874.513 MiB/s
8 Method: MEMCPY Elapsed: 0.03710 MiB: 256.00000 Copy: 6900.828 MiB/s
9 Method: MEMCPY Elapsed: 0.03715 MiB: 256.00000 Copy: 6890.797 MiB/s
AVG Method: MEMCPY Elapsed: 0.03720 MiB: 256.00000 Copy: 6881.406 MiB/s
0 Method: DUMB Elapsed: 0.03716 MiB: 256.00000 Copy: 6889.684 MiB/s
1 Method: DUMB Elapsed: 0.03725 MiB: 256.00000 Copy: 6872.299 MiB/s
2 Method: DUMB Elapsed: 0.03716 MiB: 256.00000 Copy: 6888.757 MiB/s
3 Method: DUMB Elapsed: 0.03720 MiB: 256.00000 Copy: 6881.720 MiB/s
4 Method: DUMB Elapsed: 0.03711 MiB: 256.00000 Copy: 6898.038 MiB/s
5 Method: DUMB Elapsed: 0.03712 MiB: 256.00000 Copy: 6896.923 MiB/s
6 Method: DUMB Elapsed: 0.03715 MiB: 256.00000 Copy: 6891.539 MiB/s
7 Method: DUMB Elapsed: 0.03716 MiB: 256.00000 Copy: 6889.684 MiB/s
8 Method: DUMB Elapsed: 0.03741 MiB: 256.00000 Copy: 6843.639 MiB/s
9 Method: DUMB Elapsed: 0.03712 MiB: 256.00000 Copy: 6896.923 MiB/s
AVG Method: DUMB Elapsed: 0.03718 MiB: 256.00000 Copy: 6884.885 MiB/s
0 Method: MCBLOCK Elapsed: 0.02421 MiB: 256.00000 Copy: 10574.143 MiB/s
1 Method: MCBLOCK Elapsed: 0.02355 MiB: 256.00000 Copy: 10868.642 MiB/s
2 Method: MCBLOCK Elapsed: 0.02356 MiB: 256.00000 Copy: 10864.030 MiB/s
3 Method: MCBLOCK Elapsed: 0.02338 MiB: 256.00000 Copy: 10950.935 MiB/s
4 Method: MCBLOCK Elapsed: 0.02381 MiB: 256.00000 Copy: 10753.140 MiB/s
5 Method: MCBLOCK Elapsed: 0.02368 MiB: 256.00000 Copy: 10809.898 MiB/s
6 Method: MCBLOCK Elapsed: 0.02389 MiB: 256.00000 Copy: 10716.678 MiB/s
7 Method: MCBLOCK Elapsed: 0.02379 MiB: 256.00000 Copy: 10761.729 MiB/s
8 Method: MCBLOCK Elapsed: 0.02380 MiB: 256.00000 Copy: 10756.303 MiB/s
9 Method: MCBLOCK Elapsed: 0.02391 MiB: 256.00000 Copy: 10706.817 MiB/s
AVG Method: MCBLOCK Elapsed: 0.02376 MiB: 256.00000 Copy: 10775.318 MiB/s
*** stack smashing detected ***: terminated
Aborted (core dumped)
With mbw compiled from source, the test result is:
$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.03710 MiB: 256.00000 Copy: 6900.270 MiB/s
1 Method: MEMCPY Elapsed: 0.03719 MiB: 256.00000 Copy: 6884.496 MiB/s
2 Method: MEMCPY Elapsed: 0.03709 MiB: 256.00000 Copy: 6901.572 MiB/s
3 Method: MEMCPY Elapsed: 0.03715 MiB: 256.00000 Copy: 6890.983 MiB/s
AVG Method: MEMCPY Elapsed: 0.03713 MiB: 256.00000 Copy: 6894.323 MiB/s
0 Method: DUMB Elapsed: 0.10219 MiB: 256.00000 Copy: 2505.113 MiB/s
1 Method: DUMB Elapsed: 0.10184 MiB: 256.00000 Copy: 2513.698 MiB/s
2 Method: DUMB Elapsed: 0.10196 MiB: 256.00000 Copy: 2510.862 MiB/s
3 Method: DUMB Elapsed: 0.10183 MiB: 256.00000 Copy: 2513.870 MiB/s
AVG Method: DUMB Elapsed: 0.10196 MiB: 256.00000 Copy: 2510.881 MiB/s
0 Method: MCBLOCK Elapsed: 0.03798 MiB: 256.00000 Copy: 6740.745 MiB/s
1 Method: MCBLOCK Elapsed: 0.03797 MiB: 256.00000 Copy: 6742.875 MiB/s
2 Method: MCBLOCK Elapsed: 0.03790 MiB: 256.00000 Copy: 6754.261 MiB/s
3 Method: MCBLOCK Elapsed: 0.03803 MiB: 256.00000 Copy: 6731.882 MiB/s
AVG Method: MCBLOCK Elapsed: 0.03797 MiB: 256.00000 Copy: 6742.431 MiB/s
I use a Production Orin module in MAXN power mode.
Which power mode do you use? And which Orin module do you use: the Dev-Kit Module (P3701-0000) or the Production 32GB-DRAM module (P3701-0004)?
Hi DaneLLL,
Is there any progress on this issue?
Hi,
We have not reproduced the issue yet. So you think the result from the downloaded mbw binary is not correct?
Hi DaneLLL,
Could you test with mbw compiled from source?
Hi,
Please enable the flags in Makefile and give it a try:
CFLAGS=-O2 -Wall -g
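To confirm that a given build really picked up the optimization flag, one option is to check the __OPTIMIZE__ macro, which GCC defines when compiling at -O1 or higher. The sketch below is a hypothetical addition, not something that exists in mbw. The names built_with_optimization and print_build_info are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this file was compiled with optimization enabled
 * (GCC defines __OPTIMIZE__ at -O1 and above), otherwise 0.
 * Hypothetical helper, not part of mbw. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: report the build type before any results. */
void print_build_info(void)
{
    printf("optimized build: %s\n",
           built_with_optimization() ? "yes" : "no");
}
```

Calling print_build_info() at program start makes it obvious in the log whether two results being compared came from the same kind of build.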
Hi DaneLLL,
I enabled “CFLAGS=-O2 -Wall -g” in the Makefile. The test result is:
ubuntu@ubuntu:~/Downloads/test/mbw-master$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.03683 MiB: 256.00000 Copy: 6951.422 MiB/s
1 Method: MEMCPY Elapsed: 0.03690 MiB: 256.00000 Copy: 6937.481 MiB/s
2 Method: MEMCPY Elapsed: 0.03681 MiB: 256.00000 Copy: 6955.199 MiB/s
3 Method: MEMCPY Elapsed: 0.03688 MiB: 256.00000 Copy: 6941.432 MiB/s
AVG Method: MEMCPY Elapsed: 0.03685 MiB: 256.00000 Copy: 6946.376 MiB/s
0 Method: DUMB Elapsed: 0.03678 MiB: 256.00000 Copy: 6959.926 MiB/s
1 Method: DUMB Elapsed: 0.03682 MiB: 256.00000 Copy: 6951.799 MiB/s
2 Method: DUMB Elapsed: 0.03680 MiB: 256.00000 Copy: 6957.278 MiB/s
3 Method: DUMB Elapsed: 0.03678 MiB: 256.00000 Copy: 6960.494 MiB/s
AVG Method: DUMB Elapsed: 0.03680 MiB: 256.00000 Copy: 6957.373 MiB/s
0 Method: MCBLOCK Elapsed: 0.03691 MiB: 256.00000 Copy: 6935.790 MiB/s
1 Method: MCBLOCK Elapsed: 0.03701 MiB: 256.00000 Copy: 6916.676 MiB/s
2 Method: MCBLOCK Elapsed: 0.03693 MiB: 256.00000 Copy: 6932.597 MiB/s
3 Method: MCBLOCK Elapsed: 0.03691 MiB: 256.00000 Copy: 6936.729 MiB/s
AVG Method: MCBLOCK Elapsed: 0.03694 MiB: 256.00000 Copy: 6930.438 MiB/s
The DUMB result improved from 2513.698 MiB/s to 6957.373 MiB/s, but the bandwidth is still lower than what you measured (9425.279 MiB/s). I suppose there is a difference between the Dev-Kit Module (P3701-0000) and the Production 32GB-DRAM module (P3701-0004).
Could you test with mbw compiled from source?
Hi,
We did compile mbw with CFLAGS=-O2 -Wall -g and see ~9400 MiB/s. If you run both the prebuilt and the self-built mbw and see identical results, that is the throughput of the production module.
Hi DaneLLL,
Which power mode do you use? In 30W mode, my test result is:
ubuntu@ubuntu:~/Downloads/mbw-master$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.06479 MiB: 256.00000 Copy: 3951.471 MiB/s
1 Method: MEMCPY Elapsed: 0.06387 MiB: 256.00000 Copy: 4008.393 MiB/s
2 Method: MEMCPY Elapsed: 0.06342 MiB: 256.00000 Copy: 4036.454 MiB/s
3 Method: MEMCPY Elapsed: 0.06336 MiB: 256.00000 Copy: 4040.595 MiB/s
AVG Method: MEMCPY Elapsed: 0.06386 MiB: 256.00000 Copy: 4008.910 MiB/s
0 Method: DUMB Elapsed: 0.06330 MiB: 256.00000 Copy: 4044.042 MiB/s
1 Method: DUMB Elapsed: 0.06359 MiB: 256.00000 Copy: 4025.664 MiB/s
2 Method: DUMB Elapsed: 0.06458 MiB: 256.00000 Copy: 3964.260 MiB/s
3 Method: DUMB Elapsed: 0.06395 MiB: 256.00000 Copy: 4003.190 MiB/s
AVG Method: DUMB Elapsed: 0.06386 MiB: 256.00000 Copy: 4009.067 MiB/s
0 Method: MCBLOCK Elapsed: 0.06429 MiB: 256.00000 Copy: 3982.019 MiB/s
1 Method: MCBLOCK Elapsed: 0.04615 MiB: 256.00000 Copy: 5547.009 MiB/s
2 Method: MCBLOCK Elapsed: 0.03815 MiB: 256.00000 Copy: 6710.706 MiB/s
3 Method: MCBLOCK Elapsed: 0.03824 MiB: 256.00000 Copy: 6694.036 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04671 MiB: 256.00000 Copy: 5480.889 MiB/s
ubuntu@ubuntu:~/Downloads/mbw-master$ mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.06290 MiB: 256.00000 Copy: 4069.952 MiB/s
1 Method: MEMCPY Elapsed: 0.06284 MiB: 256.00000 Copy: 4073.579 MiB/s
2 Method: MEMCPY Elapsed: 0.06278 MiB: 256.00000 Copy: 4077.862 MiB/s
3 Method: MEMCPY Elapsed: 0.06272 MiB: 256.00000 Copy: 4081.763 MiB/s
AVG Method: MEMCPY Elapsed: 0.06281 MiB: 256.00000 Copy: 4075.784 MiB/s
0 Method: DUMB Elapsed: 0.06268 MiB: 256.00000 Copy: 4083.977 MiB/s
1 Method: DUMB Elapsed: 0.06271 MiB: 256.00000 Copy: 4082.284 MiB/s
2 Method: DUMB Elapsed: 0.06262 MiB: 256.00000 Copy: 4088.412 MiB/s
3 Method: DUMB Elapsed: 0.06264 MiB: 256.00000 Copy: 4086.585 MiB/s
AVG Method: DUMB Elapsed: 0.06266 MiB: 256.00000 Copy: 4085.313 MiB/s
0 Method: MCBLOCK Elapsed: 0.03730 MiB: 256.00000 Copy: 6863.087 MiB/s
1 Method: MCBLOCK Elapsed: 0.03672 MiB: 256.00000 Copy: 6970.918 MiB/s
2 Method: MCBLOCK Elapsed: 0.03655 MiB: 256.00000 Copy: 7004.871 MiB/s
3 Method: MCBLOCK Elapsed: 0.03707 MiB: 256.00000 Copy: 6905.481 MiB/s
AVG Method: MCBLOCK Elapsed: 0.03691 MiB: 256.00000 Copy: 6935.649 MiB/s
kayccc
May 21, 2024, 8:02am
Is this still an issue that needs support? Are there any results you can share?
Yes, I still need support. Which power mode do you use?
Hi,
We use the default power mode. Please add CFLAGS=-O2 -Wall -g when compiling the sample.
Hi
I have enabled “CFLAGS=-O2 -Wall -g” in the Makefile. The test result is:
ubuntu@ubuntu:~/Downloads/test/mbw-master$ ./mbw -n 4 256
Long uses 8 bytes. Allocating 2*33554432 elements = 536870912 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 4 runs per test.
0 Method: MEMCPY Elapsed: 0.03683 MiB: 256.00000 Copy: 6951.422 MiB/s
1 Method: MEMCPY Elapsed: 0.03690 MiB: 256.00000 Copy: 6937.481 MiB/s
2 Method: MEMCPY Elapsed: 0.03681 MiB: 256.00000 Copy: 6955.199 MiB/s
3 Method: MEMCPY Elapsed: 0.03688 MiB: 256.00000 Copy: 6941.432 MiB/s
AVG Method: MEMCPY Elapsed: 0.03685 MiB: 256.00000 Copy: 6946.376 MiB/s
0 Method: DUMB Elapsed: 0.03678 MiB: 256.00000 Copy: 6959.926 MiB/s
1 Method: DUMB Elapsed: 0.03682 MiB: 256.00000 Copy: 6951.799 MiB/s
2 Method: DUMB Elapsed: 0.03680 MiB: 256.00000 Copy: 6957.278 MiB/s
3 Method: DUMB Elapsed: 0.03678 MiB: 256.00000 Copy: 6960.494 MiB/s
AVG Method: DUMB Elapsed: 0.03680 MiB: 256.00000 Copy: 6957.373 MiB/s
0 Method: MCBLOCK Elapsed: 0.03691 MiB: 256.00000 Copy: 6935.790 MiB/s
1 Method: MCBLOCK Elapsed: 0.03701 MiB: 256.00000 Copy: 6916.676 MiB/s
2 Method: MCBLOCK Elapsed: 0.03693 MiB: 256.00000 Copy: 6932.597 MiB/s
3 Method: MCBLOCK Elapsed: 0.03691 MiB: 256.00000 Copy: 6936.729 MiB/s
AVG Method: MCBLOCK Elapsed: 0.03694 MiB: 256.00000 Copy: 6930.438 MiB/s
Hi,
The result looks good and should be the expected throughput of the AGX Orin 32GB module.
Hi,
Your test result is about 9000 MiB/s. Which module do you use?