Xavier Memory Bandwidth on Pegasus

On this page (https://developer.nvidia.com/drive/drive-agx), NVIDIA claims that the Xavier memory bandwidth is > 250 GB/s. However, when I run the mbw command on Xavier, the bandwidth it reports is at most about 12 GB/s.

nvidia@pegasus2b:~$ mbw 1B
Long uses 8 bytes. Allocating 2*131072 elements = 2097152 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business… Doing 10 runs per test.
0 Method: MEMCPY Elapsed: 0.00017 MiB: 1.00000 Copy: 5882.353 MiB/s
1 Method: MEMCPY Elapsed: 0.00010 MiB: 1.00000 Copy: 10000.000 MiB/s
2 Method: MEMCPY Elapsed: 0.00016 MiB: 1.00000 Copy: 6410.256 MiB/s
3 Method: MEMCPY Elapsed: 0.00011 MiB: 1.00000 Copy: 9433.962 MiB/s
4 Method: MEMCPY Elapsed: 0.00013 MiB: 1.00000 Copy: 7575.758 MiB/s
5 Method: MEMCPY Elapsed: 0.00016 MiB: 1.00000 Copy: 6369.427 MiB/s
6 Method: MEMCPY Elapsed: 0.00013 MiB: 1.00000 Copy: 7874.016 MiB/s
7 Method: MEMCPY Elapsed: 0.00015 MiB: 1.00000 Copy: 6896.552 MiB/s
8 Method: MEMCPY Elapsed: 0.00012 MiB: 1.00000 Copy: 8333.333 MiB/s
9 Method: MEMCPY Elapsed: 0.00012 MiB: 1.00000 Copy: 8264.463 MiB/s
AVG Method: MEMCPY Elapsed: 0.00013 MiB: 1.00000 Copy: 7496.252 MiB/s
0 Method: DUMB Elapsed: 0.00014 MiB: 1.00000 Copy: 7407.407 MiB/s
1 Method: DUMB Elapsed: 0.00015 MiB: 1.00000 Copy: 6849.315 MiB/s
2 Method: DUMB Elapsed: 0.00016 MiB: 1.00000 Copy: 6289.308 MiB/s
3 Method: DUMB Elapsed: 0.00011 MiB: 1.00000 Copy: 9090.909 MiB/s
4 Method: DUMB Elapsed: 0.00010 MiB: 1.00000 Copy: 10526.316 MiB/s
5 Method: DUMB Elapsed: 0.00011 MiB: 1.00000 Copy: 8849.558 MiB/s
6 Method: DUMB Elapsed: 0.00011 MiB: 1.00000 Copy: 9259.259 MiB/s
7 Method: DUMB Elapsed: 0.00011 MiB: 1.00000 Copy: 9174.312 MiB/s
8 Method: DUMB Elapsed: 0.00012 MiB: 1.00000 Copy: 8264.463 MiB/s
9 Method: DUMB Elapsed: 0.00015 MiB: 1.00000 Copy: 6493.506 MiB/s
AVG Method: DUMB Elapsed: 0.00013 MiB: 1.00000 Copy: 8000.000 MiB/s
0 Method: MCBLOCK Elapsed: 0.00011 MiB: 1.00000 Copy: 9433.962 MiB/s
1 Method: MCBLOCK Elapsed: 0.00011 MiB: 1.00000 Copy: 9259.259 MiB/s
2 Method: MCBLOCK Elapsed: 0.00010 MiB: 1.00000 Copy: 10309.278 MiB/s
3 Method: MCBLOCK Elapsed: 0.00011 MiB: 1.00000 Copy: 8849.558 MiB/s
4 Method: MCBLOCK Elapsed: 0.00011 MiB: 1.00000 Copy: 9009.009 MiB/s
5 Method: MCBLOCK Elapsed: 0.00011 MiB: 1.00000 Copy: 9090.909 MiB/s
6 Method: MCBLOCK Elapsed: 0.00010 MiB: 1.00000 Copy: 10309.278 MiB/s
7 Method: MCBLOCK Elapsed: 0.00011 MiB: 1.00000 Copy: 9090.909 MiB/s
8 Method: MCBLOCK Elapsed: 0.00010 MiB: 1.00000 Copy: 10526.316 MiB/s
9 Method: MCBLOCK Elapsed: 0.00008 MiB: 1.00000 Copy: 12195.122 MiB/s
AVG Method: MCBLOCK Elapsed: 0.00010 MiB: 1.00000 Copy: 9718.173 MiB/s
[Attached image: Pegasus-memory-spec.png (Pegasus memory specification)]

Dear verySample,
Could you check by running the CUDA memory bandwidth sample on the iGPU? You can make only the iGPU active by setting the environment variable CUDA_VISIBLE_DEVICES=1 and running the CUDA bandwidth sample in /usr/local/cuda/samples/1_Utilities.
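For reference, here is a minimal sketch (not part of the original reply) that lists the devices the CUDA runtime can see; it can be used to confirm that CUDA_VISIBLE_DEVICES=1 really exposes only the iGPU before running the bandwidth sample. The file name is arbitrary; compile with nvcc.

// list_devices.cu - print every GPU the CUDA runtime can enumerate
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.integrated is non-zero for an iGPU that shares system memory
        std::printf("Device %d: %s (integrated: %d)\n", i, prop.name, prop.integrated);
    }
    return 0;
}

With CUDA_VISIBLE_DEVICES=1 set, the remaining visible iGPU is renumbered and shows up as device 0.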

Thanks for the prompt response. However, I am asking about the CPU memory bandwidth, not the GPU bandwidth.

Dear VerySimple,
Thank you for pointing this out. It is a documentation bug and we will update the page. You can expect to get ~100 GB/s when you use pinned memory. Please check by running the CUDA bandwidth sample to see the memory transfer timings.
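As a rough illustration of what the bandwidth sample measures, below is a minimal sketch (an assumption for illustration, not the actual sample code) that times host-to-device copies from pinned memory with CUDA events and reports GB/s; the 32 MiB transfer size and iteration count are arbitrary.

// h2d_pinned_bw.cu - time host-to-device copies from pinned (page-locked) memory
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32u * 1024 * 1024;   // 32 MiB per transfer
    const int    iters = 100;

    void *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, bytes);            // pinned host allocation
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);   // warm-up copy

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * iters / (ms / 1000.0) / 1e9;
    std::printf("Host to Device (pinned): %.1f GB/s\n", gbps);

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}

Replacing cudaMallocHost with plain malloc turns the transfers into pageable copies, which is the slower path the pinned-memory advice is meant to avoid.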

Hi SivaRamaKrishna,

I ran the bandwidthTest sample under the 1_Utilities directory. Below is the result I got.

The result is far below the designed bandwidth.

[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Xavier
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 14587.7

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 15007.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 27326.4

Result = PASS

Hi hsienyan.gan,

The documentation on the developer site has been updated to be clearer. The number on the developer site is the theoretical bandwidth, not a measured value that includes overhead.

When you run the bandwidth test, are you testing with the latest Drive SW 10 and pinned memory?

nvidia@tegra-ubuntu:~$ ./bandwidthTest --memory=pinned
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Xavier
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 31.7

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 31.6

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 100.5

Yes, it is pinned memory. I ran on the QNX platform and it could only hit 27 GB/s.
But on Linux it can achieve those numbers only when running in full performance mode (nvpmodel -m 0).

I could not find a similar command on QNX to set its power mode.

Dear hsienyan.gan,
Note that the numbers we are reporting are on a Linux-based platform.
For any QNX platform related issues, could you please file a bug, as we do not support QNX platform issues via the forum.