CUDA sample bandwidthTest: memory throughput on Xavier AGX

I have a question regarding the memory throughput of the Xavier AGX.
The Xavier is in its highest performance mode (nvpmodel -m 0, followed by jetson_clocks).
I am running the CUDA sample:
/usr/local/cuda-10.2/samples/1_Utilities/bandwidthTest/

The specs state that the Xavier's peak memory throughput is 137 GB/s, and that device memory, host memory, and unified memory are all allocated in the same physical SoC DRAM.

Why is the bandwidth from device to host (and from host to device) only about 30% of the specified peak?

Device 0: Xavier
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 37.5

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 37.3

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 108.1

Result = PASS

Hello

The peak memory bandwidth is shared between all engines on the chip, and there is firmware that limits how much of it each engine can take. Also, the peak figure is just the theoretical product of bus width and memory clock frequency, and it is not always attainable in practice.