NVLink I/O on Pegasus

Hi All,
I saw the following information from CUDA's deviceQuery:

Peer access from Graphics Device (GPU0) → Xavier (GPU1) : No
Peer access from Xavier (GPU1) → Graphics Device (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 2
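For reference, those "Peer access" lines correspond to what cudaDeviceCanAccessPeer() reports. A minimal sketch of the same check (assuming device 0 is the dGPU and device 1 is the Xavier iGPU):

// Minimal sketch of the peer-access check deviceQuery performs
// (device ordinals 0 = dGPU, 1 = Xavier iGPU are assumptions).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);  // can device 0 access device 1?
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);  // can device 1 access device 0?
    printf("Peer access from device 0 -> device 1 : %s\n", canAccess01 ? "Yes" : "No");
    printf("Peer access from device 1 -> device 0 : %s\n", canAccess10 ? "Yes" : "No");
    return 0;
}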

Also, in p2pBandwidthLatencyTest, enabling P2P didn't increase performance (or is that because peer access isn't provided?):

Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 277.76 16.74
1 16.84 96.88
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 278.19 16.89
1 16.87 97.11

Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 279.42 16.83
1 16.87 82.84
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 279.47 16.82
1 16.85 80.60
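(For context, the P2P=Enabled rows above are produced after calling cudaDeviceEnablePeerAccess(); a simplified sketch of the kind of copy the test times, with buffer size and device ordinals as placeholders:)

// Sketch of a peer-to-peer copy like the ones p2pBandwidthLatencyTest times
// (simplified; buffer size and device ordinals are illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    // Succeeds only if cudaDeviceCanAccessPeer() reports 1 for this pair.
    if (cudaDeviceEnablePeerAccess(1, 0) != cudaSuccess)
        printf("Peer access 0 -> 1 not available\n");

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Copy from device 0 memory to device 1 memory. Without peer access the
    // runtime stages the copy through host memory, which is why the Enabled
    // and Disabled matrices look the same here.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}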

Could anyone guide me on whether Pegasus supports NVLink access in Drive OS?

Thanks!

Gary
deviceQuery.log (4.5 KB)
p2pBandwidthLatencyTest.log (1.45 KB)

Dear Garywang,

Could you please try running the bandwidthTest sample with the option below for your topic? Thanks.

CUDA_VISIBLE_DEVICES=0 ./bandwidthTest

Dear garywang,
P2P calls are not supported on the Drive platform.
Since the CPU and the dGPU are connected via NVLink, data transferred from the CPU to the dGPU goes over NVLink. You can verify this by running the bandwidthTest CUDA sample.
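For reference, a minimal sketch of the kind of pinned host-to-device transfer the bandwidthTest sample times (the 32 MB size matches the sample's Quick Mode default; the iteration count is an assumption):

// Sketch of a pinned host-to-device bandwidth measurement, similar in spirit
// to the bandwidthTest sample (iteration count is an assumption).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 * 1024 * 1024;   // 32 MB, like Quick Mode
    void *hostBuf = nullptr, *devBuf = nullptr;
    cudaMallocHost(&hostBuf, bytes);         // pinned host memory
    cudaMalloc(&devBuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * iters / (ms / 1000.0) / 1e9;
    printf("Host to Device: %.1f GB/s\n", gbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}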

Dear SteveNV,
Here is the information from my Pegasus.

CUDA_VISIBLE_DEVICES=0 ./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Graphics Device
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 17332.7

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 17320.6

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 272361.8

Result = PASS

CUDA_VISIBLE_DEVICES=1 ./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Xavier
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 29250.3

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 29504.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 96765.0

Result = PASS

Sorry, I'm a little confused about "Device to Device Bandwidth". Could you help explain the difference between CUDA_VISIBLE_DEVICES=1 and CUDA_VISIBLE_DEVICES=0 (i.e., whether or not the transfer goes over NVLink)? Sorry for bothering you about it.

Thanks!

Gary

Dear Garywang,
Device-to-device bandwidth refers to the data transfer bandwidth within a single GPU (from one memory location to another on the same GPU; this does not involve NVLink).
CUDA_VISIBLE_DEVICES is an environment variable used to select the GPU devices on the system. When you set CUDA_VISIBLE_DEVICES=0, your system behaves as if it only has the dGPU; similarly, if you set CUDA_VISIBLE_DEVICES=1, only the iGPU is visible.
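To illustrate, the "Device to Device Bandwidth" number comes from a copy between two buffers that both reside on the same GPU, roughly like this sketch (buffer size is illustrative):

// Sketch of the kind of copy behind "Device to Device Bandwidth":
// both buffers live on the same GPU, so no NVLink traffic is involved.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 * 1024 * 1024;
    void *a = nullptr, *b = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMemcpy(b, a, bytes, cudaMemcpyDeviceToDevice);  // on-GPU memory copy
    cudaDeviceSynchronize();
    cudaFree(a);
    cudaFree(b);
    return 0;
}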

Let us know if you have any other queries.

@SivaRamaKrishna,
So, to summarize from my point of view, is the following correct?
For CUDA_VISIBLE_DEVICES=0, the Host to Device bandwidth of ~18 GB/s is transferred via NVLink.
For CUDA_VISIBLE_DEVICES=1, the Host to Device bandwidth of ~29 GB/s is transferred via shared memory.

Thanks!

Gary

Dear Gary,
Setting CUDA_VISIBLE_DEVICES=1 makes only the iGPU available. Note that the CPU and the iGPU share the same DRAM; when you request a memory transfer, DMA takes care of moving the data from CPU to GPU. Shared memory is a separate type of memory in CUDA (on the GPU). For more details on shared memory, please refer to https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory. For more details about the memory architecture on Tegra systems, please refer to https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#memory-management.
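To make the distinction concrete, here is a minimal sketch of CUDA shared memory: an on-chip, per-block scratchpad declared with __shared__ inside a kernel, unrelated to the DRAM that the CPU and iGPU share (the kernel and sizes are illustrative only):

// Minimal sketch of CUDA "shared memory": a per-block on-chip scratchpad
// declared with __shared__, distinct from the DRAM shared by CPU and iGPU.
#include <cuda_runtime.h>

__global__ void reverseInBlock(int *data) {
    __shared__ int tile[256];      // on-chip shared memory
    int t = threadIdx.x;
    tile[t] = data[t];             // stage from DRAM into shared memory
    __syncthreads();
    data[t] = tile[255 - t];       // write back reversed
}

int main() {
    int *d = nullptr;
    cudaMalloc(&d, 256 * sizeof(int));
    cudaMemset(d, 0, 256 * sizeof(int));
    reverseInBlock<<<1, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}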

@SivaRamaKrishna
Your guidance is very useful. Thanks!

Gary