NVLink I/O on Pegasus

Hi All,
I saw the following information from CUDA's deviceQuery:

Peer access from Graphics Device (GPU0) → Xavier (GPU1) : No
Peer access from Xavier (GPU1) → Graphics Device (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 2
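For reference, those "Peer access" lines correspond to what cudaDeviceCanAccessPeer() reports. A minimal sketch of the same check (assuming device 0 is the dGPU and device 1 is the Xavier iGPU):

// Minimal sketch of the peer-access check deviceQuery performs
// (device ordinals 0 = dGPU, 1 = Xavier iGPU are assumptions).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);  // can device 0 access device 1?
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);  // can device 1 access device 0?
    printf("Peer access from device 0 -> device 1 : %s\n", canAccess01 ? "Yes" : "No");
    printf("Peer access from device 1 -> device 0 : %s\n", canAccess10 ? "Yes" : "No");
    return 0;
}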

Also, in p2pBandwidthLatencyTest, enabling P2P didn't increase performance (or is that because peer access isn't provided?):

Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 277.76 16.74
1 16.84 96.88
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 278.19 16.89
1 16.87 97.11

Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 279.42 16.83
1 16.87 82.84
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 279.47 16.82
1 16.85 80.60
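(For context, the P2P=Enabled rows above are produced after calling cudaDeviceEnablePeerAccess(); a simplified sketch of the kind of copy the test times, with buffer size and device ordinals as placeholders:)

// Sketch of a peer-to-peer copy like the ones p2pBandwidthLatencyTest times
// (simplified; buffer size and device ordinals are illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    // Succeeds only if cudaDeviceCanAccessPeer() reports 1 for this pair.
    if (cudaDeviceEnablePeerAccess(1, 0) != cudaSuccess)
        printf("Peer access 0 -> 1 not available\n");

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Copy from device 0 memory to device 1 memory. Without peer access the
    // runtime stages the copy through host memory, which is why the Enabled
    // and Disabled matrices look the same here.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}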

Could anyone guide me on whether Pegasus supports NVLink access in Drive OS?

Thanks!

Gary
deviceQuery.log (4.5 KB)
p2pBandwidthLatencyTest.log (1.45 KB)

Dear Garywang,

Could you please try running the bandwidthTest sample with the option below for your topic? Thanks.

CUDA_VISIBLE_DEVICES=0 ./bandwidthTest

Dear garywang,
P2P calls are not supported on the Drive platform.
Since the CPU and the dGPU are connected via NVLink, data transferred from the CPU to the dGPU goes over NVLink. You can verify this by running the bandwidthTest CUDA sample.
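For reference, a minimal sketch of the kind of pinned host-to-device transfer the bandwidthTest sample times (the 32 MB size matches the sample's Quick Mode default; the iteration count is an assumption):

// Sketch of a pinned host-to-device bandwidth measurement, similar in spirit
// to the bandwidthTest sample (iteration count is an assumption).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 * 1024 * 1024;   // 32 MB, like Quick Mode
    void *hostBuf = nullptr, *devBuf = nullptr;
    cudaMallocHost(&hostBuf, bytes);         // pinned host memory
    cudaMalloc(&devBuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * iters / (ms / 1000.0) / 1e9;
    printf("Host to Device: %.1f GB/s\n", gbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}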

Dear SteveNV,
Here is the information from my Pegasus.

CUDA_VISIBLE_DEVICES=0 ./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Graphics Device
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 17332.7

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 17320.6

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 272361.8

Result = PASS

CUDA_VISIBLE_DEVICES=1 ./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Xavier
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 29250.3

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 29504.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 96765.0

Result = PASS

Sorry, I'm a little confused about "Device to Device Bandwidth". Could you help explain the difference between CUDA_VISIBLE_DEVICES=1 and CUDA_VISIBLE_DEVICES=0 (i.e., whether or not the transfer goes over NVLink)? Sorry for bothering you about it.

Thanks!

Gary

Dear Garywang,
Device-to-device bandwidth refers to the data transfer bandwidth within a single GPU (from one memory location to another on the same GPU; this does not involve NVLink).
CUDA_VISIBLE_DEVICES is an environment variable used to select the GPU devices on the system. When you set CUDA_VISIBLE_DEVICES=0, your system behaves as if it only has the dGPU; similarly, if you set CUDA_VISIBLE_DEVICES=1, only the iGPU is visible.
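To illustrate, the "Device to Device Bandwidth" number comes from a copy between two buffers that both reside on the same GPU, roughly like this sketch (buffer size is illustrative):

// Sketch of the kind of copy behind "Device to Device Bandwidth":
// both buffers live on the same GPU, so no NVLink traffic is involved.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 * 1024 * 1024;
    void *a = nullptr, *b = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMemcpy(b, a, bytes, cudaMemcpyDeviceToDevice);  // on-GPU memory copy
    cudaDeviceSynchronize();
    cudaFree(a);
    cudaFree(b);
    return 0;
}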

Let us know if you have any other queries.

@SivaRamaKrishna,
So, to summarize from my point of view, is the following correct?
For CUDA_VISIBLE_DEVICES=0, the Host to Device bandwidth of ~18 GB/s is transferred via NVLink.
For CUDA_VISIBLE_DEVICES=1, the Host to Device bandwidth of ~29 GB/s is transferred via shared memory.

Thanks!

Gary

Dear Gary,
Setting CUDA_VISIBLE_DEVICES=1 makes only the iGPU available. Note that the CPU and the iGPU share the same DRAM; when you request a memory transfer, DMA takes care of moving the data from CPU to GPU. Shared memory is a separate type of memory in CUDA (on the GPU). For more details on shared memory, please refer to https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory. For more details about the memory architecture on Tegra systems, please refer to https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#memory-management.
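To make the distinction concrete, here is a minimal sketch of CUDA shared memory: an on-chip, per-block scratchpad declared with __shared__ inside a kernel, unrelated to the DRAM that the CPU and iGPU share (the kernel and sizes are illustrative only):

// Minimal sketch of CUDA "shared memory": a per-block on-chip scratchpad
// declared with __shared__, distinct from the DRAM shared by CPU and iGPU.
#include <cuda_runtime.h>

__global__ void reverseInBlock(int *data) {
    __shared__ int tile[256];      // on-chip shared memory
    int t = threadIdx.x;
    tile[t] = data[t];             // stage from DRAM into shared memory
    __syncthreads();
    data[t] = tile[255 - t];       // write back reversed
}

int main() {
    int *d = nullptr;
    cudaMalloc(&d, 256 * sizeof(int));
    cudaMemset(d, 0, 256 * sizeof(int));
    reverseInBlock<<<1, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}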

@SivaRamaKrishna
Your guidance is very useful. Thanks!

Gary