P2P access Ada GPUs with PCIe switch

Hello,

I am debugging P2P data access between two NVIDIA RTX 5000 Ada Generation Embedded GPUs that sit on a single PCIe domain behind a Microchip PFX Gen4 PCIe switch. It is a custom board plugged into a PCIe slot of an Intel-based motherboard.

With the NVIDIA 535.230 Linux graphics driver, the simpleP2P CUDA sample reports that P2P access is available, but data verification fails:

$ ./simpleP2P 
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2

Checking GPU(s) for support of peer to peer memory access...
> Peer access from NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU0) -> NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU1) : Yes
> Peer access from NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU1) -> NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 3.14GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Verification error @ element 1: val = 0.000000, ref = 4.000000
Verification error @ element 2: val = 0.000000, ref = 8.000000
Verification error @ element 3: val = 0.000000, ref = 12.000000
Verification error @ element 4: val = 0.000000, ref = 16.000000
Verification error @ element 5: val = 0.000000, ref = 20.000000
Verification error @ element 6: val = 0.000000, ref = 24.000000
Verification error @ element 7: val = 0.000000, ref = 28.000000
Verification error @ element 8: val = 0.000000, ref = 32.000000
Verification error @ element 9: val = 0.000000, ref = 36.000000
Verification error @ element 10: val = 0.000000, ref = 40.000000
Verification error @ element 11: val = 0.000000, ref = 44.000000
Verification error @ element 12: val = 0.000000, ref = 48.000000
Disabling peer access...
Shutting down...
Test failed!
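For context, the data path that fails above can be reduced to a minimal host-side sketch of what simpleP2P exercises. This is an illustrative repro under assumed device ordinals 0 and 1, and it uses cudaMemcpyPeer for the peer transfer rather than simpleP2P's kernel-based peer reads; it requires a CUDA toolkit and two P2P-capable GPUs to run:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error with a location message.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err = (call);                                      \
        if (err != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err));                          \
            return 1;                                                  \
        }                                                              \
    } while (0)

int main() {
    const size_t n = 64 * 1024 * 1024 / sizeof(float);  // 64 MB, as in simpleP2P

    // Enable bidirectional peer access between device 0 and device 1.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));

    float *d0 = nullptr, *d1 = nullptr;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&d0, n * sizeof(float)));
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&d1, n * sizeof(float)));

    // Fill GPU0's buffer from the host, copy GPU0 -> GPU1 over P2P,
    // then read back from GPU1 and verify element by element.
    std::vector<float> host(n), back(n);
    for (size_t i = 0; i < n; ++i) host[i] = 4.0f * i;

    CHECK(cudaMemcpy(d0, host.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpyPeer(d1, 1, d0, 0, n * sizeof(float)));
    CHECK(cudaMemcpy(back.data(), d1, n * sizeof(float), cudaMemcpyDeviceToHost));

    for (size_t i = 0; i < n; ++i) {
        if (back[i] != host[i]) {
            printf("Verification error @ element %zu: val = %f, ref = %f\n",
                   i, back[i], host[i]);
            return 1;
        }
    }
    printf("P2P copy verified OK\n");
    return 0;
}
```

If this simpler cudaMemcpyPeer path verifies while the kernel-based peer reads in simpleP2P return zeros, that would narrow the failure to direct peer load/store traffic through the switch.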


$ nvidia-smi topo -m
	GPU0	GPU1	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PIX	0-11	0		N/A
GPU1	PIX	 X 	0-11	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks


$ nvidia-smi topo -p2p w
 	GPU0	GPU1	
 GPU0	X	OK	
 GPU1	OK	X	

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown

However, after updating the NVIDIA driver to version 570.124, the output of simpleP2P is different:

Checking GPU(s) for support of peer to peer memory access...
> Peer access from NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU0) -> NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU1) : No
> Peer access from NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU1) -> NVIDIA RTX 5000 Ada Generation Embedded GPU (GPU0) : No
Two or more GPUs with Peer-to-Peer access capability are required for ./p2p_test.
Peer to Peer access is not available amongst GPUs in the system, waiving test.

$ nvidia-smi topo -m
	GPU0	GPU1	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PIX	0-11	0		N/A
GPU1	PIX	 X 	0-11	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks


$ nvidia-smi topo -p2p w
 	GPU0	GPU1	
 GPU0	X	CNS	
 GPU1	CNS	X	

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown

Apparently driver v570 detects the P2P access (in)capability more accurately.
The question that would help direct my debugging efforts is: what changed in driver v570 compared to driver v535 that alters the result of cudaDeviceCanAccessPeer()?
I would appreciate any information about this CUDA API.
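For completeness, the query in question (and a related runtime attribute) can be exercised directly with a small host-only sketch; device ordinals 0 and 1 are assumptions, and this only reports what the driver has decided, not why:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // cudaDeviceCanAccessPeer reports whether the driver will allow
    // direct peer access between the two devices.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // can device 0 access device 1?
    cudaDeviceCanAccessPeer(&can10, 1, 0);  // can device 1 access device 0?
    printf("GPU0 -> GPU1: %s\n", can01 ? "Yes" : "No");
    printf("GPU1 -> GPU0: %s\n", can10 ? "Yes" : "No");

    // A related query: the per-pair P2P access attribute.
    int accessSupported = 0;
    cudaDeviceGetP2PAttribute(&accessSupported,
                              cudaDevP2PAttrAccessSupported, 0, 1);
    printf("cudaDevP2PAttrAccessSupported (0 -> 1): %d\n", accessSupported);
    return 0;
}
```

Both calls reflect an internal whitelist/topology decision made by the driver, which is why the same hardware can report differently across driver versions.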

Please note that P2P access works well on this motherboard when two P2000 GPUs are installed in its PCIe slots, so I suspect the issue is related either to the RTX 5000 Ada Generation Embedded GPU or to the PCIe switch.

I would appreciate hearing from anyone with experience of P2P GPU access involving the RTX 5000 Ada Generation Embedded GPU or the Microchip PFX Gen4 PCIe switch. Can anyone help?

Perhaps there is a clue in your “nvidia-smi topo -p2p w” test:

CNS  = Chipset not supported

The PCIe switch?

Judging by all the posts from NVIDIA engineers I have seen in this forum on this topic, NVIDIA does not publicly document or comment on the internal workings of this API call; the output of the API is the final determination, at the discretion of the driver.

You could file a bug report / feature request with NVIDIA to have your particular chipset / switch supported. I cannot tell which flavor it would be: it is possible the support was removed by accident between versions, in which case it would be a bug. It is also possible that NVIDIA determined that, for whatever reason, P2P cannot work reliably with this third-party hardware, so the driver refuses to use it on purpose. In that case, adding the support would be a feature request.

Do things work correctly with the RTX5000’s installed directly in motherboard slots?

Edit: Please ignore the previous line, I’ve just realised you’re dealing with embedded GPUs, so presumably they are on the same module as the switch.

Thanks for these valuable insights. I’ll figure out how to file a bug/feature request and will post here if anything comes of it.

You’re right, it’s a single board with 2 GPUs and a switch. I can’t find a commercial PCIe card with the AD5000 that I could test with. It’s an embedded chip :(

Created this bug/feature request:

Is it worth getting one of these to test with, to take the switch out of the equation?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.