GPUDirect RDMA on Jetson Orin Nano with RHSResearch Nitefury II

Hi,
Following this project: https://github.com/NVIDIA/jetson-rdma-picoevb (rel-36+ branch),
I have successfully interfaced RHSResearch’s Nitefury II (Artix-7, FPGA part xc7a200tfbg484-2) with a Jetson Orin Nano through either of its M.2 connectors. The board supports PCIe Gen2 x4 and is very similar to the PicoEVB, except that it has a larger fabric and more PCIe lanes. One of the M.2 connectors downgrades the link to x2, while the other supports the full x4. The driver also loads without problems. Here’s the lspci -vv output:

0007:01:00.0 Memory controller: NVIDIA Corporation Device 0001
	Subsystem: NVIDIA Corporation Device 0001
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 195
	IOMMU group: 7
	Region 0: Memory at 3228010000 (32-bit, non-prefetchable) [size=4K]
	Region 1: Memory at 3228000000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s (ok), Width x2 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis- NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: picoevb-rdma
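
For anyone comparing the two M.2 slots, the negotiated link can be checked by comparing the LnkCap and LnkSta lines of the output above. The following is a minimal sketch, assuming the lspci text has been saved to a string; the helper name link_summary is hypothetical and not part of the picoevb project.

```python
import re

def link_summary(lspci_text):
    """Return advertised vs. negotiated PCIe link speed and width."""
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+GT/s), Width x(\d+)", lspci_text)
    sta = re.search(r"LnkSta:\s*Speed ([\d.]+GT/s)[^,]*, Width x(\d+)", lspci_text)
    if not cap or not sta:
        raise ValueError("LnkCap/LnkSta lines not found")
    cap_width, sta_width = int(cap.group(2)), int(sta.group(2))
    return {
        "cap_speed": cap.group(1),
        "cap_width": cap_width,
        "sta_speed": sta.group(1),
        "sta_width": sta_width,
        # True when the link trained at fewer lanes than the device advertises.
        "width_downgraded": sta_width < cap_width,
    }

# Two lines copied from the lspci output in this post.
sample = """\
    LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
    LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
"""
print(link_summary(sample))
```

For the output pasted here, the device advertises x4 but the link trained at x2, which matches the slot that downgrades the width.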

The problem is that when I run rdma-malloc or rdma-cuda, I get a lot of misses and jumps in the data, as shown below:

dst[0xffef8] is 0x3ff017c not 0x3ff02f8
dst[0xffef9] is 0x3ff017d not 0x3ff02f9
dst[0xffefa] is 0x0 not 0x3ff02fa
dst[0xffefb] is 0x0 not 0x3ff02fb
dst[0xffefc] is 0x3ff0180 not 0x3ff02fc
dst[0xffefd] is 0x3ff0181 not 0x3ff02fd
dst[0xffefe] is 0x0 not 0x3ff02fe
dst[0xffeff] is 0x0 not 0x3ff02ff
dst[0xfff00] is 0x3ff0184 not 0x3ff0300
dst[0xfff01] is 0x3ff0185 not 0x3ff0301
dst[0xfff02] is 0x0 not 0x3ff0302
dst[0xfff03] is 0x0 not 0x3ff0303
dst[0xfff04] is 0x3ff0188 not 0x3ff0304
dst[0xfff05] is 0x3ff0189 not 0x3ff0305
dst[0xfff06] is 0x0 not 0x3ff0306
dst[0xfff07] is 0x0 not 0x3ff0307
dst[0xfff08] is 0x3ff018c not 0x3ff0308
dst[0xfff09] is 0x3ff018d not 0x3ff0309
dst[0xfff0a] is 0x0 not 0x3ff030a
dst[0xfff0b] is 0x0 not 0x3ff030b
dst[0xfff0c] is 0x3ff0190 not 0x3ff030c
dst[0xfff0d] is 0x3ff0191 not 0x3ff030d
dst[0xfff0e] is 0x0 not 0x3ff030e
dst[0xfff0f] is 0x0 not 0x3ff030f
dst[0xfff10] is 0x3ff0194 not 0x3ff0310
dst[0xfff11] is 0x3ff0195 not 0x3ff0311
dst[0xfff12] is 0x0 not 0x3ff0312
dst[0xfff13] is 0x0 not 0x3ff0313
dst[0xfff14] is 0x3ff0198 not 0x3ff0314
dst[0xfff15] is 0x3ff0199 not 0x3ff0315
dst[0xfff16] is 0x0 not 0x3ff0316
dst[0xfff17] is 0x0 not 0x3ff0317
dst[0xfff18] is 0x3ff019c not 0x3ff0318
dst[0xfff19] is 0x3ff019d not 0x3ff0319
dst[0xfff1a] is 0x0 not 0x3ff031a
dst[0xfff1b] is 0x0 not 0x3ff031b
dst[0xfff1c] is 0x3ff01a0 not 0x3ff031c
dst[0xfff1d] is 0x3ff01a1 not 0x3ff031d
dst[0xfff1e] is 0x0 not 0x3ff031e
dst[0xfff1f] is 0x0 not 0x3ff031f
dst[0xfff20] is 0x3ff01a4 not 0x3ff0320
dst[0xfff21] is 0x3ff01a5 not 0x3ff0321
dst[0xfff22] is 0x0 not 0x3ff0322
dst[0xfff23] is 0x0 not 0x3ff0323
dst[0xfff24] is 0x3ff01a8 not 0x3ff0324
dst[0xfff25] is 0x3ff01a9 not 0x3ff0325
dst[0xfff26] is 0x0 not 0x3ff0326
dst[0xfff27] is 0x0 not 0x3ff0327
dst[0xfff28] is 0x3ff01ac not 0x3ff0328
dst[0xfff29] is 0x3ff01ad not 0x3ff0329
dst[0xfff2a] is 0x0 not 0x3ff032a
dst[0xfff2b] is 0x0 not 0x3ff032b
dst[0xfff2c] is 0x3ff01b0 not 0x3ff032c
dst[0xfff2d] is 0x3ff01b1 not 0x3ff032d
dst[0xfff2e] is 0x0 not 0x3ff032e
dst[0xfff2f] is 0x0 not 0x3ff032f
dst[0xfff30] is 0x3ff01b4 not 0x3ff0330
dst[0xfff31] is 0x3ff01b5 not 0x3ff0331
dst[0xfff32] is 0x0 not 0x3ff0332
dst[0xfff33] is 0x0 not 0x3ff0333
dst[0xfff34] is 0x3ff01b8 not 0x3ff0334
dst[0xfff35] is 0x3ff01b9 not 0x3ff0335
dst[0xfff36] is 0x0 not 0x3ff0336
dst[0xfff37] is 0x0 not 0x3ff0337
dst[0xfff38] is 0x3ff01bc not 0x3ff0338
dst[0xfff39] is 0x3ff01bd not 0x3ff0339
dst[0xfff3a] is 0x0 not 0x3ff033a
dst[0xfff3b] is 0x0 not 0x3ff033b
dst[0xfff3c] is 0x3ff01c0 not 0x3ff033c
dst[0xfff3d] is 0x3ff01c1 not 0x3ff033d
dst[0xfff3e] is 0x0 not 0x3ff033e
dst[0xfff3f] is 0x0 not 0x3ff033f
dst[0xfff40] is 0x3ff01c4 not 0x3ff0340
dst[0xfff41] is 0x3ff01c5 not 0x3ff0341
dst[0xfff42] is 0x0 not 0x3ff0342
dst[0xfff43] is 0x0 not 0x3ff0343
dst[0xfff44] is 0x3ff01c8 not 0x3ff0344
dst[0xfff45] is 0x3ff01c9 not 0x3ff0345
dst[0xfff46] is 0x0 not 0x3ff0346
dst[0xfff47] is 0x0 not 0x3ff0347
dst[0xfff48] is 0x3ff01cc not 0x3ff0348
dst[0xfff49] is 0x3ff01cd not 0x3ff0349
dst[0xfff4a] is 0x0 not 0x3ff034a
dst[0xfff4b] is 0x0 not 0x3ff034b
dst[0xfff4c] is 0x3ff01d0 not 0x3ff034c
dst[0xfff4d] is 0x3ff01d1 not 0x3ff034d
dst[0xfff4e] is 0x0 not 0x3ff034e
dst[0xfff4f] is 0x0 not 0x3ff034f
dst[0xfff50] is 0x3ff01d4 not 0x3ff0350
dst[0xfff51] is 0x3ff01d5 not 0x3ff0351
dst[0xfff52] is 0x0 not 0x3ff0352
dst[0xfff53] is 0x0 not 0x3ff0353
dst[0xfff54] is 0x3ff01d8 not 0x3ff0354
dst[0xfff55] is 0x3ff01d9 not 0x3ff0355
dst[0xfff56] is 0x0 not 0x3ff0356
dst[0xfff57] is 0x0 not 0x3ff0357
dst[0xfff58] is 0x3ff01dc not 0x3ff0358
dst[0xfff59] is 0x3ff01dd not 0x3ff0359
dst[0xfff5a] is 0x0 not 0x3ff035a
dst[0xfff5b] is 0x0 not 0x3ff035b
dst[0xfff5c] is 0x3ff01e0 not 0x3ff035c
dst[0xfff5d] is 0x3ff01e1 not 0x3ff035d
dst[0xfff5e] is 0x0 not 0x3ff035e
dst[0xfff5f] is 0x0 not 0x3ff035f
dst[0xfff60] is 0x3ff01e4 not 0x3ff0360
dst[0xfff61] is 0x3ff01e5 not 0x3ff0361
dst[0xfff62] is 0x0 not 0x3ff0362
dst[0xfff63] is 0x0 not 0x3ff0363
dst[0xfff64] is 0x3ff01e8 not 0x3ff0364
dst[0xfff65] is 0x3ff01e9 not 0x3ff0365
dst[0xfff66] is 0x0 not 0x3ff0366
dst[0xfff67] is 0x0 not 0x3ff0367
dst[0xfff68] is 0x3ff01ec not 0x3ff0368
dst[0xfff69] is 0x3ff01ed not 0x3ff0369
dst[0xfff6a] is 0x0 not 0x3ff036a
dst[0xfff6b] is 0x0 not 0x3ff036b
dst[0xfff6c] is 0x3ff01f0 not 0x3ff036c
dst[0xfff6d] is 0x3ff01f1 not 0x3ff036d
dst[0xfff6e] is 0x0 not 0x3ff036e
dst[0xfff6f] is 0x0 not 0x3ff036f
dst[0xfff70] is 0x3ff01f4 not 0x3ff0370
dst[0xfff71] is 0x3ff01f5 not 0x3ff0371
dst[0xfff72] is 0x0 not 0x3ff0372
dst[0xfff73] is 0x0 not 0x3ff0373
dst[0xfff74] is 0x3ff01f8 not 0x3ff0374
dst[0xfff75] is 0x3ff01f9 not 0x3ff0375
dst[0xfff76] is 0x0 not 0x3ff0376
dst[0xfff77] is 0x0 not 0x3ff0377
dst[0xfff78] is 0x0 not 0x3ff0378
dst[0xfff79] is 0x0 not 0x3ff0379
dst[0xfff7a] is 0x0 not 0x3ff037a
dst[0xfff7b] is 0x0 not 0x3ff037b
dst[0xfff7c] is 0x0 not 0x3ff037c
dst[0xfff7d] is 0x0 not 0x3ff037d
dst[0xfff7e] is 0x0 not 0x3ff037e
dst[0xfff7f] is 0x0 not 0x3ff037f
dst[0xfff80] is 0x0 not 0x3ff0380
dst[0xfff81] is 0x0 not 0x3ff0381
dst[0xfff82] is 0x0 not 0x3ff0382
dst[0xfff83] is 0x0 not 0x3ff0383
dst[0xfff84] is 0x0 not 0x3ff0384
dst[0xfff85] is 0x0 not 0x3ff0385
dst[0xfff86] is 0x0 not 0x3ff0386
dst[0xfff87] is 0x0 not 0x3ff0387
dst[0xfff88] is 0x0 not 0x3ff0388
dst[0xfff89] is 0x0 not 0x3ff0389
dst[0xfff8a] is 0x0 not 0x3ff038a
dst[0xfff8b] is 0x0 not 0x3ff038b
dst[0xfff8c] is 0x0 not 0x3ff038c
dst[0xfff8d] is 0x0 not 0x3ff038d
dst[0xfff8e] is 0x0 not 0x3ff038e
dst[0xfff8f] is 0x0 not 0x3ff038f
dst[0xfff90] is 0x0 not 0x3ff0390
dst[0xfff91] is 0x0 not 0x3ff0391
dst[0xfff92] is 0x0 not 0x3ff0392
dst[0xfff93] is 0x0 not 0x3ff0393
dst[0xfff94] is 0x0 not 0x3ff0394
dst[0xfff95] is 0x0 not 0x3ff0395
dst[0xfff96] is 0x0 not 0x3ff0396
dst[0xfff97] is 0x0 not 0x3ff0397
dst[0xfff98] is 0x0 not 0x3ff0398
dst[0xfff99] is 0x0 not 0x3ff0399
dst[0xfff9a] is 0x0 not 0x3ff039a
dst[0xfff9b] is 0x0 not 0x3ff039b
dst[0xfff9c] is 0x0 not 0x3ff039c
dst[0xfff9d] is 0x0 not 0x3ff039d
dst[0xfff9e] is 0x0 not 0x3ff039e
dst[0xfff9f] is 0x0 not 0x3ff039f
dst[0xfffa0] is 0x0 not 0x3ff03a0
dst[0xfffa1] is 0x0 not 0x3ff03a1
dst[0xfffa2] is 0x0 not 0x3ff03a2
dst[0xfffa3] is 0x0 not 0x3ff03a3
dst[0xfffa4] is 0x0 not 0x3ff03a4
dst[0xfffa5] is 0x0 not 0x3ff03a5
dst[0xfffa6] is 0x0 not 0x3ff03a6
dst[0xfffa7] is 0x0 not 0x3ff03a7
dst[0xfffa8] is 0x0 not 0x3ff03a8
dst[0xfffa9] is 0x0 not 0x3ff03a9
dst[0xfffaa] is 0x0 not 0x3ff03aa
dst[0xfffab] is 0x0 not 0x3ff03ab
dst[0xfffac] is 0x0 not 0x3ff03ac
dst[0xfffad] is 0x0 not 0x3ff03ad
dst[0xfffae] is 0x0 not 0x3ff03ae
dst[0xfffaf] is 0x0 not 0x3ff03af
dst[0xfffb0] is 0x0 not 0x3ff03b0
dst[0xfffb1] is 0x0 not 0x3ff03b1
dst[0xfffb2] is 0x0 not 0x3ff03b2
dst[0xfffb3] is 0x0 not 0x3ff03b3
dst[0xfffb4] is 0x0 not 0x3ff03b4
dst[0xfffb5] is 0x0 not 0x3ff03b5
dst[0xfffb6] is 0x0 not 0x3ff03b6
dst[0xfffb7] is 0x0 not 0x3ff03b7
dst[0xfffb8] is 0x0 not 0x3ff03b8
dst[0xfffb9] is 0x0 not 0x3ff03b9
dst[0xfffba] is 0x0 not 0x3ff03ba
dst[0xfffbb] is 0x0 not 0x3ff03bb
dst[0xfffbc] is 0x0 not 0x3ff03bc
dst[0xfffbd] is 0x0 not 0x3ff03bd
dst[0xfffbe] is 0x0 not 0x3ff03be
dst[0xfffbf] is 0x0 not 0x3ff03bf
dst[0xfffc0] is 0x0 not 0x3ff03c0
dst[0xfffc1] is 0x0 not 0x3ff03c1
dst[0xfffc2] is 0x0 not 0x3ff03c2
dst[0xfffc3] is 0x0 not 0x3ff03c3
dst[0xfffc4] is 0x0 not 0x3ff03c4
dst[0xfffc5] is 0x0 not 0x3ff03c5
dst[0xfffc6] is 0x0 not 0x3ff03c6
dst[0xfffc7] is 0x0 not 0x3ff03c7
dst[0xfffc8] is 0x0 not 0x3ff03c8
dst[0xfffc9] is 0x0 not 0x3ff03c9
dst[0xfffca] is 0x0 not 0x3ff03ca
dst[0xfffcb] is 0x0 not 0x3ff03cb
dst[0xfffcc] is 0x0 not 0x3ff03cc
dst[0xfffcd] is 0x0 not 0x3ff03cd
dst[0xfffce] is 0x0 not 0x3ff03ce
dst[0xfffcf] is 0x0 not 0x3ff03cf
dst[0xfffd0] is 0x0 not 0x3ff03d0
dst[0xfffd1] is 0x0 not 0x3ff03d1
dst[0xfffd2] is 0x0 not 0x3ff03d2
dst[0xfffd3] is 0x0 not 0x3ff03d3
dst[0xfffd4] is 0x0 not 0x3ff03d4
dst[0xfffd5] is 0x0 not 0x3ff03d5
dst[0xfffd6] is 0x0 not 0x3ff03d6
dst[0xfffd7] is 0x0 not 0x3ff03d7
dst[0xfffd8] is 0x0 not 0x3ff03d8
dst[0xfffd9] is 0x0 not 0x3ff03d9
dst[0xfffda] is 0x0 not 0x3ff03da
dst[0xfffdb] is 0x0 not 0x3ff03db
dst[0xfffdc] is 0x0 not 0x3ff03dc
dst[0xfffdd] is 0x0 not 0x3ff03dd
dst[0xfffde] is 0x0 not 0x3ff03de
dst[0xfffdf] is 0x0 not 0x3ff03df
dst[0xfffe0] is 0x0 not 0x3ff03e0
dst[0xfffe1] is 0x0 not 0x3ff03e1
dst[0xfffe2] is 0x0 not 0x3ff03e2
dst[0xfffe3] is 0x0 not 0x3ff03e3
dst[0xfffe4] is 0x0 not 0x3ff03e4
dst[0xfffe5] is 0x0 not 0x3ff03e5
dst[0xfffe6] is 0x0 not 0x3ff03e6
dst[0xfffe7] is 0x0 not 0x3ff03e7
dst[0xfffe8] is 0x0 not 0x3ff03e8
dst[0xfffe9] is 0x0 not 0x3ff03e9
dst[0xfffea] is 0x0 not 0x3ff03ea
dst[0xfffeb] is 0x0 not 0x3ff03eb
dst[0xfffec] is 0x0 not 0x3ff03ec
dst[0xfffed] is 0x0 not 0x3ff03ed
dst[0xfffee] is 0x0 not 0x3ff03ee
dst[0xfffef] is 0x0 not 0x3ff03ef
dst[0xffff0] is 0x0 not 0x3ff03f0
dst[0xffff1] is 0x0 not 0x3ff03f1
dst[0xffff2] is 0x0 not 0x3ff03f2
dst[0xffff3] is 0x0 not 0x3ff03f3
dst[0xffff4] is 0x0 not 0x3ff03f4
dst[0xffff5] is 0x0 not 0x3ff03f5
dst[0xffff6] is 0x0 not 0x3ff03f6
dst[0xffff7] is 0x0 not 0x3ff03f7
dst[0xffff8] is 0x3ff01fc not 0x3ff03f8
dst[0xffff9] is 0x3ff01fd not 0x3ff03f9
dst[0xffffa] is 0x0 not 0x3ff03fa
dst[0xffffb] is 0x0 not 0x3ff03fb
dst[0xffffc] is 0x3ff0200 not 0x3ff03fc
dst[0xffffd] is 0x3ff0201 not 0x3ff03fd
dst[0xffffe] is 0x0 not 0x3ff03fe
dst[0xfffff] is 0x0 not 0x3ff03ff

My observation is that these long streaks of misses and the bursts between them repeat periodically. A burst lasts for 128 words (= 512 bytes) and is followed by 128 zeroes, and then the pattern repeats. Is this related to the MaxPayload of 512 bytes reported under DevCap in the lspci output above?
Note that the only change I made to the picoevb project TCL scripts was to configure the XDMA IP for x4 lanes. I also commented out the user interrupt from GPIO, as it wasn’t being used for anything. The set_leds code ran fine and I could see the LEDs flip.
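
The periodic pattern described above can be measured rather than eyeballed. This is a minimal sketch, assuming the mismatch lines keep the exact "dst[0x...] is 0x... not 0x..." format shown in this post; the helper names parse and zero_runs are hypothetical.

```python
import re
from itertools import groupby

LINE = re.compile(r"dst\[0x([0-9a-f]+)\] is 0x([0-9a-f]+) not 0x([0-9a-f]+)")

def parse(log_text):
    """Yield (index, got, expected) tuples from the test's mismatch lines."""
    for m in LINE.finditer(log_text):
        yield tuple(int(g, 16) for g in m.groups())

def zero_runs(entries):
    """Run-length encode the entries into (is_zero, length) pairs."""
    return [(is_zero, len(list(group)))
            for is_zero, group in groupby(entries, key=lambda e: e[1] == 0)]

# First six mismatch lines copied from the log in this post.
sample = """\
dst[0xffef8] is 0x3ff017c not 0x3ff02f8
dst[0xffef9] is 0x3ff017d not 0x3ff02f9
dst[0xffefa] is 0x0 not 0x3ff02fa
dst[0xffefb] is 0x0 not 0x3ff02fb
dst[0xffefc] is 0x3ff0180 not 0x3ff02fc
dst[0xffefd] is 0x3ff0181 not 0x3ff02fd
"""
entries = list(parse(sample))
print(zero_runs(entries))
```

Running the same functions over a full log such as rdma-malloc.txt would list every run of zero and non-zero words, which makes the period of the pattern easy to read off.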

What can I do to get DMA working?
I am running JetPack 6.1+b123, installed via SDK Manager 2.2.0.12021 x86_64 on an Ubuntu 22.04 machine.
I am also attaching the reports and logs.
lspci_report.txt (2.4 KB)
rdma-cuda.txt (9.2 KB)
rdma-malloc.txt (9.2 KB)

I was able to resolve this. I had to change the XDMA configuration setting for the AXI data width from 128 bit (the default) to 64 bit. This change also sets the AXI clock frequency to 250 MHz. With these changes, the test applications pass for both malloc and cuda.
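
As a rough sanity check on the working setting, the AXI-side throughput of a 64-bit bus at 250 MHz can be compared with the PCIe Gen2 line rate. This is only back-of-the-envelope arithmetic: it assumes 8b/10b encoding for Gen2, ignores TLP and protocol overhead, and does not by itself explain why the 128-bit setting failed. The function names are hypothetical.

```python
def pcie_gen2_gbps(lanes, gt_per_s=5):
    """Line rate in Gbit/s after 8b/10b encoding, ignoring TLP overhead."""
    return gt_per_s * lanes * 8 / 10

def axi_gbps(width_bits, clock_mhz):
    """Peak AXI throughput in Gbit/s for one data beat per clock cycle."""
    return width_bits * clock_mhz / 1000

print("PCIe Gen2 x4:", pcie_gen2_gbps(4), "Gbit/s")
print("PCIe Gen2 x2:", pcie_gen2_gbps(2), "Gbit/s")
print("AXI 64-bit @ 250 MHz:", axi_gbps(64, 250), "Gbit/s")
```

Under these assumptions the 64-bit, 250 MHz AXI interface peaks at the same rate as a Gen2 x4 link, so the narrower data width does not reduce the achievable bandwidth.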
