Xavier in EP mode(shared memory), the default size is [size=64K], but it can only be used 4KB

Hi I’m now using the Xavier in shared memory (EP mode), with ubuntu20 x64 PC(root port),

from AGX(EP) I got the Shared Memory address start from 0x13ecae000
from PC(RP) I got the Shared Memory address start from 0xfb300000

I wrote the data 0x12345678 in AGX(EP) memory address 0x13ecaefff
but I got the data 0xFFFFFF78 from PC(RP) memory address 0xfb300fff, which is different
That means I can only use 4Kbytes. How can I use the default size (64KB)?

Here is additional information:

EP:
dmesg:

[ 20.611734] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM phys: 0x13ecae000
[ 20.611748] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM IOVA: 0xffff0000
[ 20.611775] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM virt: 0x00000000de83e982

$ busybox devmem 0x13ecaefff 32
0x00000008
$ busybox devmem 0x13ecaefff 32 0x12345678
$ busybox devmem 0x13ecaefff 32
0x12345678

Root Port (x64 / ubuntu20.04)

$ busybox devmem 0xfb300fff 32
0xFFFFFF78

$ lspci -s 05:00.0 -vv
05:00.0 RAM memory: NVIDIA Corporation Device 0001
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 11
NUMA node: 0
Region 0: Memory at fb300000 (32-bit, non-prefetchable) [size=64K]
Region 2: Memory at d2100000 (64-bit, prefetchable) [size=128K]
Region 4: Memory at fb200000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (downgraded), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00010000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [168 v1] Physical Layer 16.0 GT/s <?> Capabilities: [190 v1] Lane Margining at the Receiver <?>
Capabilities: [1b8 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [1c0 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us
L1SubCtl2: T_PwrOn=10us
Capabilities: [1d0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?> Capabilities: [2d0 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [308 v1] Data Link Feature <?> Capabilities: [314 v1] Precision Time Measurement PTMCap: Requester:+ Responder:- Root:- PTMClockGranularity: Unimplemented PTMControl: Enabled:- RootSelected:- PTMEffectiveGranularity: Unknown Capabilities: [320 v1] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>

What would you get if you read 0xfb300000?

AGX address (0x13ecae000 ~ 0x13ecaefff) is correctly mapping to PC address (0xfb300000 to 0xfb300fff), it’s only 4KB.
If the data is out of this range(AGX address over 0x13ecaefff, PC address over 0xfb300fff ), although I could write it in AGX, but PC cannot read the correct data.

Hi,

Ideally we shouldn’t use devmem on the PHY address from DRAM
Please modify the data in EP BAR0 using virtual address(epfnv->bar0_ram_map) in the driver and then read out using BAR0 address in RP.

Some details in this post.