PCIe 2.0 on TX2 Not Meeting Specifications!

Greetings,

I’m working on a team that is considering the TX2 for one of our products. We’re using a Xilinx FPGA development board, the AC701, to stream data over the PCIe interface on the TX2 carrier board into the TX2. Unfortunately, we’re getting less than half the specified data rate for a PCIe 2.0 x4 link.

We’ve transferred data using the Memory-Mapped and Streaming methods on both a desktop PC running Ubuntu and the TX2. In both cases the driver, FPGA, and setup are identical.

Here is a chart comparing performance with the TX2 and a Desktop PC:
https://flic.kr/p/22J2Htv

Yes, we have run the ./jetson_clocks.sh script and have set “nvpmodel -m 0”, so that is not the issue.

Additionally, here is the output of lspci -vvvv:

00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 388
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: 50100000-501fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
                Mapping Address Base: 00000000fee00000
        Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: pcieport

01:00.0 Serial controller: Xilinx Corporation Device 7024 (prog-if 01 [16450])
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 388
        Region 0: Memory at 50100000 (32-bit, non-prefetchable) 
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00

We’ve looked through the forum and found a lot of people who have posted similar results and haven’t gotten any help or response from NVIDIA, so we’re starting to suspect that NVIDIA embellished the TX2’s PCIe capabilities. Perhaps it’s protocol-compliant with PCIe 2.0, but it certainly doesn’t reach the throughput implied by the 5 GT/s per lane listed in your documentation.
https://devtalk.nvidia.com/default/topic/1026623/jetson-tx2/pcie-x4-only-658mb-s/

We’d appreciate any support you can give us. The documentation for the TX2 is extremely lacking when it comes to PCIe and how it is handled internally, so we are unable to troubleshoot this further ourselves. We’re considering dropping the TX2 as a platform for our products if we can’t reach the required data rate.

Thanks in advance for your help.

Which JetPack release are you using, and have you tried disabling SMMU for PCIe?

Vidyas,

Thanks for the reply. We’re using JetPack 3.1 and Linux4Tegra 28.1.

We saw that you recommended disabling the SMMU to other people on the forum, but it was never actually explained how that is accomplished. We’re assuming you mean we have to:

sudo dtc -I fs -O dts -o extracted_proc.dts /proc/device-tree

to decompile the device tree to its source file, then open it up and remove:

  1. “#stream-id-cells = <1>;” from the “tegra_pcie” node
  2. “<&{/pcie-controller@10003000} TEGRA_SID_AFI>,” from the “smmu” node

Then recompile like:

sudo dtc -I dts -O dtb -o tegra186-quill-p3310-1000-c03-00-base.dtb extracted_proc.dts

However, in our device tree source file there are no such lines. Additionally, when we examine /proc/device-tree (linked to /sys/firmware/devicetree/base/) we get:

actmon@d230000           bpmp_i2c                     generic-system-config         mods-simple-bus           pwm-fan               tachometer@39c0000
#address-cells           bthrot_cdev                  gp10b                         mttcan0-ivc               replicator@0x8040000  tegra186-pm-irq
adma@2930000             bwmgr                        gpio@2200000                  mttcan1-ivc               reserved-memory       tegra-aon-ivc-echo
adsp@2993000             chipid@100000                gpio@c2f0000                  mttcan@c310000            roc-flush@e080000     tegra-camera-platform
adsp_audio               chosen                       gpio-keys                     mttcan@c320000            rtc@c2a0000           tegra-carveouts
agic-controller@2a41000  clock@5000000                gps_wake                      name                      rtcpu@2993000         tegra_cec
ahci-sata@3507000        clocks                       hardwood                      nvdumper                  rtcpu@b000000         tegra_fiq_debugger
ahub                     cluster_clk_priv@e090000     hda@3510000                   nvidia,boardids           sce@b000000           tegra-firmwares
aliases                  combined-uart                host1x                        nvidia,dtbbuildtime       sce-ivc-channels      tegra-hsp@29a0000
aon@c160000              compatible                   hsp_top                       nvidia,dtsfilename        sdhci@3400000         tegra-hsp@3c00000
aondbg                   cpufreq@e070000              i2c@3160000                   nvidia,proc-boardid       sdhci@3420000         tegra-hsp@b150000
aon_spi@c260000          cpuidle                      i2c@3180000                   pcie-controller@10003000  sdhci@3440000         tegra-hsp@c150000
ape-ivc-channels         cpus                         i2c@3190000                   pfsd                      sdhci@3460000         tegra-mce
arm-pmu                  csi_mipical                  i2c@31a0000                   pinctrl@3520000           se_elp@3ad0000        tegra-pmc-blink-pwm
axi2apb@2390000          denver-pmu                   i2c@31b0000                   pinmux@2430000            serial@3100000        tegra-rtcpu-trace
axi2apb@23a0000          dma@2600000                  i2c@31c0000                   plugin-manager            serial@3110000        tegra_safety_ivc
axi2apb@23b0000          dpaux0                       i2c@31e0000                   pmc@c360000               serial@3130000        tegra-serr
axi2apb@23c0000          dpaux1                       i2c@c240000                   pmc@c370000               serial@3140000        tegra-virtual-camera-platform
axi2apb@23d0000          dummy-cool-dev               i2c@c250000                   pmc-iopower               serial@3150000        tfesd
axip2p@2100000           e3326_lens_ov5693@P5V27C     interrupt-controller          power-domain              serial@c280000        thermal-fan-est
axip2p@2110000           e3333_lens_ov5693@P5V27C     interrupt-controller@3000000  psci                      serial@c290000        thermal-zones
axip2p@2120000           eeprom-manager               interrupt-controller@3881000  ptm@9840000               serial-number         timer
axip2p@2130000           efuse@3820000                interrupt-parent              ptm@9940000               #size-cells           timer@3020000
axip2p@2140000           eqos_ape@2990000             iommu@12000000                ptm@9a40000               smmu_test             tpiu@8060000
axip2p@2150000           etf@8030000                  kfuse@0x3830000               ptm@9b40000               soft_watchdog         trusty
axip2p@2160000           ether_qos@2490000            lens_imx274@A6V26             ptm_bpmp@8a1c000          sound                 ufshci@2450000
axip2p@2170000           ether_qos_virt_test@2490000  mailbox@3538000               pwm@3280000               sound_ref             usb_cd
axip2p@2180000           etr@8050000                  max16984-cdp                  pwm@3290000               spdif_dit             vi-bypass@15700000
axip2p@2190000           external-connection          mc                            pwm@32a0000               spi@3210000           vivid-driver
backlight                firmware                     mc_sid@2c00000                pwm@32c0000               spi@3230000           watchdog@30c0000
bcmdhd_pcie_wlan         fixed-regulators             memory@80000000               pwm@32d0000               spi@3240000           xhci@3530000
bcmdhd_wlan              funnel_bccplex@9010000       mipical                       pwm@32e0000               spi@3270000           xotg
bluedroid_pm             funnel_major@8010000         miscreg@00100000              pwm@32f0000               spi@c260000           xudc@3550000
bpmp                     funnel_minor@8820000         model                         pwm@c340000               stm@8070000

The nodes tegra_pcie and smmu are nowhere to be found. If this is not the proper procedure, can you please explain to us in detail how the SMMU can be disabled for PCIe? If this is the proper procedure, can you explain why we don’t have these nodes in our device tree, even though PCIe works (albeit slowly)? Thanks in advance.

Additionally, the part of the extracted device-tree source describing the pcie-controller@10003000 is as follows:

pcie-controller@10003000 {
		compatible = "nvidia,tegra186-pcie";
		power-domains = <0xcd>;
		device_type = "pci";
		reg = <0x0 0x10003000 0x0 0x800 0x0 0x10003800 0x0 0x800 0x0 0x40000000 0x0 0x10000000>;
		reg-names = "pads", "afi", "cs";
		clocks = <0xd 0x4 0xd 0x3 0xd 0x261>;
		clock-names = "afi", "pcie", "clk_m";
		resets = <0xd 0x1 0xd 0x1d 0xd 0x1e>;
		reset-names = "afi", "pcie", "pciex";
		interrupts = <0x0 0x48 0x4 0x0 0x49 0x4>;
		interrupt-names = "intr", "msi";
		#interrupt-cells = <0x1>;
		interrupt-map-mask = <0x0 0x0 0x0 0x0>;
		interrupt-map = <0x0 0x0 0x0 0x0 0x1 0x0 0x48 0x4>;
		#stream-id-cells = <0x1>;
		bus-range = <0x0 0xff>;
		#address-cells = <0x3>;
		#size-cells = <0x2>;
		ranges = <0x82000000 0x0 0x10000000 0x0 0x10000000 0x0 0x1000 0x82000000 0x0 0x10001000 0x0 0x10001000 0x0 0x1000 0x82000000 0x0 0x10004000 0x0 0x10004000 0x0 0x1000 0x81000000 0x0 0x0 0x0 0x50000000 0x0 0x10000 0x82000000 0x0 0x50100000 0x0 0x50100000 0x0 0x7f00000 0xc2000000 0x0 0x58000000 0x0 0x58000000 0x0 0x28000000>;
		status = "okay";
		vddio-pexctl-aud-supply = <0xe>;
		linux,phandle = <0x79>;
		phandle = <0x79>;

		pci@1,0 {
			device_type = "pci";
			assigned-addresses = <0x82000800 0x0 0x10000000 0x0 0x1000>;
			reg = <0x800 0x0 0x0 0x0 0x0>;
			status = "okay";
			#address-cells = <0x3>;
			#size-cells = <0x2>;
			ranges;
			nvidia,num-lanes = <0x2>;
			nvidia,afi-ctl-offset = <0x110>;
		};

		pci@2,0 {
			device_type = "pci";
			assigned-addresses = <0x82001000 0x0 0x10001000 0x0 0x1000>;
			reg = <0x1000 0x0 0x0 0x0 0x0>;
			status = "disabled";
			#address-cells = <0x3>;
			#size-cells = <0x2>;
			ranges;
			nvidia,num-lanes = <0x1>;
			nvidia,afi-ctl-offset = <0x118>;
		};

		pci@3,0 {
			device_type = "pci";
			assigned-addresses = <0x82001800 0x0 0x10004000 0x0 0x1000>;
			reg = <0x1800 0x0 0x0 0x0 0x0>;
			status = "okay";
			#address-cells = <0x3>;
			#size-cells = <0x2>;
			ranges;
			nvidia,num-lanes = <0x1>;
			nvidia,afi-ctl-offset = <0x19c>;
		};

		prod-settings {
			#prod-cells = <0x3>;

			prod_c_pad {
				prod = <0xc8 0xffffffff 0x80b880b8 0xcc 0xffffffff 0x480b8>;
			};
		};
	};

To disable SMMU for PCIe, please remove the following entry from the DTS and recompile it back to a DTB:

#stream-id-cells = <0x1>;
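
That edit can also be scripted on the decompiled source. The following Python sketch removes a property from one named node only and leaves the same property untouched in other nodes. It assumes the one-statement-per-line layout that dtc produces when decompiling, and that no string value contains braces; the helper name remove_property is illustrative and not part of any NVIDIA or dtc tooling.

```python
import sys


def remove_property(dts_text, node_name, prop_name):
    """Drop `prop_name` from the top level of `node_name` in decompiled DTS text.

    Assumption: one statement per line, as emitted by `dtc -O dts`.
    """
    kept = []
    depth = 0          # brace depth before the current line
    node_depth = None  # brace depth of the target node's body, while inside it
    for line in dts_text.splitlines():
        stripped = line.strip()
        in_target_body = node_depth is not None and depth == node_depth
        # Property name is everything before '=' (or before ';' for booleans)
        name = stripped.split("=")[0].strip().rstrip(";")
        if in_target_body and name == prop_name:
            continue  # skip this property line
        if node_depth is None and stripped == node_name + " {":
            node_depth = depth + 1
        depth += stripped.count("{") - stripped.count("}")
        if node_depth is not None and depth < node_depth:
            node_depth = None  # left the target node
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Usage: python3 strip_prop.py extracted_proc.dts > patched.dts
    with open(sys.argv[1]) as f:
        sys.stdout.write(
            remove_property(f.read(), "pcie-controller@10003000", "#stream-id-cells")
        )
```

The patched file would then be recompiled with dtc and flashed in the usual way for the board.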

A quick way to check whether SMMU is enabled for PCIe is to look in the “/sys/kernel/debug/12000000.iommu/masters” folder and see whether there are any PCIe-related entries.
BTW, Tegra’s PCIe is expected to give around 12 ~ 13 Gbps of usable bandwidth (after subtracting protocol overheads such as 8b/10b encoding, ACKs, UpdateFCs, and TLP headers).
Having SMMU enabled for PCIe would reduce the effective bandwidth if there are too many map/unmap calls happening during data transfers (this typically happens with network interface cards), but otherwise it should be possible to get around 12 ~ 13 Gbps throughput.
Can you give some idea of the kind of use case we have here and approximately how frequently buffers are mapped and unmapped?
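
For reference, the 12 ~ 13 Gbps figure can be roughly reproduced from the link parameters in the lspci output (5 GT/s, x4, MaxPayload 128 bytes). The Python sketch below is only an estimate under stated assumptions: 24 bytes of per-TLP overhead (2 framing + 2 sequence number + 16 for a 4DW header + 4 LCRC, no ECRC), and no modelling of ACK or flow-control DLLPs, which lower the result further. It also converts the measured figures on the assumption that “MByte/s” in this thread means MiB/s, since the test program divides by 1024*1024.

```python
def encoded_gbps(gt_per_s, lanes):
    """Link rate in Gbit/s after 8b/10b encoding (PCIe 1.x/2.0)."""
    return gt_per_s * lanes * 8 / 10


def payload_gbps(gt_per_s, lanes, max_payload, tlp_overhead):
    """Upper bound on payload throughput when every TLP carries max_payload bytes.

    tlp_overhead is the assumed number of non-payload bytes per TLP.
    """
    efficiency = max_payload / (max_payload + tlp_overhead)
    return encoded_gbps(gt_per_s, lanes) * efficiency


def mib_per_s_to_gbps(mib_per_s):
    """Convert MiB/s to Gbit/s (10^9 bits per second)."""
    return mib_per_s * 1024 * 1024 * 8 / 1e9


if __name__ == "__main__":
    print("after 8b/10b      :", encoded_gbps(5.0, 4), "Gbit/s")
    print("payload bound     :", round(payload_gbps(5.0, 4, 128, 24), 2), "Gbit/s")
    for measured in (351, 690):  # figures reported in this thread
        print(measured, "MiB/s =", round(mib_per_s_to_gbps(measured), 2), "Gbit/s")
```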

Hi,

I’m working with WBLee on this issue. He’s not in the office today, so I tried to disable SMMU.

I removed “#stream-id-cells = <0x1>;” from the pcie-controller@10003000 node.
After that, there are no PCIe entries in “/sys/kernel/debug/12000000.iommu/masters”:

drwxr-xr-x 36 root root 0 Jan  1  1970 .
drwxr-xr-x 23 root root 0 Jan  1  1970 ..
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx0
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx1
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx2
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx3
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx4
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx5
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx6
drwxr-xr-x  2 root root 0 Jan  1  1970 13e10000.host1x:ctx7
drwxr-xr-x  2 root root 0 Jan  1  1970 150c0000.nvcsi
drwxr-xr-x  2 root root 0 Jan  1  1970 15100000.tsecb
drwxr-xr-x  2 root root 0 Jan  1  1970 15210000.nvdisplay
drwxr-xr-x  2 root root 0 Jan  1  1970 15340000.vic
drwxr-xr-x  2 root root 0 Jan  1  1970 15380000.nvjpg
drwxr-xr-x  2 root root 0 Jan  1  1970 15480000.nvdec
drwxr-xr-x  2 root root 0 Jan  1  1970 154c0000.nvenc
drwxr-xr-x  2 root root 0 Jan  1  1970 15500000.tsec
drwxr-xr-x  2 root root 0 Jan  1  1970 15600000.isp
drwxr-xr-x  2 root root 0 Jan  1  1970 15700000.vi
drwxr-xr-x  2 root root 0 Jan  1  1970 15810000.se
drwxr-xr-x  2 root root 0 Jan  1  1970 15820000.se
drwxr-xr-x  2 root root 0 Jan  1  1970 15830000.se
drwxr-xr-x  2 root root 0 Jan  1  1970 15840000.se
drwxr-xr-x  2 root root 0 Jan  1  1970 17000000.gp10b
drwxr-xr-x  2 root root 0 Jan  1  1970 2490000.ether_qos
drwxr-xr-x  2 root root 0 Jan  1  1970 2993000.adsp
drwxr-xr-x  2 root root 0 Jan  1  1970 3400000.sdhci
drwxr-xr-x  2 root root 0 Jan  1  1970 3460000.sdhci
drwxr-xr-x  2 root root 0 Jan  1  1970 3510000.hda
drwxr-xr-x  2 root root 0 Jan  1  1970 3530000.xhci
drwxr-xr-x  2 root root 0 Jan  1  1970 3550000.xudc
drwxr-xr-x  2 root root 0 Jan  1  1970 adsp_audio
drwxr-xr-x  2 root root 0 Jan  1  1970 smmu_test
drwxr-xr-x  2 root root 0 Jan  1  1970 sound
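
The same check can be scripted. This is a minimal Python sketch that lists the entries under the masters folder and keeps those whose name contains “pcie”. The name match is an assumption, since this thread does not show what the PCIe entry is called when SMMU is enabled, and reading debugfs normally requires root.

```python
import os

MASTERS_DIR = "/sys/kernel/debug/12000000.iommu/masters"


def pcie_smmu_masters(masters_dir=MASTERS_DIR):
    """Return the sorted entries in masters_dir whose name contains 'pcie'.

    Assumption: a PCIe client of the SMMU shows up with 'pcie' in its name.
    An empty list suggests SMMU is not enabled for PCIe.
    """
    return sorted(n for n in os.listdir(masters_dir) if "pcie" in n.lower())


if __name__ == "__main__":
    found = pcie_smmu_masters()
    print("PCIe SMMU masters:", found if found else "none")
```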

But now I can’t execute ./jetson_clocks.sh without an error, and the CPU doesn’t clock up to the nvpmodel -m 0 settings:

root@~# ./jetson_clocks.sh
./jetson_clocks.sh: line 299: /sys/kernel/debug/bpmp/debug/clk/emc/rate: No such file or directory
./jetson_clocks.sh: line 307: /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked: No such file or directory

The speed is still 351 MByte/s @ 64 KBytes.
The other effect, however, is that our Xilinx driver is now stable!

I did another test yesterday!
This time I removed the following from the device tree file (and did not remove “#stream-id-cells = <0x1>” from the pcie-controller@10003000 node):

smmu_test_domain {
	sid-list = <0x33>;
	address-space = <0x94>;
	sid-list-len = <0x1>;
};

After that, I can execute ./jetson_clocks.sh without errors, our driver is stable, and the speed is
a little better: ~690 MByte/s @ 64 KBytes. It’s still not the expected speed, but it’s better.
Can you explain what the “smmu_test_domain” does and how we can increase the speed?

We’re using the Xilinx DMA driver (https://www.xilinx.com/support/answers/65444.html).
This driver creates a device file (“/dev/xdma0_c2h_0”). In our test program we read 65536 bytes from this device
and measure the read time to get the speed.

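#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <QDebug>

void measure_c2h_read()
{
  char *buffer = NULL;
  char *allocated = NULL;
  struct timespec ts_start, ts_end;
  const off_t address = 0;      // device offset to read from
  const size_t size = 65536;    // bytes per read
  const size_t offset = 0;      // offset of the buffer inside the allocation

  // Page-aligned allocation for the DMA target buffer
  int rc = posix_memalign((void **)&allocated, 4096 /*alignment*/, size + 4096);
  assert(rc == 0 && allocated);
  buffer = allocated + offset;

  int fpga_fd = open("/dev/xdma0_c2h_0", O_RDWR | O_NONBLOCK);
  assert(fpga_fd >= 0);

  memset(buffer, 0x00, size);
  off_t off = lseek(fpga_fd, address, SEEK_SET);
  assert(off != (off_t)-1);

  clock_gettime(CLOCK_MONOTONIC, &ts_start);
  ssize_t nread = read(fpga_fd, buffer, size);
  clock_gettime(CLOCK_MONOTONIC, &ts_end);
  assert(nread == (ssize_t)size);

  // Elapsed time must include the seconds field, not only tv_nsec
  double seconds = (double)(ts_end.tv_sec - ts_start.tv_sec)
                 + (double)(ts_end.tv_nsec - ts_start.tv_nsec) * 1e-9;

  qDebug() << "Data rate [MByte/s]: " << (double)size / (seconds * 1024.0 * 1024.0);

  close(fpga_fd);
  free(allocated);
}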
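
The measurement above times a single 64 KiB read, so fixed per-call costs (system call, DMA setup, interrupt) can weigh heavily in the result. One way to get a steadier figure is to average over many reads. The Python sketch below illustrates that idea; the function name measure_read and its parameters are illustrative, the open flags simply mirror the C++ code above, and it has only been reasoned about against an ordinary file, not against the XDMA device.

```python
import os
import time


def measure_read(path, size=65536, repeats=100, address=0):
    """Read `size` bytes from `path` `repeats` times; return (total_bytes, MiB/s)."""
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    total = 0
    try:
        start = time.perf_counter()
        for _ in range(repeats):
            os.lseek(fd, address, os.SEEK_SET)
            # os.read may return fewer bytes than requested; count what arrived
            total += len(os.read(fd, size))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, total / (elapsed * 1024 * 1024)


if __name__ == "__main__":
    total, rate = measure_read("/dev/xdma0_c2h_0")
    print("read", total, "bytes, data rate [MiB/s]:", round(rate, 1))
```

Comparing this averaged figure with the single-read figure, and repeating the run with larger transfer sizes, would help separate per-transfer overhead from the sustained link rate.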

As another reference point, we are getting about 180 MB/s @ 256 KB. We are using the TX2 with the ZC706 connected over x4 PCIe, using the same XDMA Linux driver as you. We have been using JetPack 3.0, with the driver and the DMA for PCIe core configured in streaming mode.

We hit the same issue as you where the driver appears to crash but only with larger transfers (>128K). Xlinix DMA PCIe driver crashes - Jetson TX2 - NVIDIA Developer Forums

We’ll try this out and see if disabling SMMU fixes that problem for Jetpack 3.0 as well. It’s great to know that someone has gotten this configuration working.

We have tried the fixes you describe above, but we still see the same data rates (180 MB/s) and still get driver hangs on any large transfers (>2MB). Can you share the Linux and Xilinx source files that give you stable large transfers?

Hi, rconraduw
I use the Xilinx XDMA driver and have not disabled the IOMMU for PCIe. When the FPGA transfers 1 byte or any larger amount of data to the TX2, the TX2 system crashes. Do you have any idea why?

I found the function that causes the crash.
I use driver version 2018.3.41, and the function “engine_service_wb_monitor” calls the “schedule()” kernel function. Sometimes it works fine, but sometimes it causes a crash.
Does anyone know why?