PCIE DMA speed is asymmetrical between ep_to_rc and rc_to_ep

I am using the PCIe EP DMA. The EP-to-RC speed is about twice the RC-to-EP speed. I tried a different DMA channel as a test, but the result is always the same. How can I fix it? Thanks.

There are two issues here:

  1. In the PCIe protocol itself, writes are inherently more performant than reads, since writes are posted whereas reads require a synchronous round-trip transaction.

  2. Jetson’s endpoint DMA engine was designed with the assumption that the PCIe root port and endpoint systems would each use their own DMA controller to write to the other system. (This is possible when Jetson is used on both sides of the PCIe link, but may not be possible with other host systems, since not all of them have a DMA controller that can access PCIe.) As such, the Jetson DMA controller is built with more available performance in its write path.
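To make point 1 concrete, here is a rough back-of-the-envelope model, not a measurement. Because each read has to wait for its completion to come back, read throughput is bounded by the amount of data that can be in flight divided by the round-trip time. The function name and all numbers used with it are illustrative assumptions, not Jetson specifications.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on read throughput, in bytes per second, when at most
 * `outstanding` read requests of `req_bytes` each can be in flight and each
 * request completes `rtt_ns` nanoseconds after it is issued.
 * Hypothetical helper; the intermediate product fits in 64 bits for
 * realistic values (tens of requests, a few KiB each).
 */
uint64_t read_bound_bytes_per_sec(uint32_t outstanding, uint32_t req_bytes,
                                  uint32_t rtt_ns)
{
    assert(rtt_ns > 0);
    return (uint64_t)outstanding * req_bytes * 1000000000ull / rtt_ns;
}
```

With 8 outstanding 256-byte reads and an assumed 1 µs round trip, the bound is about 2 GB/s. Doubling either the number of outstanding requests or the request size doubles the bound. Posted writes are not limited this way, because the sender does not wait for a reply to each one.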

There may be registers in the PCIe controller, configuration space, and/or DMA controller that allow you to tweak things such as maximum packet (payload) size and maximum number of outstanding transactions. These settings will affect performance too.
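As one example of such a setting in configuration space, the Device Control register of the PCI Express capability holds the Max_Payload_Size and Max_Read_Request_Size fields. The sketch below only decodes a raw 16-bit register value that you have already read. The function names are made up for illustration, and the DMA controller's own limits (for example, outstanding transactions) are separate registers that this does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* Device Control register fields (PCI Express Base Specification). */
#define DEVCTL_MPS_SHIFT   5    /* Max_Payload_Size, bits 7:5 */
#define DEVCTL_MRRS_SHIFT  12   /* Max_Read_Request_Size, bits 14:12 */
#define DEVCTL_FIELD_MASK  0x7u

/* Convert a 3-bit size encoding to bytes: 0 -> 128, 1 -> 256, ... 5 -> 4096.
 * Encodings 6 and 7 are reserved; 0 is returned for them. */
unsigned size_from_encoding(unsigned enc)
{
    assert(enc <= DEVCTL_FIELD_MASK);
    return enc <= 5 ? 128u << enc : 0u;
}

/* Maximum payload size in bytes programmed in Device Control. */
unsigned devctl_max_payload(uint16_t devctl)
{
    return size_from_encoding((devctl >> DEVCTL_MPS_SHIFT) & DEVCTL_FIELD_MASK);
}

/* Maximum read request size in bytes programmed in Device Control. */
unsigned devctl_max_read_request(uint16_t devctl)
{
    return size_from_encoding((devctl >> DEVCTL_MRRS_SHIFT) & DEVCTL_FIELD_MASK);
}
```

For a Device Control value of 0x2810, this gives a 128-byte maximum payload and a 512-byte maximum read request. On Linux, `lspci -vv` prints the same two fields as MaxPayload and MaxReadReq on the DevCtl line, so you can compare both ends of the link.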

Hi, StephenWarren,
Thanks. My EP is connected to a Xavier over PCIe. How do I use the RC DMA?

We currently don’t have any detailed examples or documentation re: how to use the PCIe controller DMA engines. The TRM (Technical Reference Manual) does contain the complete register specification. That said, assuming the RP is Jetson too, then the RP’s DMA engine is identical to the EP’s DMA engine, and so can be used in an identical fashion.

Hi, StephenWarren,
I have found the DMA registers in the TRM (Technical Reference Manual). But the interrupt settings should be different from the EP's, so how should I set them in the RC DMA controller? Thanks.

I believe interrupts work identically between EP and RP DMA controllers. The only potential difference I’m aware of is that there’s a bit that allows interrupts to be routed to the local system or the remote system. For the EP controller either option is valid. For the RP controller, only local interrupt routing makes sense, since the PCIe specification doesn’t provide a method for an RP to send a generic interrupt to the EP.

If that doesn’t answer your question, could you please provide more details re: the differences you see, or what you need explained? Thanks.

Hi, StephenWarren,
I use the following function in pcie-tegra-dw.c to transfer data to the PCIe endpoint with the RC DMA:

static int dma_write(struct tegra_pcie_dw *pcie, struct dma_tx *tx)

It works, but the RC DMA occasionally times out (perhaps once an hour) at the following point in the dma_write function:

	/* wait for completion or timeout */
	ret = wait_for_completion_timeout(&pcie->wr_cpl[tx->channel],
	if (ret == 0) {
		dev_err(dev, "DMA write timed out and no interrupt\n");

An RC DMA write can also cause an EP DMA write to time out. Can RC DMA writes and EP DMA writes conflict with each other? How can I solve the problem? Thanks.

PS: There are no other kernel error logs.

I’m sorry, but I don’t know the answer. I am not familiar with the low-level details of programming the DMA controller, only the general higher-level features.


As soon as you observe the timeout error, stop the DMA and dump its registers.
The following command dumps the registers:
/home/ubuntu/reg_dump -a 0x3a060000 -s 0x900
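If the reg_dump tool is not available on your system, a generic alternative is to map the register range through /dev/mem and print it as 32-bit words. This is only a sketch under assumptions: it is not the source of reg_dump, the function names are hypothetical, it needs root, and some kernels restrict /dev/mem. Reading the registers of a controller that is powered down or unclocked can hang the system, so only do this while the controller is active.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Print `nwords` 32-bit registers, four per line, each line prefixed with
 * the physical address of its first word. Returns the number of lines. */
size_t dump_regs(FILE *out, uint64_t base, const volatile uint32_t *regs,
                 size_t nwords)
{
    size_t lines = 0;

    for (size_t i = 0; i < nwords; i++) {
        if (i % 4 == 0) {
            fprintf(out, "%s0x%08llx:", i ? "\n" : "",
                    (unsigned long long)(base + 4 * i));
            lines++;
        }
        fprintf(out, " %08x", (unsigned)regs[i]);
    }
    if (nwords)
        fputc('\n', out);
    return lines;
}

/* Map `size` bytes of physical address space at `base` read-only via
 * /dev/mem. `base` must be page aligned. Returns NULL on failure. */
const volatile uint32_t *map_phys(uint64_t base, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && base % (uint64_t)page == 0);

    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, (off_t)base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (const volatile uint32_t *)p;
}
```

For the range in the command above, that would be `map_phys(0x3a060000, 0x900)` followed by `dump_regs(stdout, 0x3a060000, regs, 0x900 / 4)` once the returned pointer has been checked for NULL.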

Please also share the following details:

  1. SRC address
  2. DST address
  3. Size
  4. dmesg logs

Manikanta