TX1: Concurrent PCIe access and GPU activity / cudaMemcpy

Hi,
I am using an FPGA to continuously transfer data in and out of the TX1 (28.2 release). If there is neither GPU activity nor memory copies, the transfers are fine. However, when the GPU is active or memory copies are running, reads from memory to PCIe see latencies that the FPGA cannot tolerate.

It seems that the SMMU or memory bus is giving GPU memory accesses priority over PCIe. Is there a way to configure the TX1 to prioritize PCIe traffic?

In the TRM (p. 825), the Tegra Snap Arbiter tree shows both PCIe and the GPU on ring 2. Can I change PCIe to an ISO client (i.e., ring 1 of the tree) so that I can meet the latency requirements?

Can you please try the following patch and let us know the result?
It changes the PTSA settings of the PCIe (pcx) client, which effectively raises PCIe to the highest priority at the memory controller level.

diff --git a/drivers/platform/tegra/mc/tegra21x_la.c b/drivers/platform/tegra/mc/tegra21x_la.c
index 81431468c288..9690b1b0cc14 100644
--- a/drivers/platform/tegra/mc/tegra21x_la.c
+++ b/drivers/platform/tegra/mc/tegra21x_la.c
@@ -257,7 +257,7 @@ static void t21x_init_ptsa(void)
        MC_SET_INIT_PTSA(p, gk,      -2, 0);
        MC_SET_INIT_PTSA(p, vicpc,   -2, 0);
        MC_SET_INIT_PTSA(p, apb,     -2, 0);
-       MC_SET_INIT_PTSA(p, pcx,     -2, 0);
+       MC_SET_INIT_PTSA(p, pcx,      1, 0x20);
        MC_SET_INIT_PTSA(p, host,    -2, 0);
        MC_SET_INIT_PTSA(p, ahb,     -2, 0);
        MC_SET_INIT_PTSA(p, sax,     -2, 0);

Dear Vidyas,

I will try your patch soon and report back on the results. I have already tried changing the following registers on the TX1, which solved the problem for me:

MC_PCX_PTSA_MAX_0 to 0x00000001
MC_PCX_PTSA_MIN_0 to 0x00000001
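
One way such a register change could be made at runtime from user space is to map the memory controller page through /dev/mem. The sketch below is only an illustration under stated assumptions: it needs root, the kernel configuration must allow /dev/mem access to MMIO, and the helper names are hypothetical. The physical addresses of MC_PCX_PTSA_MAX_0 and MC_PCX_PTSA_MIN_0 are not given in this thread and must be taken from the TX1 TRM. Unlike the kernel patch, a value written this way does not persist across reboots.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a 32-bit value to a register inside an already-mapped region and
 * return what the register reads back. byte_offset must be a multiple of 4. */
static uint32_t reg_write_readback(volatile uint32_t *base, size_t byte_offset,
                                   uint32_t value)
{
    volatile uint32_t *reg = base + byte_offset / sizeof(uint32_t);
    *reg = value;
    return *reg;
}

/* Hypothetical helper: map the page containing phys_addr through /dev/mem,
 * write value to the 32-bit register at phys_addr and store the read-back
 * value in *readback. Returns 0 on success or a negative errno value.
 * Requires root; the kernel configuration may forbid this access. */
static int poke_phys_reg32(uint64_t phys_addr, uint32_t value, uint32_t *readback)
{
    if (phys_addr % sizeof(uint32_t) != 0 || readback == NULL)
        return -EINVAL;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -EINVAL;

    /* mmap() offsets must be page aligned, so map the enclosing page. */
    uint64_t page_base = phys_addr & ~((uint64_t)page - 1);
    size_t offset = (size_t)(phys_addr - page_base);

    /* O_SYNC is the usual flag for /dev/mem register access
     * (it requests a non-cached mapping on many architectures). */
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -errno;

    void *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, (off_t)page_base);
    int err = (map == MAP_FAILED) ? errno : 0; /* save before close() */
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (err != 0)
        return -err;

    *readback = reg_write_readback((volatile uint32_t *)map, offset, value);
    munmap(map, (size_t)page);
    return 0;
}
```

For example, with `pcx_ptsa_min_addr` standing in for the TRM address of MC_PCX_PTSA_MIN_0 (a placeholder, not a real value), `poke_phys_reg32(pcx_ptsa_min_addr, 0x1, &rb)` followed by the same call for the MAX register would mirror the change described above. The read-back value lets you confirm that the write took effect.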

However, I cannot find the same registers on the TX2. Is there a way to give PCIe high priority on the TX2 as well?

You can apply the following patch in the $TOP/kernel/t18x/ folder:

diff --git a/drivers/platform/tegra/mc/tegra18x_la.c b/drivers/platform/tegra/mc/tegra18x_la.c
index 1f7b467aeaec..662563e2a3db 100644
--- a/drivers/platform/tegra/mc/tegra18x_la.c
+++ b/drivers/platform/tegra/mc/tegra18x_la.c
@@ -377,7 +377,7 @@ static void t18x_init_ptsa(void)
        T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, nic, NISO, -2, 0, 1);
        T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, nvd, SISO, 1, 1, 0);
        T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, nvd3, SISO, 1, 1, 0);
-       T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, pcx, NISO, -2, 0, 1);
+       T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, pcx, NISO, 1, 0x20, 1);
        T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, roc_dma_r, NISO, -2, 0, 1);
        T18X_MC_SET_INIT_PTSA_MIN_MAX_RATE(p, ring1_rd_b, NISO, 62, 0, 1);
        T18X_MC_SET_INIT_PTSA_MIN_MAX(p, ring1_rd_nb, HISO, -5, 31);

Dear Vidyas,

I have applied the patch on the TX1 and can confirm that it gives the desired behavior: it reduces the impact of GPU memory copies on the PCIe traffic.

Thanks,

Akmal

I also tried the patch on the TX2 and see a marginal reduction in latency there. The problem is less critical on the TX2, but PCIe latency is notably reduced on the TX1. :)

Thanks for the info.