PCIe endpoint mode / root mode communication

hi,
I referred to the NVIDIA Jetson AGX Xavier Series PCIe Endpoint Design Guidelines and configured one Xavier as root and another as endpoint, but I find that the shared RAM is only 4 KB. I want to use about 200 MB of shared RAM. How should I configure the endpoint-mode Xavier and the root-mode Xavier? Thank you!

hi,
Does anyone know about this?

What is the release you are using? I mean is it 32.1 or 32.2?

hi,
I am using kernel_src_R32.1_JAX_TX2. Do you mean we should use the newer version, 32.2?

Please apply the following patches to increase the aperture sizes:

diff --git a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
index 2333198..cf2176e 100644
--- a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
+++ b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
@@ -572,8 +572,8 @@
  
        bus-range = <0x0 0xff>;
        ranges = <0x81000000 0x0 0x38100000 0x0 0x38100000 0x0 0x00100000      /* downstream I/O (1MB) */
-             0x82000000 0x0 0x38200000 0x0 0x38200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-             0xc2000000 0x18 0x00000000 0x18 0x00000000 0x4 0x00000000>;  /* prefetchable memory (16GB) */
+             0x82000000 0x0 0x40000000 0x1B 0x40000000 0x0 0xC0000000     /* non-prefetchable memory (3GB) */
+             0xc2000000 0x18 0x00000000 0x18 0x00000000 0x3 0x40000000>;  /* prefetchable memory (12GB) */
  
        nvidia,cfg-link-cap-l1sub = <0x1c4>;
        nvidia,cap-pl16g-status = <0x174>;
@@ -640,8 +640,8 @@
  
        bus-range = <0x0 0xff>;
        ranges = <0x81000000 0x0 0x30100000 0x0 0x30100000 0x0 0x00100000      /* downstream I/O (1MB) */
-             0x82000000 0x0 0x30200000 0x0 0x30200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-             0xc2000000 0x12 0x00000000 0x12 0x00000000 0x0 0x40000000>;  /* prefetchable memory (1GB) */
+             0x82000000 0x0 0x40000000 0x12 0x30000000 0x0 0x10000000     /* non-prefetchable memory (256MB) */
+             0xc2000000 0x12 0x00000000 0x12 0x00000000 0x0 0x30000000>;  /* prefetchable memory (768MB) */
  
        nvidia,cfg-link-cap-l1sub = <0x194>;
        nvidia,cap-pl16g-status = <0x164>;
@@ -707,8 +707,8 @@
  
        bus-range = <0x0 0xff>;
        ranges = <0x81000000 0x0 0x32100000 0x0 0x32100000 0x0 0x00100000      /* downstream I/O (1MB) */
-             0x82000000 0x0 0x32200000 0x0 0x32200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-             0xc2000000 0x12 0x40000000 0x12 0x40000000 0x0 0x40000000>;  /* prefetchable memory (1GB) */
+             0x82000000 0x0 0x40000000 0x12 0x70000000 0x0 0x10000000     /* non-prefetchable memory (256MB) */
+             0xc2000000 0x12 0x40000000 0x12 0x40000000 0x0 0x30000000>;  /* prefetchable memory (768MB) */
  
        nvidia,cfg-link-cap-l1sub = <0x194>;
        nvidia,cap-pl16g-status = <0x164>;
@@ -774,8 +774,8 @@
  
        bus-range = <0x0 0xff>;
        ranges = <0x81000000 0x0 0x34100000 0x0 0x34100000 0x0 0x00100000      /* downstream I/O (1MB) */
-             0x82000000 0x0 0x34200000 0x0 0x34200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-             0xc2000000 0x12 0x80000000 0x12 0x80000000 0x0 0x40000000>;  /* prefetchable memory (1GB) */
+             0x82000000 0x0 0x40000000 0x12 0xB0000000 0x0 0x10000000     /* non-prefetchable memory (256MB) */
+             0xc2000000 0x12 0x80000000 0x12 0x80000000 0x0 0x30000000>;  /* prefetchable memory (768MB) */
  
        nvidia,cfg-link-cap-l1sub = <0x194>;
        nvidia,cap-pl16g-status = <0x164>;
@@ -841,8 +841,8 @@
  
        bus-range = <0x0 0xff>;
        ranges = <0x81000000 0x0 0x36100000 0x0 0x36100000 0x0 0x00100000      /* downstream I/O (1MB) */
-             0x82000000 0x0 0x36200000 0x0 0x36200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-             0xc2000000 0x14 0x00000000 0x14 0x00000000 0x4 0x00000000>;  /* prefetchable memory (16GB) */
+             0x82000000 0x0 0x40000000 0x17 0x40000000 0x0 0xC0000000      /* non-prefetchable memory (3GB) */
+             0xc2000000 0x14 0x00000000 0x14 0x00000000 0x3 0x40000000>;  /* prefetchable memory (12GB) */
  
        nvidia,cfg-link-cap-l1sub = <0x1b0>;
        nvidia,cap-pl16g-status = <0x174>;
@@ -913,8 +913,8 @@
  
        bus-range = <0x0 0xff>;
        ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000      /* downstream I/O (1MB) */
-             0x82000000 0x0 0x3a200000 0x0 0x3a200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-             0xc2000000 0x1c 0x00000000 0x1c 0x00000000 0x4 0x00000000>;  /* prefetchable memory (16GB) */
+             0x82000000 0x0 0x40000000 0x1f 0x40000000 0x0 0xC0000000     /* non-prefetchable memory (3GB) */
+             0xc2000000 0x1c 0x00000000 0x1c 0x00000000 0x3 0x40000000>;  /* prefetchable memory (12GB) */
  
        nvidia,cfg-link-cap-l1sub = <0x1c4>;
        nvidia,cap-pl16g-status = <0x174>;
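
For reference, each entry in the PCI "ranges" property above is seven 32-bit cells: one cell of flags, two cells of PCI (bus) address, two cells of CPU address, and two cells of size. If you adapt the aperture sizes to your own layout, a quick decode helps confirm the values. The helper below is purely illustrative (not part of the patch) and decodes the patched C5 non-prefetchable entry:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one 7-cell PCI "ranges" entry: flags, PCI addr hi/lo,
 * CPU addr hi/lo, size hi/lo. The cells below are the patched C5
 * non-prefetchable entry from the device tree change above. */
int main(void)
{
	uint32_t cells[7] = {
		0x82000000,		/* flags: 32-bit non-prefetchable MEM */
		0x0,  0x40000000,	/* PCI (bus) address */
		0x1B, 0x40000000,	/* CPU address */
		0x0,  0xC0000000	/* size */
	};
	uint64_t pci  = ((uint64_t)cells[1] << 32) | cells[2];
	uint64_t cpu  = ((uint64_t)cells[3] << 32) | cells[4];
	uint64_t size = ((uint64_t)cells[5] << 32) | cells[6];

	printf("PCI addr 0x%" PRIx64 ", CPU addr 0x%" PRIx64
	       ", size 0x%" PRIx64 " (%" PRIu64 " MB)\n",
	       pci, cpu, size, size >> 20);
	return 0;
}

This prints a 3072 MB (3 GB) window at CPU address 0x1b40000000, matching the comment in the patch.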

and

diff --git a/drivers/pci/dwc/pcie-tegra.c b/drivers/pci/dwc/pcie-tegra.c
index d118cf9..8593dee 100644
--- a/drivers/pci/dwc/pcie-tegra.c
+++ b/drivers/pci/dwc/pcie-tegra.c
@@ -2959,12 +2959,14 @@
            /* program iATU for Non-prefetchable MEM mapping */
            outbound_atu(pp, PCIE_ATU_REGION_INDEX3,
                     PCIE_ATU_TYPE_MEM, win->res->start,
-                    win->res->start, resource_size(win->res));
+                    win->res->start - win->offset,
+                    resource_size(win->res));
        } else if (win->res->flags & IORESOURCE_MEM) {
            /* program iATU for Non-prefetchable MEM mapping */
            outbound_atu(pp, PCIE_ATU_REGION_INDEX2,
                     PCIE_ATU_TYPE_MEM, win->res->start,
-                    win->res->start, resource_size(win->res));
+                    win->res->start - win->offset,
+                    resource_size(win->res));
        }
    }
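
For context: after the ranges change above, the CPU address of the non-prefetchable window no longer equals its PCI bus address, so the outbound iATU has to translate between the two instead of mapping the window 1:1. As a worked example using the patched C5 entry, the window's CPU base is 0x1b40000000 and its PCI bus base is 0x40000000, so win->offset is 0x1b00000000 and the iATU target becomes win->res->start - win->offset = 0x40000000, i.e. the bus-address side of the window from which endpoint BARs are assigned.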

Most likely, these patches are already present in your tree (still posting them to make sure they are not missed).
Now, apply the patch below to enable a 512 MB BAR0 on the endpoint.

diff --git a/drivers/pci/endpoint/functions/pci-epf-nv-test.c b/drivers/pci/endpoint/functions/pci-epf-nv-test.c
index 8b2a1dcecab8..d3496a199842 100644
--- a/drivers/pci/endpoint/functions/pci-epf-nv-test.c
+++ b/drivers/pci/endpoint/functions/pci-epf-nv-test.c
@@ -16,7 +16,7 @@
 #include <linux/pci-epc.h>
 #include <linux/pci-epf.h>
  
-#define BAR0_SIZE SZ_64K
+#define BAR0_SIZE SZ_512M
  
 struct pci_epf_nv_test {
    struct pci_epf_header header;
@@ -30,14 +30,11 @@ static void pci_epf_nv_test_unbind(struct pci_epf *epf)
    struct pci_epf_nv_test *epfnv = epf_get_drvdata(epf);
    struct pci_epc *epc = epf->epc;
    struct device *cdev = epc->dev.parent;
-   struct iommu_domain *domain = iommu_get_domain_for_dev(cdev);
  
    pci_epc_stop(epc);
    pci_epc_clear_bar(epc, BAR_0);
-   vunmap(epfnv->bar0_ram_map);
-   iommu_unmap(domain, epfnv->bar0_iova, PAGE_SIZE);
-   iommu_dma_free_iova(cdev, epfnv->bar0_iova, BAR0_SIZE);
-   __free_pages(epfnv->bar0_ram_page, 1);
+   dma_free_coherent(cdev, BAR0_SIZE, epfnv->bar0_ram_map,
+             epfnv->bar0_iova);
 }
  
 static int pci_epf_nv_test_bind(struct pci_epf *epf)
@@ -47,7 +44,6 @@ static int pci_epf_nv_test_bind(struct pci_epf *epf)
    struct pci_epf_header *header = epf->header;
    struct device *fdev = &epf->dev;
    struct device *cdev = epc->dev.parent;
-   struct iommu_domain *domain = iommu_get_domain_for_dev(cdev);
    int ret;
  
    ret = pci_epc_write_header(epc, header);
@@ -56,60 +52,29 @@ static int pci_epf_nv_test_bind(struct pci_epf *epf)
        return ret;
    }
  
-   epfnv->bar0_ram_page = alloc_pages(GFP_KERNEL, 1);
-   if (!epfnv->bar0_ram_page) {
-       dev_err(fdev, "alloc_pages() failed\n");
-       ret = -ENOMEM;
-       goto fail;
-   }
-   dev_info(fdev, "BAR0 RAM phys: 0x%llx\n",
-        page_to_phys(epfnv->bar0_ram_page));
-
-   epfnv->bar0_iova = iommu_dma_alloc_iova(cdev, BAR0_SIZE,
-                       cdev->coherent_dma_mask);
-   if (!epfnv->bar0_iova) {
-       dev_err(fdev, "iommu_dma_alloc_iova() failed\n");
-       ret = -ENOMEM;
-       goto fail_free_pages;
-   }
-
-   dev_info(fdev, "BAR0 RAM IOVA: 0x%08llx\n", epfnv->bar0_iova);
-
-   ret = iommu_map(domain, epfnv->bar0_iova,
-           page_to_phys(epfnv->bar0_ram_page),
-           PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
-   if (ret) {
-       dev_err(fdev, "iommu_map(RAM) failed: %d\n", ret);
-       goto fail_free_iova;
-   }
-   epfnv->bar0_ram_map = vmap(&epfnv->bar0_ram_page, 1, VM_MAP,
-                  PAGE_KERNEL);
+   epfnv->bar0_ram_map = dma_alloc_coherent(cdev, BAR0_SIZE,
+                        &epfnv->bar0_iova, GFP_KERNEL);
    if (!epfnv->bar0_ram_map) {
-       dev_err(fdev, "vmap() failed\n");
+       dev_err(fdev, "dma_alloc_coherent() failed\n");
        ret = -ENOMEM;
-       goto fail_unmap_ram_iova;
+       return ret;
    }
-   dev_info(fdev, "BAR0 RAM virt: 0x%p\n", epfnv->bar0_ram_map);
+   dev_info(fdev, "BAR0 RAM IOVA: 0x%llx\n", epfnv->bar0_iova);
  
    ret = pci_epc_set_bar(epc, BAR_0, epfnv->bar0_iova, BAR0_SIZE,
                  PCI_BASE_ADDRESS_SPACE_MEMORY |
-                 PCI_BASE_ADDRESS_MEM_TYPE_32);
+                 PCI_BASE_ADDRESS_MEM_TYPE_32 |
+                 PCI_BASE_ADDRESS_MEM_PREFETCH);
    if (ret) {
        dev_err(fdev, "pci_epc_set_bar() failed: %d\n", ret);
-       goto fail_unmap_ram_virt;
+       goto fail_set_bar;
    }
  
    return 0;
  
-fail_unmap_ram_virt:
-   vunmap(epfnv->bar0_ram_map);
-fail_unmap_ram_iova:
-   iommu_unmap(domain, epfnv->bar0_iova, PAGE_SIZE);
-fail_free_iova:
-   iommu_dma_free_iova(cdev, epfnv->bar0_iova, BAR0_SIZE);
-fail_free_pages:
-   __free_pages(epfnv->bar0_ram_page, 1);
-fail:
+fail_set_bar:
+   dma_free_coherent(cdev, BAR0_SIZE, epfnv->bar0_ram_map,
+             epfnv->bar0_iova);
    return ret;
 }
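
Once both sides carry their patches, one way to verify that more than 4 KB is reachable is to map the endpoint's BAR0 from the root port through sysfs rather than poking raw physical addresses with busybox. The sketch below is only illustrative: the BDF 0005:01:00.0 and the 1 MB map length are assumptions to adjust to your setup, resource0 corresponds to BAR0, and the endpoint's memory decoding may need to be enabled first (for example by writing 1 to the device's sysfs "enable" attribute). Run it as root.

#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustrative root-port-side test: mmap the endpoint's BAR0 via sysfs
 * and write/read at an offset beyond the first 4 KB page.
 * Adjust the BDF below to match your topology. */
#define BAR0_RES "/sys/bus/pci/devices/0005:01:00.0/resource0"
#define TEST_OFF 0x2000		/* 8 KB into the BAR, past the first page */

int main(void)
{
	int fd = open(BAR0_RES, O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open " BAR0_RES);
		return 1;
	}

	size_t len = 0x100000;	/* map the first 1 MB of the BAR */
	volatile uint32_t *bar = mmap(NULL, len, PROT_READ | PROT_WRITE,
				      MAP_SHARED, fd, 0);
	if (bar == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	bar[TEST_OFF / 4] = 0x12345678;
	printf("read back at offset 0x%x: 0x%08" PRIx32 "\n",
	       TEST_OFF, bar[TEST_OFF / 4]);

	munmap((void *)bar, len);
	close(fd);
	return 0;
}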

Hi vidyas
Thank you for your reply!
I understand I should apply this patch for endpoint mode. Do I also need to apply it on the root-mode device?

The first two patches go on the root port side; the last one goes on the endpoint side.

hi vidyas,
Now I am using the latest release, 32.2. With the default code I should be able to access 64 KB of shared memory, but it seems I can only use 4 KB; the root device cannot get data beyond 4 KB. I also find that the patch for root mode is already in the kernel.

endpoint mode:

nvidia@nvidia-desktop:~/work$ dmesg |grep pci_epf_nv_test
[   90.450143] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM phys: 0x405b96000
[   90.450208] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM IOVA: 0xffff0000
[   90.450267] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM virt: 0xffffff800bd8b000

root mode:

nvidia@nvidia-desktop:~$ lspci -v
0005:01:00.0 RAM memory: NVIDIA Corporation Device 0001
        Flags: fast devsel, IRQ 255
        Memory at 1f40100000 (32-bit, non-prefetchable) [disabled] 
        Memory at 1c00000000 (64-bit, prefetchable) [disabled] 
        Memory at 1f40000000 (64-bit, non-prefetchable) [disabled] 
        Capabilities: <access denied>

A memory test below 4 KB works:
endpoint:

nvidia@nvidia-desktop:~/work$ sudo busybox devmem 0x405b96f00 32 0x1234
nvidia@nvidia-desktop:~/work$ sudo busybox devmem 0x405b96f00
0x00001234

root:

nvidia@nvidia-desktop:~$ sudo busybox devmem 0x1f40100f00
0x00001234

A memory test beyond 4 KB does not work:
endpoint:

nvidia@nvidia-desktop:~/work$ sudo busybox devmem 0x405b98000 32 0x1234
nvidia@nvidia-desktop:~/work$ sudo busybox devmem 0x405b98000
0x00001234

root:

nvidia@nvidia-desktop:~$ sudo busybox devmem 0x1f40102000
0xFFFFFFFF

When I apply the patch on the endpoint device, I can no longer find the BAR0 RAM phys address in the log.

nvidia@nvidia-desktop:~/work$ dmesg |grep nv_test
[  103.016937] pci_epf_nv_test pci_epf_nv_test.0: BAR0 RAM IOVA: 0xe0000000

vidyas,
Do you have any update on my question?
Have you ever tested this function in your lab?

Yes. They have been tested in our labs.
With the default codebase, your observation is correct: only one page (i.e., 4 KB) gets mapped, and that is all you can use on both the root port side and the endpoint side.
In that method, we reserve pages in physical memory and then convert them to IOVA. This also means that attempting to reserve larger sizes in physical address space is likely to fail.
Hence I made the patch so that the memory is not first reserved in physical address space and then converted to IOVA. Instead, it uses the dma_alloc_coherent() API to reserve the space directly in IOVA, since what the PCIe controller ultimately accesses is the IOVA, not the physical address.
With this method, the buffer is contiguous in IOVA space but may or may not be contiguous in PA space. It also means you cannot get the PA (which we do not need for any good reason other than trying to access it with busybox).
What is your final use case and how are you planning to use it exactly?

vidyas,
Thank you for your explanation!
We plan to use two Xaviers: Xavier A will transfer images and other data to Xavier B.
So we just want to expand the shared memory to about 512 MB. Also, what is the speed of this shared memory?