PCIe DMA doesn't work for L4T 24.1

It was reported earlier that PCIe DMA is broken for L4T 24.1.

http://devtalk.nvidia.com/default/topic/936880/jetson-tx1/jetson-tx1-24-1-release-need-help-with-complier-directions-can-not-complie/post/4885785/#4885785

According to jayds, DMA partially worked for R23.x.

We got a Raggedstone 5 FPGA card and solved some issues by adding “vmalloc=256M cma=128M coherent-pool=96M” to “/boot/extlinux/extlinux.conf”, as jayds suggested.

But the “dma_set_coherent_mask” function in the driver still fails. The same function worked on Intel CPUs.

We need PCIe for grabbers to acquire Camera Link and other cameras.

Thanks in advance for any suggestion to troubleshoot DMA issues in L4T 24.1.

Thanks yahoo2016, we will investigate the reports. What is your FPGA driver attempting to set the dma coherent mask to?
Also is this 32-bit R24.1 or 64-bit R24.1?

Dusty,

We must use 64 bit in our application.

The FPGA driver tries to set the mask to 31 bits (the FPGA can only address the lower 2 GB), but I found that the TX1 only accepts a 32-bit mask.

I found this link:

http://lwn.net/Articles/543408/

It mentioned “On some ARM systems, memory does not start at a physical address of zero; the physical address of the first byte can be as high as 3GB (0xc0000000).”

What is the physical address range of TX1 SDRAM? The FPGA vendor may be able to adjust the DMA address range if the base address of the TX1 is known.

Thanks

Also as jayds mentioned:

"File “drivers/base/Kconfig” should be edited. Line 234 has a quoted string, the trailing quote is missing. Add the quote back in, config should work for this. There are other dependency issues, but they do not seem to be fatal.

This is the Kconfig for building the DMA coherent allocation code in which if not correct looks like the reason I could not get PCIe DMA to work under 24.1 for TX1."

I’ll try to rebuild kernel with “drivers/base/Kconfig” fixed.

Please make sure the fix is included in the next release of L4T.

Thanks

“What is physical address range of TX1 SDRAM? FPGA vendor may be able to adjust DMA address range if base address of TX1 is known.”

TX1’s SDRAM starts at physical address 0x80000000.

We’ve confirmed the fix should be included in the next update, thanks.
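
A minimal Python sketch of why a 31-bit mask cannot work with this base address. It assumes no IOMMU remapping between the FPGA and DRAM; `dma_bit_mask` and `reachable_dram_bytes` are illustrative helper names, not kernel functions, and the result is an upper bound that ignores how much DRAM is actually fitted.

```python
TX1_DRAM_BASE = 0x80000000  # SDRAM base address given in the reply above


def dma_bit_mask(bits: int) -> int:
    """Highest bus address an n-bit DMA engine can emit (same value as DMA_BIT_MASK(n))."""
    return (1 << bits) - 1


def reachable_dram_bytes(mask_bits: int, dram_base: int = TX1_DRAM_BASE) -> int:
    """Upper bound on DRAM bytes addressable under the mask, assuming no IOMMU remapping."""
    top = dma_bit_mask(mask_bits)
    return max(0, top - dram_base + 1)


for bits in (31, 32):
    print(f"{bits}-bit mask: top address 0x{dma_bit_mask(bits):x}, "
          f"reachable DRAM {reachable_dram_bytes(bits) // 2**20} MiB")
```

With a 31-bit mask the highest address the device can emit is 0x7fffffff, one byte below the start of SDRAM, so no buffer in DRAM is directly reachable.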

When a TX1 SDRAM base address of 0x00000000 was assumed, DMA from the FPGA to the TX1 produced all zeros instead of the test pattern from the driver.

When a TX1 SDRAM base address of 0x80000000 was assumed, DMA from the FPGA to the TX1 always crashes the TX1.

Where can we find documentation of the X1/TX1 memory map?

Thanks

I got the following errors when DMA is called:

[ 172.338264] mc-err: (0) csw_afiw: EMEM address decode error
[ 172.343948] mc-err: status = 0x20010031; addr = 0x57c00000
[ 172.349809] mc-err: secure: no, access-type: write, SMMU fault: none
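
A quick Python check of the faulting address against the DRAM window. The base address comes from earlier in this thread; the 4 GiB size is an assumption about the module, and treating the mc-err address as the bus address the FPGA wrote to is also an assumption.

```python
DRAM_BASE = 0x80000000  # SDRAM base stated earlier in this thread
DRAM_SIZE = 4 << 30     # ASSUMPTION: 4 GiB of SDRAM fitted


def in_dram(addr: int, base: int = DRAM_BASE, size: int = DRAM_SIZE) -> bool:
    """True if addr falls inside the assumed DRAM window."""
    return base <= addr < base + size


fault_addr = 0x57C00000  # "addr" field from the mc-err line
print(f"0x{fault_addr:08x} in DRAM: {in_dram(fault_addr)}")
print(f"distance below DRAM base: 0x{DRAM_BASE - fault_addr:x}")
```

Under these assumptions the write targets an address below the start of SDRAM, which would be consistent with an address decode error rather than a data error.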

I searched for the errors and found this post:

https://devtalk.nvidia.com/default/topic/940077/ath9k-driver-causes-csr_afir-emem-address-decode-error-fixed-/

The last post quoted:

“HACK: Disable IOMMU PCIe till dynamic loadable module issue is solved.”

Is the EMEM address decoder disabled now? That would explain why DMAs do not work.

Has Nvidia or anyone else tested a PCIe x4 SSD under R24.1?

There is a potential issue if the DMA hardware in the PCIe endpoint device is only 32-bit capable.
If you are using the 24.1 release, things are expected to work fine with the following changes:

--- a/arch/arm64/boot/dts/tegra210-soc-base.dtsi
+++ b/arch/arm64/boot/dts/tegra210-soc-base.dtsi
@@ -1252,6 +1252,7 @@
                          0x82000000 0 0x13000000 0x0 0x13000000 0 0x0d000000   /* non-prefetchable memory (208 MiB) */
                          0xc2000000 0 0x20000000 0x0 0x20000000 0 0x20000000>; /* prefetchable memory (512 MiB) */
 
+               iommus = <&smmu TEGRA_SWGROUP_AFI>;
                status = "disabled";
 
                pci@1,0 {

and

--- a/drivers/iommu/of_tegra-smmu.c
+++ b/drivers/iommu/of_tegra-smmu.c
@@ -166,7 +166,6 @@ u64 tegra_smmu_of_get_swgids(struct device *dev,
        u64 fixup, swgids = 0;
 
        if (dev_is_pci(dev)) {
-               return SWGIDS_ERROR_CODE;
                swgids = TEGRA_SWGROUP_BIT(AFI);
                goto try_fixup;
        }

This is relevant to the problems I’ve been dealing with.

I too have an FPGA attached to the NVIDIA TX1. This FPGA links up just fine (as opposed to my Spartan 6 board).

I wrote and tested out my driver on a regular x64 Ubuntu desktop and everything worked fine.

I brought it over to the TX1 L4T 24.1 git hash: 07fb031a9cea70cb733203d3c9867000639c6e87 and have had some difficulties. I noticed some strange things.

Probe Function Modifies OV5693

If I load the driver and the probe is called I see this on dmesg every time:

[ 126.795519] [OV5693]: probing v4l2 sensor.
[ 126.795725] ov5693 6-0036: camera_common_regulator_get vana ERR: fffffffffffffdfb
[ 126.803617] ov5693 6-0036: camera_common_regulator_get vif ERR: fffffffffffffdfb

I played with the driver a bit and modified the function that is called when the PCIE device is found (probe function). Even if the only thing I do is “return 0;” I see the above.

I’m not using the OV5693 camera at the moment but I thought I would mention this either way.
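
The ERR value in those ov5693 lines can be decoded with a short Python sketch. It reads the hex value as a signed 64-bit integer and compares it with `EPROBE_DEFER` (517 in the kernel's internal errno list). The interpretation that follows is an assumption, not something confirmed in this thread.

```python
EPROBE_DEFER = 517  # kernel-internal errno: "driver requests probe retry"


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


err = to_signed64(0xFFFFFFFFFFFFFDFB)  # value printed after "ERR:" in dmesg
print(err)                   # -517
print(err == -EPROBE_DEFER)  # True
```

If that reading is right, the camera driver is not being modified by the PCIe driver. Its probe was deferred earlier because its regulators were not available, and the kernel retries deferred probes whenever another driver binds, which would explain why the messages appear even when the probe function only returns 0.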

pci_set_dma_mask with 32-bit MASK returns 33-bit address??

I’m still navigating my way through the PCIE protocol and the accompanying kernel interface but I thought that by calling this function

pci_set_dma_mask(dev, DMA_BIT_MASK(32))

I would get a 32-bit address but instead I got a 33-bit address. I suppose technically it’s a 64-bit address but I didn’t want to split hairs:

(Second address on each line is the remapped DMA address)

[ 126.792856] nysa_pcie - construct_pcie_device : Status Buf Addr: ffffffc0e241c000 : DMA Buf Addr : 000000016241c000
[ 126.792867] nysa_pcie - construct_pcie_device : Write Buf [0] Addr: ffffffc0ef142000 : DMA Buf Addr : 000000016f142000
[ 126.792875] nysa_pcie - construct_pcie_device : Write Buf [1] Addr: ffffffc0f5c35000 : DMA Buf Addr : 0000000175c35000
[ 126.792891] nysa_pcie - construct_pcie_device : Read Buf [0] Addr: ffffffc0e248b000 : DMA Buf Addr : 000000016248b000
[ 126.792899] nysa_pcie - construct_pcie_device : Read Buf [1] Addr: ffffffc0e248a000 : DMA Buf Addr : 000000016248a000
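
A small Python sketch relating the two addresses on each line. It assumes the arm64 kernel linear mapping starts at 0xffffffc000000000 (an assumption about this kernel's configuration) and uses the SDRAM base of 0x80000000 given earlier in the thread; `virt_to_phys` here is an illustrative helper, not the kernel macro.

```python
PAGE_OFFSET = 0xFFFFFFC000000000  # ASSUMPTION: start of the arm64 linear map
PHYS_OFFSET = 0x80000000          # TX1 SDRAM base stated earlier in the thread


def virt_to_phys(vaddr: int) -> int:
    """Translate a linear-map kernel virtual address to a physical address."""
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


# (kernel virtual address, DMA address) pairs copied from the dmesg lines above
pairs = [
    (0xFFFFFFC0E241C000, 0x000000016241C000),
    (0xFFFFFFC0EF142000, 0x000000016F142000),
    (0xFFFFFFC0F5C35000, 0x0000000175C35000),
    (0xFFFFFFC0E248B000, 0x000000016248B000),
    (0xFFFFFFC0E248A000, 0x000000016248A000),
]

for vaddr, dma in pairs:
    phys = virt_to_phys(vaddr)
    print(f"phys 0x{phys:x}  matches DMA addr: {phys == dma}  bits: {dma.bit_length()}")
```

Under these assumptions every DMA address equals the buffer's physical address, which suggests that nothing (IOMMU or bounce buffer) is translating the buffers into the 32-bit window that the mask asked for.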

Note: My driver is very simple. It is not using ‘scatter-gather’ it is simply using a double buffer scheme with 4096 byte blocks so I don’t think I need to use ‘coherent’. The status buffer is just a single 4096 byte block I use to read the status of the HDL controller inside my FPGA (I’ll call it FPGA Controller).

I was confused but, at first, I proceeded to use only the lower 32-bit address to send data from the FPGA to the TX1 status buffer but it wouldn’t work.

I modified my FPGA controller to use 64-bit addressing and changed the DMA mask to 64-bit instead of 32-bit, and I could send data from the FPGA to the TX1 on the status buffer, so I can confirm that DMA transfers from the PCI device to the TX1 Root Complex do work. I don’t know if this is true for scatter-gather based drivers, though.

The driver I wrote allows me to read and write data to the FPGA using a character device file. Basically I open up the file and write data into it and this data will get down to the FPGA. There is a little bit more to it than that but for the sake of brevity this is sufficient.

I wrote a simple Python script that attempts to send 20 bytes down to the FPGA but it didn’t work. I have a logic analyzer built into the FPGA and have observed the FPGA controller recognize that the TX1 wants to write down 20 bytes. I can see that the controller does create the memory read request:

This is the PCIE TLP Memory Request that the FPGA controller generates:


20000080
01000000
00000001
6F142000

Here is the significance of each line:
20000080

  • Memory Read Transaction Request
  • TLP Packet is for a 64-bit memory read transaction
  • Length of read request: 512 bytes (0x80 * 4)

01000000

  • Requester ID: Bus ID: 0x01, Device ID: 0x00, Function ID: 0x00
  • Memory Read Tag: 0x00

00000001
6F142000

  • Address to read from: 0x000000016F142000
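
The breakdown above can be reproduced with a minimal Python sketch that splits a memory-request TLP header into its fields. The bit positions follow the standard PCIe header layout as I understand it; `decode_mem_tlp_header` is an illustrative name and only handles 3-DW and 4-DW memory requests.

```python
from typing import Dict, List


def decode_mem_tlp_header(dws: List[int]) -> Dict[str, int]:
    """Split a memory-request TLP header (3 or 4 DWs) into its fields."""
    dw0, dw1 = dws[0], dws[1]
    fmt = (dw0 >> 29) & 0x7            # bit 0 set: 4-DW header, bit 1 set: has data
    length_dw = (dw0 & 0x3FF) or 1024  # a Length field of 0 encodes 1024 DWs
    requester = (dw1 >> 16) & 0xFFFF
    if fmt & 0x1:                      # 4-DW header carries a 64-bit address
        address = (dws[2] << 32) | (dws[3] & 0xFFFFFFFC)
    else:                              # 3-DW header carries a 32-bit address
        address = dws[2] & 0xFFFFFFFC
    return {
        "fmt": fmt,
        "type": (dw0 >> 24) & 0x1F,
        "length_bytes": length_dw * 4,
        "bus": requester >> 8,
        "device": (requester >> 3) & 0x1F,
        "function": requester & 0x7,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "address": address,
    }


header = [0x20000080, 0x01000000, 0x00000001, 0x6F142000]
for name, value in decode_mem_tlp_header(header).items():
    print(f"{name:13s} 0x{value:x}")
```

Besides the fields listed above, the decode also shows the two byte-enable fields in the second DW, which are both 0x0 in this header.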

I am expecting that the root complex on the TX1 would send 512 bytes down to the FPGA but I don’t see this, instead my driver blocks the user application. This worked on the desktop computer but the TX1 is just frozen. dmesg doesn’t say anything is wrong. It seems as though the root complex never received the DMA request.

When I designed the driver I was afraid that this situation might happen so I added a back door into it using ‘sysfs’ that allows me to unblock my driver and exit gracefully. When I unblocked my driver I saw these messages using dmesg:

[ 1373.816730] pcieport 0000:00:01.0: AER: Uncorrected (Fatal) error received: id=0010
[ 1373.816750] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0008(Receiver ID)
[ 1373.828269] pcieport 0000:00:01.0: device [10de:0fae] error status/mask=00040000/00000000
[ 1373.837191] pcieport 0000:00:01.0: [18] Malformed TLP (First)
[ 1373.843975] pcieport 0000:00:01.0: TLP Header: 20000080 01000000 00000001 6f142000
[ 1373.852172] pcieport 0000:00:01.0: broadcast error_detected message
[ 1373.852182] nysa_pcie 0000:01:00.0: device has no AER-aware driver
[ 1374.088335] pcieport 0000:00:01.0: Root Port link has been reset
[ 1374.088514] pcieport 0000:00:01.0: AER: Device recovery failed

It’s strange that the root complex thinks that this is a ‘malformed’ TLP. Using the PCIE specification I verified that this is a valid packet.

This all may be a moot point. I want to try out the above modification but I have to figure out how to preserve my root file system so I don’t have to re-install all my tools each time I re-flash.

I plan to try this out tonight.

Dave

If you are referring to the root file system of the Jetson, just clone it. Info on cloning for a TK1 mostly applies, see:
http://elinux.org/Jetson/Cloning

Specific command to clone JTX1:
https://devtalk.nvidia.com/default/topic/898999/jetson-tx1/tx1-r23-1-new-flash-structure-how-to-clone-/post/4784149/#4784149

@linuxdev

Great, this will be awesome!

Given your following observation
“”
I was confused but, at first, I proceeded to use only the lower 32-bit address to send data from the FPGA to the TX1 status buffer but it wouldn’t work.

I modified my FPGA controller to use 64-bit addressing and changed the DMA mask to use 64-bit instead of 32-bit and I could send data from the FPGA to the TX1 on the status buffer so I can confirm that DMA transfers from the PCI Device to the TX1 Root Complex does work. Although I don’t know if this is true for scatter-gather based drivers.
“”
I think the solution given in #9 should work.
BTW, are you saying that even after configuring the FPGA’s DMA to work with 64-bit addresses, you are seeing ‘malformed TLP’ issues?

“”
BTW, are you saying that even after configuring the FPGA’s DMA to work with 64-bit addresses, you are seeing ‘malformed TLP’ issues?
“”

Yes, but before this becomes an issue I want to implement the fix and modify the AER error reporter to report not only the TLP header but also the total TLP length.

It may be possible that the FPGA is sending an extra 32-bit word. I don’t believe this is true but it has been something that I have been thinking about.

Hopefully I can get this built and tested today.

Sorry, I mistakenly deleted my driver and realized I hadn’t pushed my recent commits so I have to fix the driver before I can test this out.

Okay, that was trickier than I thought, I’ll spare the painful details.

I fixed the driver and installed Jetpack 2.2 and verified that the driver worked the same as it did before, just to make sure that I didn’t introduce any new problems.

Then I downloaded and installed the Jetson TX1 64-bit Driver Package (24.1).

I modified the kernel source to include the changes in the forum, here is the diff from commit:

07fb031a9cea70cb733203d3c9867000639c6e87

Or this is the tag I used

Tag: tegra-l4t-r24.1

Note: The compiler was complaining about the tegra_clock so I modified it, it should behave the same way as before.

My Commit:

diff --git a/arch/arm64/boot/dts/tegra210-soc-base.dtsi b/arch/arm64/boot/dts/tegra210-soc-base.dtsi
index b7cc544..dc4241d 100644
--- a/arch/arm64/boot/dts/tegra210-soc-base.dtsi
+++ b/arch/arm64/boot/dts/tegra210-soc-base.dtsi
@@ -1252,6 +1252,7 @@
                          0x82000000 0 0x13000000 0x0 0x13000000 0 0x0d000000   /* non-prefetchable memory (208 MiB) */
                          0xc2000000 0 0x20000000 0x0 0x20000000 0 0x20000000>; /* prefetchable memory (512 MiB) */
 
+    iommus = <&smmu TEGRA_SWGROUP_AFI>;
                status = "disabled";
 
                pci@1,0 {
diff --git a/drivers/iommu/of_tegra-smmu.c b/drivers/iommu/of_tegra-smmu.c
index d71e89c..998b928 100644
--- a/drivers/iommu/of_tegra-smmu.c
+++ b/drivers/iommu/of_tegra-smmu.c
@@ -166,7 +166,7 @@ u64 tegra_smmu_of_get_swgids(struct device *dev,
        u64 fixup, swgids = 0;
 
        if (dev_is_pci(dev)) {
-               return SWGIDS_ERROR_CODE;
+               //return SWGIDS_ERROR_CODE;
                swgids = TEGRA_SWGROUP_BIT(AFI);
                goto try_fixup;
        }
diff --git a/drivers/platform/tegra/tegra21_clocks.c b/drivers/platform/tegra/tegra21_clocks.c
index 9ae33d2..e437508 100644
--- a/drivers/platform/tegra/tegra21_clocks.c
+++ b/drivers/platform/tegra/tegra21_clocks.c
@@ -1061,7 +1061,16 @@ static struct clk_ops tegra_super_ops = {
  */
 static void tegra21_cpu_clk_init(struct clk *c)
 {
-       c->state = (!is_lp_cluster() == (c->u.cpu.mode == MODE_G)) ? ON : OFF;
+       //c->state = (!is_lp_cluster() == (c->u.cpu.mode == MODE_G)) ? ON : OFF;
+  int mg = 0;
+  if (c->u.cpu.mode == MODE_G)
+    mg = 1;
+
+  if (mg == !is_lp_cluster())
+    c->state = ON;
+  else
+    c->state = OFF;
+
 }

Installing the driver

I built the driver and loaded it. Here is the output of dmesg after I loaded it.

[  135.891254] nysa_pcie - nysa_pcie_init : Registering Driver
[  135.896854] nysa_pcie - construct_pcie_ctr : Create PCIE Control
[  135.903019] nysa_pcie - construct_pcie_ctr : Creating space for 1 devices
[  135.909972] nysa_pcie - nysa_pcie_probe : Found PCI Device: 10EE:7011 0000:01:00.0
[  135.917636] nysa_pcie - construct_pcie_device : Entered
[  135.922900] nysa_pcie - construct_pcie_device : Get Device at index: 0 (Major: 232 Minor: 0)
[  135.931484] nysa_pcie - construct_pcie_device : Got Device
[  135.937018] nysa_pcie - construct_pcie_device : Enable PCIE Device
[  135.943236] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
[  135.948951] nysa_pcie - construct_pcie_device : Allow PCIE Device to be a master
[  135.956364] nysa_pcie - construct_pcie_device : Set DMA Mask to 32 bits
[  135.963102] nysa_pcie - construct_pcie_device : Get the start of the base address register
[  135.971383] nysa_pcie - construct_pcie_device : Get the Length of the Base Address Register
[  135.979793] nysa_pcie - construct_pcie_device : Get Base Address
[  135.985806] nysa_pcie - construct_pcie_device : BAR Address: 0x13000000
[  135.992557] nysa_pcie - construct_pcie_device : BAR Length: 0x20000
[  135.998847] nysa_pcie - construct_pcie_device : Get Virtual Address
[  136.005223] nysa_pcie - construct_pcie_device : Virtual Address: 0xFFFFFF8007800000
[  136.012900] nysa_pcie - construct_pcie_device : Request Memory Region
[  136.019435] nysa_pcie - construct_pcie_device : Enable MSI Interrupts
[  136.026027] nysa_pcie - construct_pcie_device : Create the buffers for DMA
[  136.032945] nysa_pcie - construct_pcie_device : Status Buf Addr: ffffffc0fc8d5000 : DMA Buf Addr : 0000000080000000
[  136.043458] nysa_pcie - construct_pcie_device : Write Buf [0] Addr: ffffffc0fc588000 : DMA Buf Addr : 0000000080002000
[  136.054247] nysa_pcie - construct_pcie_device : Write Buf [1] Addr: ffffffc0fc0ad000 : DMA Buf Addr : 0000000080004000
[  136.065005] nysa_pcie - construct_pcie_device : Read Buf [0] Addr: ffffffc0fc855000 : DMA Buf Addr : 0000000080006000
[  136.075726] nysa_pcie - construct_pcie_device : Read Buf [1] Addr: ffffffc0f968f000 : DMA Buf Addr : 0000000080008000
[  136.086542] nysa_pcie - construct_pcie_device : Create Device File
[  136.092758] nysa_pcie - construct_pcie_device : Created Device File
[  136.099132] nysa_pcie - construct_pcie_device : Create kfifo
[  136.104839] nysa_pcie - construct_pcie_device : kfifo created
[  136.110609] nysa_pcie - construct_pcie_device : initialized kfifo
[  136.116954] platform 7.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.125293] platform d.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.125459] platform c9.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.125612] platform ca.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.125763] platform cb.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.125916] platform cc.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.126066] platform cd.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.126199] reg-fixed-sync-voltage ce.regulator: Consumer c0 does not have device name
[  136.126218] platform ce.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.126363] platform d1.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.126508] platform d3.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.126631] [OV5693]: probing v4l2 sensor.
[  136.126788] ov5693 6-0036: camera_common_regulator_get vana ERR: fffffffffffffdfb
[  136.126811] ov5693 6-0036: camera_common_regulator_get vif ERR: fffffffffffffdfb
[  136.126841] i2c 6-0036: Driver ov5693 requests probe deferral
[  136.126937] reg-fixed-sync-voltage 5.regulator: Consumer c1 does not have device name
[  136.126957] platform 5.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  136.125101] nysa_pcie - nysa_pcie_init : Driver Initialized, waiting for probe...
[  136.725059] init: alsa-restore main process (1220) terminated with status 99
[  136.787973] init: plymouth-stop pre-start process (1290) terminated with status 1

The addresses are now 32-bit:

[ 136.032945] nysa_pcie - construct_pcie_device : Status Buf Addr: ffffffc0fc8d5000 : DMA Buf Addr : 0000000080000000
[ 136.043458] nysa_pcie - construct_pcie_device : Write Buf [0] Addr: ffffffc0fc588000 : DMA Buf Addr : 0000000080002000
[ 136.054247] nysa_pcie - construct_pcie_device : Write Buf [1] Addr: ffffffc0fc0ad000 : DMA Buf Addr : 0000000080004000
[ 136.065005] nysa_pcie - construct_pcie_device : Read Buf [0] Addr: ffffffc0fc855000 : DMA Buf Addr : 0000000080006000
[ 136.075726] nysa_pcie - construct_pcie_device : Read Buf [1] Addr: ffffffc0f968f000 : DMA Buf Addr : 0000000080008000

Exercising the driver

Before I attempted to write data to the FPGA I requested the status data to be sent from the FPGA to the host and here is the output (This worked before).

[  215.823735] nysa_pcie - nysa_pcie_open : Opened!
[  215.831007] nysa_pcie - nysa_pcie_open : Minor Number: 0
[  215.836505] nysa_pcie - nysa_pcie_llseek : in llseek
[  215.845612] nysa_pcie - nysa_pcie_llseek : enable command mode!
[  215.851598] nysa_pcie - nysa_pcie_write : Command Mode!
[  215.856834] nysa_pcie - nysa_pcie_write : Write: 0x0000008A 0x00000000 0x00000000
[  215.865596] nysa_pcie - nysa_pcie_write : Not Command Mode!
[  215.871223] nysa_pcie - write_command : Writting Command: Addr: 0x0000008A Data: 0x00000000 Device Addrss: 0x00000000
[  215.881879] nysa_pcie - nysa_pcie_llseek : 
[  215.881910] nysa_pcie - msi_isr : 	0x00000000
[  215.881913] nysa_pcie - msi_isr : 	0x00000000
[  215.881915] nysa_pcie - msi_isr : 	0x00000000
[  215.881916] nysa_pcie - msi_isr : 	0x00000000
[  215.881918] nysa_pcie - msi_isr : 	0x00000000
[  215.881919] nysa_pcie - msi_isr : 	0x00000000
[  215.881921] nysa_pcie - msi_isr : 	0x00000000
[  215.881922] nysa_pcie - msi_isr : 	0x00000000
[  215.881924] nysa_pcie - msi_isr : 	0x00000000
[  215.881925] nysa_pcie - msi_isr : 	0x00000000
[  215.881927] nysa_pcie - msi_isr : 	0x00000000
[  215.881928] nysa_pcie - msi_isr : 	0x00000000
[  215.881930] nysa_pcie - msi_isr : 	0x00000000
[  215.881932] nysa_pcie - msi_isr : 	0x00000000
[  215.881933] nysa_pcie - msi_isr : 	0x00000000
[  215.881986] pcieport 0000:00:01.0: AER: Uncorrected (Fatal) error received: id=0010
[  215.881999] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0008(Receiver ID)
[  215.882002] pcieport 0000:00:01.0:   device [10de:0fae] error status/mask=00040000/00000000
[  215.882006] pcieport 0000:00:01.0:    [18] Malformed TLP          (First)
[  215.882010] pcieport 0000:00:01.0:   TLP Header: 6000000f 000000ff 00000000 80000000
[  215.882019] pcieport 0000:00:01.0: broadcast error_detected message
[  215.882023] nysa_pcie 0000:01:00.0: device has no AER-aware driver
[  216.005698] in llseek
[  216.008160] nysa_pcie - nysa_pcie_llseek : disable command mode!
[  216.110147] pcieport 0000:00:01.0: Root Port link has been reset
[  216.116343] pcieport 0000:00:01.0: AER: Device recovery failed

Here is what this means:
6000000f

  • Memory Write TLP
  • 64-bit
  • 15 32-bit values will follow this header

000000ff

  • The last byte is composed of a byte mask for the first and last words, it basically means that all bytes are valid

00000000
80000000

  • Memory Address

I get the same error as before: ‘Malformed TLP’. This is interesting. Since I know the structure of the packet is correct, it seems that the error is due to the address.
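
One way to test the address hypothesis is to check the logged headers against two requester rules from the PCIe Base Specification as I read it: requests to addresses below 4 GiB must use the 32-bit (3-DW) header format, and a request longer than 1 DW must not have a First or Last DW byte enable of 0000b. The sketch below is a hedged illustration; `check_mem_request` is a hypothetical helper, and whether the Tegra root port actually enforces these rules is an assumption.

```python
from typing import List


def check_mem_request(dws: List[int]) -> List[str]:
    """Return the rule violations found in a memory-request TLP header."""
    problems = []
    fmt = (dws[0] >> 29) & 0x7
    length_dw = (dws[0] & 0x3FF) or 1024  # a Length field of 0 encodes 1024 DWs
    first_be = dws[1] & 0xF
    last_be = (dws[1] >> 4) & 0xF
    # 4-DW header (64-bit address) whose upper address DW is zero
    if fmt & 0x1 and dws[2] == 0:
        problems.append("64-bit format used for an address below 4 GiB")
    if length_dw > 1 and first_be == 0:
        problems.append("First DW BE is 0000b with Length > 1 DW")
    if length_dw > 1 and last_be == 0:
        problems.append("Last DW BE is 0000b with Length > 1 DW")
    return problems


# TLP headers copied from the AER reports in this thread
headers = {
    "read, 64-bit mask": [0x20000080, 0x01000000, 0x00000001, 0x6F142000],
    "write, after patch": [0x6000000F, 0x000000FF, 0x00000000, 0x80000000],
    "read, after patch": [0x20000080, 0x01000000, 0x00000000, 0x80002000],
}

for name, dws in headers.items():
    print(name, "->", check_mem_request(dws) or "no violations found")
```

If the root port does check these rules, both the address format (after the patch) and the zero byte enables on the read requests would be candidates for the ‘Malformed TLP’ report.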

Attempting to write data

Even though the transfer of data from the FPGA to the host didn’t work I decided to try the reverse anyways:

I restarted the board, loaded the driver and attempted to write data to the FPGA. It froze so I had to cancel it using the sysfs flag. Here is the entire dmesg (install and attempt to write):

[  199.427058] nysa_pcie - nysa_pcie_init : Registering Driver
[  199.432738] nysa_pcie - construct_pcie_ctr : Create PCIE Control
[  199.438848] nysa_pcie - construct_pcie_ctr : Creating space for 1 devices
[  199.445748] nysa_pcie - nysa_pcie_probe : Found PCI Device: 10EE:7011 0000:01:00.0
[  199.453387] nysa_pcie - construct_pcie_device : Entered
[  199.458646] nysa_pcie - construct_pcie_device : Get Device at index: 0 (Major: 232 Minor: 0)
[  199.467185] nysa_pcie - construct_pcie_device : Got Device
[  199.472718] nysa_pcie - construct_pcie_device : Enable PCIE Device
[  199.478934] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
[  199.484649] nysa_pcie - construct_pcie_device : Allow PCIE Device to be a master
[  199.492062] nysa_pcie - construct_pcie_device : Set DMA Mask to 32 bits
[  199.498797] nysa_pcie - construct_pcie_device : Get the start of the base address register
[  199.507080] nysa_pcie - construct_pcie_device : Get the Length of the Base Address Register
[  199.515489] nysa_pcie - construct_pcie_device : Get Base Address
[  199.521506] nysa_pcie - construct_pcie_device : BAR Address: 0x13000000
[  199.528241] nysa_pcie - construct_pcie_device : BAR Length: 0x20000
[  199.534532] nysa_pcie - construct_pcie_device : Get Virtual Address
[  199.540895] nysa_pcie - construct_pcie_device : Virtual Address: 0xFFFFFF8007800000
[  199.548572] nysa_pcie - construct_pcie_device : Request Memory Region
[  199.555106] nysa_pcie - construct_pcie_device : Enable MSI Interrupts
[  199.561698] nysa_pcie - construct_pcie_device : Create the buffers for DMA
[  199.568616] nysa_pcie - construct_pcie_device : Status Buf Addr: ffffffc0fb37f000 : DMA Buf Addr : 0000000080000000
[  199.579130] nysa_pcie - construct_pcie_device : Write Buf [0] Addr: ffffffc0f8f92000 : DMA Buf Addr : 0000000080002000
[  199.589953] nysa_pcie - construct_pcie_device : Write Buf [1] Addr: ffffffc0fc14b000 : DMA Buf Addr : 0000000080004000
[  199.603465] nysa_pcie - construct_pcie_device : Read Buf [0] Addr: ffffffc0fc14a000 : DMA Buf Addr : 0000000080006000
[  199.616848] nysa_pcie - construct_pcie_device : Read Buf [1] Addr: ffffffc0fca37000 : DMA Buf Addr : 0000000080008000
[  199.630342] nysa_pcie - construct_pcie_device : Create Device File
[  199.639434] nysa_pcie - construct_pcie_device : Created Device File
[  199.645729] nysa_pcie - construct_pcie_device : Create kfifo
[  199.651440] nysa_pcie - construct_pcie_device : kfifo created
[  199.657209] nysa_pcie - construct_pcie_device : initialized kfifo
[  199.663662] platform d.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.663968] platform c9.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.664119] platform ca.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.664275] platform cb.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.664426] platform cc.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.664577] platform cd.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.664709] reg-fixed-sync-voltage ce.regulator: Consumer c0 does not have device name
[  199.664737] platform ce.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.664884] platform d1.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.665035] platform d3.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.665165] [OV5693]: probing v4l2 sensor.
[  199.665325] ov5693 6-0036: camera_common_regulator_get vana ERR: fffffffffffffdfb
[  199.665347] ov5693 6-0036: camera_common_regulator_get vif ERR: fffffffffffffdfb
[  199.665378] i2c 6-0036: Driver ov5693 requests probe deferral
[  199.665471] reg-fixed-sync-voltage 5.regulator: Consumer c1 does not have device name
[  199.665492] platform 5.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.665649] platform 7.regulator: Driver reg-fixed-sync-voltage requests probe deferral
[  199.663450] nysa_pcie - nysa_pcie_init : Driver Initialized, waiting for probe...
[  236.933145] nysa_pcie - nysa_pcie_open : Opened!
[  236.942895] nysa_pcie - nysa_pcie_open : Minor Number: 0
[  236.950519] nysa_pcie - nysa_pcie_llseek : in llseek
[  236.961205] nysa_pcie - nysa_pcie_llseek : enable command mode!
[  236.972703] nysa_pcie - nysa_pcie_write : Command Mode!
[  236.983469] nysa_pcie - nysa_pcie_write : Write: 0x00000081 0x00000005 0x00000000
[  236.996542] nysa_pcie - nysa_pcie_write : Not Command Mode!
[  237.007647] nysa_pcie - write_command : Writting Command: Addr: 0x00000081 Data: 0x00000005 Device Addrss: 0x00000000
[  237.023800] nysa_pcie - nysa_pcie_llseek : in llseek
[  237.034571] nysa_pcie - nysa_pcie_llseek : disable command mode!
[  237.046603] nysa_pcie - nysa_pcie_write : Write Data: Count: 0x00000014
[  237.058763] nysa_pcie - nysa_pcie_write_data : Write Data
[  237.069112] nysa_pcie - nysa_pcie_write_data : Prepare Buffers
[  237.075029] nysa_pcie - nysa_pcie_write_data : Copy over 20 bytes from user buffer to buffer 0 at offset 0x00000000
[  260.316111] nysa_pcie - write_command : 
[  260.322642] nysa_pcie - nysa_pcie_write_data : Writting Command: Addr: 0x00000080 Data: 0x00000000 Device Addrss: 0x00000000
[  260.335741] pcieport 0000:00:01.0: AER: Uncorrected (Fatal) error received: id=0010
[  260.343509] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0008(Receiver ID)
[  260.343592] A semaphore woke us up but there was no data in KFIFO!?
[  260.361430] pcieport 0000:00:01.0:   device [10de:0fae] error status/mask=00040000/00000000
[  260.372655] pcieport 0000:00:01.0:    [18] Malformed TLP          (First)
[  260.382075] pcieport 0000:00:01.0:   TLP Header: 20000080 01000000 00000000 80002000
[  260.390072] pcieport 0000:00:01.0: broadcast error_detected message
[  260.399087] nysa_pcie 0000:01:00.0: device has no AER-aware driver
[  260.634776] pcieport 0000:00:01.0: Root Port link has been reset
[  260.640961] pcieport 0000:00:01.0: AER: Device recovery failed

I’m not sure if I’m doing something wrong with the kernel build but I received a few kernel oopses while everything loads. The system is stable but it’s not good.

This is not the case when I use the Jetpack 2.2 direct install; there the dmesg is completely clean.

Does the 23.1 kernel with the 32-bit and 64-bit have these same issues?

Dave

I was able to get FPGA cards from 2 different vendors working on the TX1 under R24.1. Both cards tested at > 700 MB/s on an Intel processor but only 200 MB/s on the TX1.

The TX1 was advertised with performance comparable to Intel i5/i7 processors, but so far the test results have been disappointing.

I also noticed that one CPU core runs at >60% usage while performing DMAs on the TX1. I wonder whether the TX1 memory controller is doing something very inefficient, e.g., “memcpy”-type operations, or whether PCIe/memory performance is constrained by the TX1 OS.

Hi yahoo2016,
Can you please give us the details of the cards and the procedure you used to measure the perf?

TX1 does give around 680 MB/s write speed ( https://devtalk.nvidia.com/default/topic/940605/pcie-x4-speed-issue-with-ssd/#4929211 )

Hello,

for reference, I have an IP and Linux SGDMA driver implementation that achieves the following throughput on Gen2 x4:

FPGA to CPU performance is 981 Megabytes/s
CPU to FPGA performance is 1115 Megabytes/s

This is tested with an Altera Cyclone V GT FPGA. This performance is measured after running a script that increases the TX1 clock frequencies.

Regards,

Leon.
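
For context, a short Python sketch comparing those figures with the raw Gen2 x4 link rate. It uses the Gen2 line rate of 5 GT/s per lane with 8b/10b encoding and assumes "Megabytes" means 10^6 bytes; it ignores TLP header and flow-control overhead, so the real ceiling for payload data is lower.

```python
def gen2_link_ceiling_mb_s(lanes: int) -> float:
    """Raw PCIe Gen2 bandwidth in MB/s after 8b/10b encoding, before protocol overhead."""
    line_rate_bits = 5.0e9  # 5 GT/s per lane
    encoding = 8 / 10       # 8b/10b: 8 data bits per 10 line bits
    return line_rate_bits * encoding / 8 * lanes / 1e6


ceiling = gen2_link_ceiling_mb_s(4)
for name, mb_s in (("FPGA to CPU", 981), ("CPU to FPGA", 1115)):
    print(f"{name}: {mb_s} MB/s is {mb_s / ceiling:.0%} of the {ceiling:.0f} MB/s raw link rate")
```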

There is a standard “simpleDma” app that we have used for many years on different CPUs:
http://www.alpha-data.com/esp/softwareg3.php
The card we tested on the TX1 is the Virtex 7 based adm-xrc-7v1
http://www.alpha-data.com/esp/products.php?product=adm-xrc-7v1
on a Technobox XMC to PCIe adaptor.

The same cards were tested on Intel PC (HPWX8600).

Another card is:

https://www.enterpoint.co.uk/products/artix-7-development-boards/raggedstone-5/

FPGA timer was used to time DMA transfers of 4MB frames.

The same test software was used on TX1 and Intel PC.

On PC:

Reading using DMA channel 0 at OCP address 0x0...

Measuring throughput...
Throughput from host to FPGA is 663.7 MiB/s
Throughput from FPGA to host is 663.0 MiB/s
PASSED

On TX1:

ubuntu@tegra-ubuntu:~/AlphaData/admxrcg3sdk-1.7.0/apps/linux$ sudo simpledma/simpledma
Reading using DMA channel 0 at OCP address 0x0...

Measuring throughput...                 
Throughput from host to FPGA is 185.8 MiB/s
Throughput from FPGA to host is 191.2 MiB/s
PASSED
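
To put these numbers in camera terms, a small Python sketch converting the measured throughput into time per frame, taking the "4MB" frame size above as 4 MiB (an assumption) and ignoring any per-transfer setup cost.

```python
FRAME_MIB = 4  # frame size from the test description, read as 4 MiB


def frame_time_ms(throughput_mib_s: float, frame_mib: float = FRAME_MIB) -> float:
    """Milliseconds needed to move one frame at the given sustained throughput."""
    return frame_mib / throughput_mib_s * 1000.0


# FPGA-to-host results reported above
measured = {"PC": 663.0, "TX1": 191.2}

for name, mib_s in measured.items():
    t = frame_time_ms(mib_s)
    print(f"{name}: {t:.2f} ms per frame, about {1000.0 / t:.0f} frames/s")
```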