NVMe triggers IOMMU faults on TX2

Hi, we are operating a cluster of four TX2s. Every node has the same composition: a TX2 on the NVIDIA ITX devkit board, our PLX switch (8624), and an NVMe SSD (Samsung 960 Pro). Our FPGA-based camera is also attached to the PLX switch; it captures video through V4L2 drivers, and a user-space application writes one file per frame. The total data rate is 680 MB/s.

When we start recording frames, the NVMe driver is kicked out at random intervals in response to an IOMMU protection violation. This happens only while V4L2 capture and NVMe writes run concurrently. If the NVMe is only read, no fault occurs. I also could not trigger it with writes alone while the camera was stopped, likely because the IOMMU is not as busy then.

On an old TX1 node (with the 3.10 kernel), everything works without issues for hours of recording (12 minutes of recording, then erase and repeat).

This appears on both R27.1 and R28.1 with the latest L4T kernel.

The faults from various runs at various nodes are all similar:

[  362.090058] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x83829f00, fsynr=0x60003, cb=22, sid=17(0x11 - AFI), pgd=270064003, pud=270064003, pmd=1cf0c9003, pte=0
[  288.945693] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x842a1e80, fsynr=0x20003, cb=22, sid=17(0x11 - AFI), pgd=270061003, pud=270061003, pmd=2170ae003, pte=0
[  252.095736] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x839fbe00, fsynr=0x60003, cb=22, sid=17(0x11 - AFI), pgd=2754c5003, pud=2754c5003, pmd=1cc4d8003, pte=0
[  247.646129] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x8185ff80, fsynr=0x1a0003, cb=22, sid=17(0x11 - AFI), pgd=270068003, pud=270068003, pmd=21bc7d003, pte=0
[  494.143285] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x8398fe80, fsynr=0x240003, cb=22, sid=17(0x11 - AFI), pgd=27005f003, pud=27005f003, pmd=1cd530003, pte=0
[  533.487312] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x892e3880, fsynr=0x240003, cb=22, sid=17(0x11 - AFI), pgd=270060003, pud=270060003, pmd=1d3b23003, pte=0
[  143.097750] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x840d5e80, fsynr=0x240003, cb=22, sid=17(0x11 - AFI), pgd=270077003, pud=270077003, pmd=21c185003, pte=0
[  262.925572] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x83f3be00, fsynr=0x80003, cb=21, sid=17(0x11 - AFI), pgd=26f87c003, pud=26f87c003, pmd=21ab5e003, pte=0
[  279.781618] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x83f3be80, fsynr=0x80003, cb=21, sid=17(0x11 - AFI), pgd=26f87c003, pud=26f87c003, pmd=21ab5e003, pte=0
[ 4710.237528] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x83473f80, fsynr=0x240003, cb=22, sid=17(0x11 - AFI), pgd=270068003, pud=270068003, pmd=20ec53003, pte=0
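
To compare fault lines like these across runs, a small parser can help. This is only a sketch: `FAULT_RE` and `parse_fault` are names I made up, the field layout is taken from the log lines above, and the page offset assumes 4 KiB pages.

```python
import re

# Field layout copied from the arm-smmu lines above. Truncated lines that
# lack the "sid=" part will not match and are reported as None.
FAULT_RE = re.compile(
    r"Unhandled context fault: iova=(0x[0-9a-f]+), fsynr=(0x[0-9a-f]+), "
    r"cb=(\d+), sid=(\d+)\(0x[0-9a-f]+ - (\w+)\)"
    r"(?:.*?pte=([0-9a-f]+))?"
)


def parse_fault(line):
    """Extract the numeric fields of one 'Unhandled context fault' line."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    iova = int(m.group(1), 16)
    return {
        "iova": iova,
        "page_offset": iova & 0xFFF,  # assumption: 4 KiB pages
        "fsynr": int(m.group(2), 16),
        "cb": int(m.group(3)),
        "sid": int(m.group(4)),
        "client": m.group(5),
        "pte": None if m.group(6) is None else int(m.group(6), 16),
    }


if __name__ == "__main__":
    sample = (
        "[  362.090058] arm-smmu 12000000.iommu: Unhandled context fault: "
        "iova=0x83829f00, fsynr=0x60003, cb=22, sid=17(0x11 - AFI), "
        "pgd=270064003, pud=270064003, pmd=1cf0c9003, pte=0"
    )
    print(parse_fault(sample))
```

Feeding a whole dmesg capture through `parse_fault` makes it easy to confirm that every fault comes from the same stream ID (AFI) and that the PTE is zero in each case.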

This is how the NVMe gets lost / kicked out, after which it is no longer listed in lspci either. If we swap the CPU module in this system for a TX1, it works perfectly.

[  142.761669] nvme 0000:03:00.0: Failed status: 3, reset controller
[  142.827824] nvme 0000:03:00.0: Cancelling I/O 806 QID 4
[  143.097750] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x840d5e80, fsynr=0x240003, cb=22,
[  143.404216] irq 55: nobody cared (try booting with the "irqpoll" option)
[  143.477523] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  O    4.4.38+ #5
[  143.563966] Hardware name: quill (DT)
[  143.607710] Call trace:
[  143.636880] [<ffffffc000089a10>] dump_backtrace+0x0/0xe8
[  143.700411] [<ffffffc000089b0c>] show_stack+0x14/0x20
[  143.760825] [<ffffffc0003bfbd0>] dump_stack+0xa0/0xc8
[  143.821235] [<ffffffc0000f6a38>] __report_bad_irq+0x38/0xe8
[  143.887900] [<ffffffc0000f6dc8>] note_interrupt+0x210/0x2f0
[  143.954573] [<ffffffc0000f4204>] handle_irq_event_percpu+0x224/0x2a0
[  144.030623] [<ffffffc0000f42c8>] handle_irq_event+0x48/0x78
[  144.097249] [<ffffffc0000f7860>] handle_fasteoi_irq+0xb8/0x1b0
[  144.167033] [<ffffffc0000f367c>] generic_handle_irq+0x24/0x38
[  144.235777] [<ffffffc0000f397c>] __handle_domain_irq+0x5c/0xb8
[  144.305564] [<ffffffc0000815b8>] gic_handle_irq+0x68/0xf0
[  144.370143] [<ffffffc000084740>] el1_irq+0x80/0xf8
[  144.427429] [<ffffffc0000a70d0>] irq_exit+0x88/0xe0
[  144.485752] [<ffffffc0000f3980>] __handle_domain_irq+0x60/0xb8
[  144.555541] [<ffffffc0000815b8>] gic_handle_irq+0x68/0xf0
[  144.620119] [<ffffffc000084740>] el1_irq+0x80/0xf8
[  144.677405] [<ffffffc0000c7fc0>] finish_task_switch+0xa8/0x1f8
[  144.747207] [<ffffffc000c13b4c>] __schedule+0x274/0x7a0
[  144.749672] nvme 0000:03:00.0: Failed status: 3, reset controller
[  144.749720] nvme 0000:03:00.0: Cancelling I/O 1 QID 0
[  144.943019] [<ffffffc000c140bc>] schedule+0x44/0xb8
[  145.001350] [<ffffffc000c14588>] schedule_preempt_disabled+0x20/0x40
[  145.077369] [<ffffffc0000e340c>] cpu_startup_entry+0xfc/0x340
[  145.146111] [<ffffffc000c12150>] rest_init+0x88/0x98
[  145.205482] [<ffffffc001167978>] start_kernel+0x39c/0x3b0
[  145.270062] [<0000000080c19000>] 0x80c19000
[  145.320051] handlers:
[  145.347133] [<ffffffc0009679d8>] tegra_mcerr_hard_irq threaded [<ffffffc000967a20>] tegra_mcerr_threa
[  145.465523] Disabling IRQ #55
[  145.494185] (255) csr_afir: EMEM address decode error
[  145.554470]   status = 0x2032700e; addr = 0x3ffffffc0
[  145.554595] nvme 0000:03:00.0: Device failed to resume
[  145.554673] blk_update_request: I/O error, dev nvme0n1, sector 452417536
[  145.554763] Aborting journal on device nvme0n1-8.
[  145.554768] Buffer I/O error on dev nvme0n1, logical block 62423040, lost sync page write
[  145.554770] JBD2: Error -5 detected when updating journal superblock for nvme0n1-8.
[  145.554796] Buffer I/O error on dev nvme0n1, logical block 0, lost sync page write
[  145.554800] EXT4-fs error (device nvme0n1): ext4_journal_check_start:56: Detected aborted journal
[  145.554803] EXT4-fs (nvme0n1): Remounting filesystem read-only
[  145.554804] EXT4-fs (nvme0n1): previous I/O error to superblock detected
[  145.554807] Buffer I/O error on dev nvme0n1, logical block 0, lost sync page write
[  145.554982] Buffer I/O error on dev nvme0n1, logical block 1, lost async page write
[  145.554988] Buffer I/O error on dev nvme0n1, logical block 1041, lost async page write
[  145.554992] Buffer I/O error on dev nvme0n1, logical block 1057, lost async page write
[  145.554996] Buffer I/O error on dev nvme0n1, logical block 9249, lost async page write
[  146.815802]   secure: yes, access-type: read
[  146.869748] Trying to vfree() nonexistent vm area (ffffff8000378000)
[  146.942890] ------------[ cut here ]------------
[  146.998063] WARNING: at ffffffc0001b0560 [verbose debug info unavailable]
[  147.079300] Modules linked in: bridge stp llc imx183(O) sdma(O) snd_soc_spdif_tx snd_soc_core snd_com

[  147.279819] CPU: 4 PID: 2171 Comm: nvme0 Tainted: G        W  O    4.4.38
[  147.359019] Hardware name: quill (DT)
[  147.402762] task: ffffffc0610d3e80 ti: ffffffc1f0138000 task.ti: ffffffc1
[  147.492344] PC is at __vunmap+0xe0/0xe8
[  147.538185] LR is at __vunmap+0xe0/0xe8
[  147.584018] pc : [<ffffffc0001b0560>] lr : [<ffffffc0001b0560>] pstate: 6
[  147.672532] sp : ffffffc1f013bca0
[  147.712107] x29: ffffffc1f013bca0 x28: 0000000000000000
[  147.782586] x27: 0000000000000000 x26: 0000000000000000
[  147.846156] x25: 0000000000000000 x24: 0000000000000000
[  147.909730] x23: ffffffc000692a10 x22: ffffffc1f0059400
[  147.973305] x21: 0000000000000000 x20: 0000000000000000
[  148.036879] x19: ffffff8000378000 x18: 0000000000000000
[  148.100454] x17: 0000007f803d91d8 x16: ffffffc00011b5d8
[  148.164028] x15: 0000000000000010 x14: 0a29303030383733
[  148.227598] x13: 3030303866666666 x12: 6666282061657261
[  148.291173] x11: 206d7620746e6574 x10: 736978656e6f6e20
[  148.354747] x9 : 2928656572667620 x8 : 0000000000000552
[  148.418322] x7 : 0000000000000040 x6 : 0000000000000004
[  148.481897] x5 : ffffffc0610d3ee0 x4 : 0000000000000000
[  148.545472] x3 : 0000000000000002 x2 : 0000000000000000
[  148.609046] x1 : 0000000000000000 x0 : 0000000000000038

[  148.689628] ---[ end trace 21d29f72bdecabc8 ]---
[  148.738624] Call trace:
[  148.767787] [<ffffffc0001b0560>] __vunmap+0xe0/0xe8
[  148.826116] [<ffffffc0001b0690>] vunmap+0x28/0x38
[  148.882362] [<ffffffc00009c6d4>] __iounmap+0x34/0x40
[  148.941732] [<ffffffc000692a84>] nvme_dev_unmap.isra.27+0x1c/0x38
[  149.014640] [<ffffffc0006946d8>] nvme_remove+0xd8/0x110
[  149.077137] [<ffffffc000415abc>] pci_device_remove+0x3c/0x108
[  149.145898] [<ffffffc000623580>] __device_release_driver+0x80/0xf0
[  149.219847] [<ffffffc000623614>] device_release_driver+0x24/0x38
[  149.291698] [<ffffffc00040ea08>] pci_stop_bus_device+0x98/0xa8
[  149.361481] [<ffffffc00040eb44>] pci_stop_and_remove_bus_device_locked+0x
[  149.450019] [<ffffffc000692a34>] nvme_remove_dead_ctrl+0x24/0x58
[  149.521886] [<ffffffc0000c085c>] kthread+0xdc/0xf0
[  149.579170] [<ffffffc000084f90>] ret_from_fork+0x10/0x40
[  149.642800] Trying to free nonexistent resource <0000000050100000-0000000
[  149.734456] iommu: Removing device 0000:03:00.0 from group 65

Hi danieel,

How can we reproduce your problem? Does this only happen with an NVMe PCIe SSD? How about other kinds of SSD?
In your video capture process, does any application run afterwards (video convert, encode, preview, etc.)?

Hi WayneWWW, this happens also with AHCI M2 SSD (Samsung XP941):

[  275.472244] ata1.00: exception Emask 0x20 SAct 0x18 SErr 0x0 action 0x6 frozen
[  275.479465] ata1.00: irq_stat 0x20000000, host bus error
[  275.484781] ata1.00: failed command: WRITE FPDMA QUEUED
[  275.490015] ata1.00: cmd 61/00:18:00:88:2c/40:00:1d:00:00/40 tag 3 ncq 8388608 out
                        res 40/00:20:00:c8:2c/00:00:1d:00:00/40 Emask 0x20 (host bus error)
[  275.505649] ata1.00: status: { DRDY }
[  275.509313] ata1.00: failed command: WRITE FPDMA QUEUED
[  275.514539] ata1.00: cmd 61/10:20:00:c8:2c/1a:00:1d:00:00/40 tag 4 ncq 3416064 out
                        res 40/00:20:00:c8:2c/00:00:1d:00:00/40 Emask 0x20 (host bus error)
[  275.530164] ata1.00: status: { DRDY }
[  275.533832] ata1: hard resetting link
[  275.864247] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  275.888227] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[  275.894487] ata1.00: revalidation failed (errno=-5)
[  280.868224] ata1: hard resetting link
[  281.196247] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  281.220224] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[  281.226484] ata1.00: revalidation failed (errno=-5)
[  281.231368] ata1: limiting SATA link speed to 3.0 Gbps
[  286.200242] ata1: hard resetting link
[  286.532255] ata1: SATA link down (SStatus 0 SControl 320)
[  286.537678] ata1.00: disabled
[  286.541390] sd 0:0:0:0: [sda] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[  286.549941] sd 0:0:0:0: [sda] tag#3 Sense Key : 0x5 [current] [descriptor] 
[  286.556916] sd 0:0:0:0: [sda] tag#3 ASC=0x21 ASCQ=0x4 
[  286.562074] sd 0:0:0:0: [sda] tag#3 CDB: opcode=0x2a 2a 00 1d 2c 88 00 00 40 00 00
[  286.569646] blk_update_request: I/O error, dev sda, sector 489457664
[  286.576047] sd 0:0:0:0: rejecting I/O to offline device
[  286.581267] sd 0:0:0:0: [sda] killing request

The 950 Pro and 960 Pro NVMe SSDs show this:

[  766.753809] nvme 0000:03:00.0: Failed status: 3, reset controller

The status value is the NVMe CSTS (controller status register), here with bits RDY (0x01) and CFS (0x02, controller fatal status) set. The driver in the TX1 (3.10 kernel) does not check or use this bit at all (and the NVMe driver is structured differently).
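
As a quick illustration of that decoding, here is a sketch that pulls the status value out of the log line and splits it into bits. The helper names are hypothetical. The RDY and CFS masks are the ones quoted above; the SHST field (bits 3:2) and the assumption that the driver prints the value in hexadecimal come from my reading of the NVMe register layout and the 4.4 driver, not from this thread.

```python
import re

# RDY and CFS masks as quoted in the post. SHST (shutdown status, bits 3:2)
# is an addition from the NVMe register layout, not from the thread.
CSTS_RDY = 0x01
CSTS_CFS = 0x02
CSTS_SHST_MASK = 0x0C

# Assumption: the driver prints CSTS in hexadecimal without a 0x prefix.
STATUS_RE = re.compile(r"Failed status: ([0-9a-fA-F]+), reset controller")


def csts_from_log(line):
    """Return the CSTS value from a 'Failed status' line, or None."""
    m = STATUS_RE.search(line)
    return None if m is None else int(m.group(1), 16)


def decode_csts(value):
    """Split a CSTS register value into the fields discussed above."""
    return {
        "rdy": bool(value & CSTS_RDY),
        "cfs": bool(value & CSTS_CFS),
        "shst": (value & CSTS_SHST_MASK) >> 2,
    }


if __name__ == "__main__":
    line = "[  766.753809] nvme 0000:03:00.0: Failed status: 3, reset controller"
    print(decode_csts(csts_from_log(line)))
```

For the value 3 this reports both RDY and CFS as set, which is the combination described above.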

We have also applied this NVMe patch: https://lkml.org/lkml/2017/11/22/219 (as it is present in the 4.4.102 kernel), but it did not resolve the issue.

The issue is hard to reproduce: it does not show up when our PCIe camera is not running / capturing data through V4L2 (e.g. with iperf on a 10GE network card plus writing to NVMe). Also, when we limit our frame rate to 30 fps instead of the full 60 fps, the crash happens only after more than 10 minutes. In that case there are fewer interrupts (one per frame) and less data (but that data flows in the opposite direction to the files written to the SSD, so it should not starve the device).

There is no encoding happening. The chain is Camera->PCIe-V4L2->application->CUDA->OpenGL, which gets our data onto the screen, and the RAW video buffers from V4L2 are written with O_DIRECT to ext4 on the NVMe drive. But the crash also happens when this viewer app is running and we run an independent dd if=/dev/zero of=/mnt/nvme/test bs=1M process, so the issue is not caused by the data buffers being shared.

I suspect this may be interrupt related. We are seeing these errors, and the mc_status interrupt (IRQ 55) also fails along the way:

[  145.320051] handlers:
[  145.347133] [<ffffffc0009679d8>] tegra_mcerr_hard_irq threaded [<ffffffc000967a20>] tegra_mcerr_thread
[  145.465523] Disabling IRQ #55
[  145.494185] (255) csr_afir: EMEM address decode error
[  145.554470]   status = 0x2032700e; addr = 0x3ffffffc0

The IOMMU unhandled context faults are always related to NVMe mappings. Disabling the IOMMU for PCIe (removing the AFI sub-node) does not help: we still get the same “status 3” messages and EMEM address decode errors.

A similar traffic-related issue (still unsolved) can be seen here: https://forum.rocketboards.org/t/altera-pcie-driver-issue-with-ssd-devices/545/4

Hi danieel,

My comments:

  • The TX1 node with the 3.10 kernel is OK. You could also try running the TX1 with the r28.1 BSP to see whether the issue exists there. That way you can tell whether it is related only to the BSP/kernel software or could also be platform related.
  • The similar issue at
    https://forum.rocketboards.org/t/altera-pcie-driver-issue-with-ssd-devices/545/4
    is on kernel 4.1 (closer to kernel 4.4 than to 3.10), which points to the issue being kernel related, but this is yet to be confirmed.
  • "The status shows the NVMe CSTS (controller status register), with bits RDY (0x01) and CFS (0x02) - controller fatal status set. The driver in the TX1 (3.10 kernel) does not check/work with this bit at all (and NVMe driver is structured differently)."
    => A quick experiment:
    on TX1 with kernel 3.10, you could add the checking code so that it simply dumps the status bits but lets normal operation proceed, just to confirm that everything is normal while running at 60 fps.
  • "when we limit our framerate to 30fps and not using full 60fps, the crash happens after more than 10 minutes."
    => Is this very consistent? That is, if you repeat it a few times, is the behavior similar?
  • It seems the issue is related to system load or to an interrupt that triggers an exception for some reason.

Hi danieel,

Let’s narrow down the case in which the error happens. Please correct me if my understanding is wrong.

Your use case is a PCIe camera with two pipelines at 60 fps: one to the display and one to an SSD (NVMe or AHCI M.2).

Can this be reproduced when only the SSD pipeline is launched?

How about the 10GE network card? Is it also needed to reproduce the error?

Have you tried disabling SMMU for PCIe? If not, it is worth giving it a try.

On TX1 with R28.1 system and 4.4.38 nvidia kernel, we got this SMMU fault:

[  612.030983] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  612.031015] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  612.031020] mc-err:   status = 0x6000000e; addr = 0x00000000
[  612.031029] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  612.031359] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  612.031371] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  612.031375] mc-err:   status = 0x6000000e; addr = 0x00000000
[  612.031380] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  612.031401] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  612.031405] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  612.031409] mc-err:   status = 0x6000000e; addr = 0x00000000
[  612.031413] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  612.031430] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  612.031434] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  612.031438] mc-err:   status = 0x6000000e; addr = 0x00000000
[  612.031442] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  612.031455] mc-err: Too many MC errors; throttling prints

Does "EMEM decode error on PDE or PTE entry" mean that the peripheral is accessing the address in question, or that the SMMU itself was trying to resolve a multi-level translation table?

Hi danieel,

Please try the following patch to disable the SMMU.

diff --git a/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi b/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi
index 5c6536b968ab..da6eee63670e 100644
--- a/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi
+++ b/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi
@@ -186,7 +186,6 @@
                   <&tegra_adsp_audio    TEGRA_SID_APE>,
                   <&{/sound}        TEGRA_SID_APE>,
                   <&{/sound_ref}        TEGRA_SID_APE>,
-                  <&{/pcie-controller@10003000} TEGRA_SID_AFI>,
                   <&{/ahci-sata@3507000}    TEGRA_SID_SATA2>,
                   <&{/aon@c160000}      TEGRA_SID_AON>,
                   <&{/rtcpu@b000000}    TEGRA_SID_RCE>,
@@ -1509,8 +1508,6 @@
         interrupt-map-mask = <0 0 0 0>;
         interrupt-map = <0 0 0 0 &intc 0 72 0x04>;// check this
 
-        #stream-id-cells = <1>;
-
         bus-range = <0x00 0xff>;
         #address-cells = <3>;
         #size-cells = <2>;

It means that the respective IP, in this case PCIe, accessed address 0x0000000000000000.

Hi WayneWWW, with the IOMMU disabled on TX2 according to your diff, we get this fault:

[  243.762393] nvme 0000:03:00.0: Failed status: 3, reset controller
[  243.762504] nvme 0000:03:00.0: Cancelling I/O 258 QID 2
[  243.762514] nvme 0000:03:00.0: Cancelling I/O 259 QID 2
[  244.762434] nvme 0000:03:00.0: Failed status: 3, reset controller
[  244.762476] nvme 0000:03:00.0: Cancelling I/O 1 QID 0
[  244.762572] nvme 0000:03:00.0: Device failed to resume
[  244.762643] blk_update_request: I/O error, dev nvme0n1, sector 173578240
[  244.762651] blk_update_request: I/O error, dev nvme0n1, sector 173576192
[  244.763213] Aborting journal on device nvme0n1-8.
[  244.763222] Buffer I/O error on dev nvme0n1, logical block 62423040, lost sync page write
[  244.763226] JBD2: Error -5 detected when updating journal superblock for nvme0n1-8.
[  244.781194] Trying to vfree() nonexistent vm area (ffffff801260c000)
[  244.781205] ------------[ cut here ]------------
[  244.781208] WARNING: at ffffffc0001b0560 [verbose debug info unavailable]
[  244.781210] Modules linked in: nvme imx183(O) sdma(O) snd_soc_spdif_tx snd_soc_core ixgbe snd_compress bcmdhd snd_pcm snd_timer snd soundcore ahci_tegra libahci_platform libahci bluedroid_pm [last unloaded: nvme]

[  244.781237] CPU: 4 PID: 2441 Comm: nvme0 Tainted: G           O    4.4.38+ #12
[  244.781240] Hardware name: quill (DT)
[  244.781243] task: ffffffc1e6697080 ti: ffffffc1e67f8000 task.ti: ffffffc1e67f8000
[  244.781249] PC is at __vunmap+0xe0/0xe8
[  244.781252] LR is at __vunmap+0xe0/0xe8
[  244.781255] pc : [<ffffffc0001b0560>] lr : [<ffffffc0001b0560>] pstate: 60000045
[  244.781257] sp : ffffffc1e67fbca0
[  244.781259] x29: ffffffc1e67fbca0 x28: 0000000000000000
[  244.781263] x27: 0000000000000000 x26: 0000000000000000
[  244.781266] x25: 0000000000000000 x24: 0000000000000000
[  244.781269] x23: ffffffbffc0a1198 x22: ffffffc06f83a000
[  244.781272] x21: 0000000000000000 x20: 0000000000000000
[  244.781274] x19: ffffff801260c000 x18: 0000000000000000
[  244.781277] x17: 0000000000000007 x16: 0000000000000001
[  244.781280] x15: 0000000000000010 x14: ffffffc081449297
[  244.781282] x13: ffffffc0014492a5 x12: 0000000000000006
[  244.781285] x11: 0000000000035c7c x10: 0000000005f5e0ff
[  244.781288] x9 : ffffffc1e67fba20 x8 : 0000000000035c7d
[  244.781291] x7 : 6666666666282061 x6 : ffffffc0014492df
[  244.781294] x5 : 0000000000000000 x4 : 0000000000000000
[  244.781296] x3 : 0000000000000000 x2 : ffffffc1e67f8000
[  244.781299] x1 : 0000000000000000 x0 : 0000000000000038

[  244.781303] ---[ end trace a4f9aef1c36b3e46 ]---
[  244.781306] Call trace:
[  244.803594] [<ffffffc0001b0560>] __vunmap+0xe0/0xe8
[  244.803599] [<ffffffc0001b0690>] vunmap+0x28/0x38
[  244.803604] [<ffffffc00009c6d4>] __iounmap+0x34/0x40
[  244.803615] [<ffffffbffc0a120c>] nvme_dev_unmap.isra.26+0x1c/0x38 [nvme]
[  244.803623] [<ffffffbffc0a3028>] nvme_remove+0xd0/0x118 [nvme]
[  244.803628] [<ffffffc000417c5c>] pci_device_remove+0x3c/0x108
[  244.803633] [<ffffffc00062ffc8>] __device_release_driver+0x80/0xf0
[  244.803636] [<ffffffc00063005c>] device_release_driver+0x24/0x38
[  244.803640] [<ffffffc000410ba8>] pci_stop_bus_device+0x98/0xa8
[  244.803643] [<ffffffc000410ce4>] pci_stop_and_remove_bus_device_locked+0x1c/0x38
[  244.803651] [<ffffffbffc0a11bc>] nvme_remove_dead_ctrl+0x24/0x58 [nvme]
[  244.803656] [<ffffffc0000c085c>] kthread+0xdc/0xf0
[  244.803659] [<ffffffc000084f90>] ret_from_fork+0x10/0x40
[  244.804516] Trying to free nonexistent resource <0000000050100000-0000000050103fff>

The result here is that the TX2 PCIe host stops responding to reads. We are not sure what condition in the Samsung NVMe triggers the CFS bit to be set, but we put an LED indicator on our FPGA to show whether a completion arrives for a read that it initiated. We now clearly see that the completion does not arrive when the crash occurs.

I would replace PCIe in that sentence with SMMU, since drivers/platform/tegra/mc/mcerr.c lists this fault as internal to the SMMU:

//
	/*
	 * SMMU related faults.
	 */
	MC_ERR(MC_INT_INVALID_SMMU_PAGE,
	       "SMMU address translation fault",
	       E_SMMU, MC_ERR_STATUS, MC_ERR_ADR),
	MC_ERR(MC_INT_INVALID_SMMU_PAGE | MC_INT_DECERR_EMEM,
	       "EMEM decode error on PDE or PTE entry",
	       E_SMMU, MC_ERR_STATUS, MC_ERR_ADR),
	MC_ERR(MC_INT_INVALID_SMMU_PAGE | MC_INT_SECERR_SEC,
	       "secure SMMU address translation fault",
	       E_SMMU, MC_ERR_SEC_STATUS, MC_ERR_SEC_ADR),
	MC_ERR(MC_INT_INVALID_SMMU_PAGE | MC_INT_DECERR_VPR,
	       "VPR SMMU address translation fault",
	       E_SMMU, MC_ERR_VPR_STATUS, MC_ERR_VPR_ADR),
	MC_ERR(MC_INT_INVALID_SMMU_PAGE | MC_INT_DECERR_VPR |
	       MC_INT_DECERR_EMEM,
	       "EMEM decode error on PDE or PTE entry on VPR context",
	       E_SMMU, MC_ERR_VPR_STATUS, MC_ERR_VPR_ADR),

So the result here is that the TX1 SMMU fails while it is trying to decide whether a transaction should pass.

TX1 (4.4.38, R28.1) without AFI specified in IOMMU portion of devicetree crashes on this error:

[  785.811564] nvme 0000:03:00.0: Failed status: 3, reset controller
[  785.811674] nvme 0000:03:00.0: Cancelling I/O 863 QID 2
[  785.811695] nvme 0000:03:00.0: Cancelling I/O 865 QID 2
[  785.915656] smmu_dump_pagetable(): fault_address=0x0000000080de3e00 pa=0x0000000000000e00 bytes=1000 #pte=764 in L2
[  785.915666] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  785.915670] mc-err:   status = 0x6000000e; addr = 0x80de3e00
[  785.915675] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  785.915693] smmu_dump_pagetable(): fault_address=0x0000000080de3e00 pa=0x0000000000000e00 bytes=1000 #pte=764 in L2
[  785.915697] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  785.915701] mc-err:   status = 0x6000000e; addr = 0x80de3e00
[  785.915705] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  785.915718] smmu_dump_pagetable(): fault_address=0x0000000080de3e00 pa=0x0000000000000e00 bytes=1000 #pte=764 in L2
[  785.915723] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  785.915726] mc-err:   status = 0x6000000e; addr = 0x80de3e00
[  785.915730] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  785.915744] smmu_dump_pagetable(): fault_address=0x0000000080de3e00 pa=0x0000000000000e00 bytes=1000 #pte=764 in L2
[  785.915748] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  785.915751] mc-err:   status = 0x6000000e; addr = 0x80de3e00
[  785.915755] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  785.915765] mc-err: Too many MC errors; throttling prints
[  786.799547] nvme 0000:03:00.0: Failed status: 3, reset controller

The address 0x80de3e00 does not belong to our V4L2 driver (and in any case, by the nature of the error message, the fault seems to be internal to the SMMU).

After turning the SMMU back on on the TX1, I found some IOVA mappings through debugfs. The file /sys/kernel/debug/70019000.iommu/as010/iovainfo shows the regions that are supposedly mapped.

The issue, however, is that the regions shown there are incomplete. I parsed the long dmesg output from our driver and identified all pages that are accessed by the FPGA:

Frame 1 maps 2882 unique pages (11528 KiB) for 2160 lines
:
Frame 16 maps 2882 unique pages (11528 KiB) for 2160 lines
Frame buffers request mapping of 46112 pages from which are 46112 unique (184448 KiB)
Program for DMA processor takes 163 unique pages (652 KiB)

However when I try to match those pages to the iovainfo, only a portion of them actually matches:

IOMMU maps 23272 unique pages (93088 KiB)
Orphaned 89 iommu pages
Orphaned 23041 pages in 16 frames (1395,1579,1140,1476,1466,1674,1475,1447,1314,1520,1464,1494,1246,1502,1432,1417)
Orphaned 51 pages in DMA processor program
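
The numbers above are at least self-consistent, which the following sketch checks. It assumes 4 KiB pages, and it assumes that "orphaned iommu pages" means IOMMU-listed pages that match none of the driver's pages; both are my interpretation of the output, and the variable names are made up.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages (2882 pages = 11528 KiB fits this)

frames = 16
pages_per_frame = 2882
dma_program_pages = 163

iommu_pages = 23272
orphaned_iommu_pages = 89  # listed by the IOMMU, not used by the driver
orphaned_per_frame = [1395, 1579, 1140, 1476, 1466, 1674, 1475, 1447,
                      1314, 1520, 1464, 1494, 1246, 1502, 1432, 1417]
orphaned_dma_pages = 51

frame_pages = frames * pages_per_frame
orphaned_frame_pages = sum(orphaned_per_frame)

# Driver pages that do appear in iovainfo.
matched_frame_pages = frame_pages - orphaned_frame_pages
matched_dma_pages = dma_program_pages - orphaned_dma_pages

# Matched driver pages plus unmatched IOMMU pages should give the IOMMU total.
accounted = matched_frame_pages + matched_dma_pages + orphaned_iommu_pages

print("frame pages:", frame_pages, "=", frame_pages * PAGE_KIB, "KiB")
print("orphaned frame pages:", orphaned_frame_pages)
print("accounted IOMMU pages:", accounted, "of", iommu_pages)
print("share of driver pages listed: %.1f%%"
      % (100.0 * (matched_frame_pages + matched_dma_pages)
         / (frame_pages + dma_program_pages)))
```

So roughly half of the pages the FPGA accesses show up in iovainfo, and the per-frame orphan counts add up to the reported total.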

By manually checking the mappings of some buffers, I can see that they are not in the IOMMU list. Yet the program still runs and causes no violations, even though the listing of mappings is incomplete. The DMA program area and frame data buffer translations are valid from the start of a V4L2 client until the eventual crash.
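
For reference, the matching itself can be sketched as below. The format of the iovainfo file is not reproduced in this thread, so the sketch takes the mapped regions as plain (start, end) tuples with an exclusive end; `find_orphans` is a hypothetical helper, and 4 KiB pages are assumed.

```python
import bisect

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def find_orphans(pages, ranges):
    """Return page addresses not fully covered by any [start, end) range."""
    # Merge overlapping or adjacent ranges so each page needs one lookup.
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [r[0] for r in merged]

    orphans = []
    for page in sorted(set(pages)):
        # Last merged range that starts at or before this page.
        i = bisect.bisect_right(starts, page) - 1
        if i < 0 or page + PAGE_SIZE > merged[i][1]:
            orphans.append(page)
    return orphans


if __name__ == "__main__":
    mapped = [(0x80000000, 0x80002000)]
    used = [0x80000000, 0x80001000, 0x80002000]
    print([hex(p) for p in find_orphans(used, mapped)])
```

The page list would come from the driver's dmesg output and the ranges from iovainfo; how each is parsed depends on formats that are not shown here.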

In this test, how did you confirm that the SMMU is disabled for PCIe?
Can you please paste the output of ‘ls /sys/kernel/debug/12000000.iommu/masters/’ ?

This was on TX1 (so there is no 12000000.iommu). When commenting out the AFI entry like this:

tx1/Linux_for_Tegra_R28.1/sources/hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi

domains = <&ppcs_as TEGRA_SWGROUP_CELLS5(PPCS, PPCS1, PPCS2, SE, SE1)
                           &gpu_as TEGRA_SWGROUP_CELLS(GPUB)
                           &ape_as TEGRA_SWGROUP_CELLS(APE)
                           &dc_as TEGRA_SWGROUP_CELLS2(DC, DC12)
                           &dc_as TEGRA_SWGROUP_CELLS(DCB)
/*
                           &common_as TEGRA_SWGROUP_CELLS(AFI)
*/
                           &common_as TEGRA_SWGROUP_CELLS(SDMMC1A)
                           &common_as TEGRA_SWGROUP_CELLS(SDMMC2A)
                           &common_as TEGRA_SWGROUP_CELLS(SDMMC3A)
                           &common_as TEGRA_SWGROUP_CELLS(SDMMC4A)
                           &common_as TEGRA_SWGROUP_CELLS(AVPC)
                           &common_as TEGRA_SWGROUP_CELLS(SMMU_TEST)
                           &common_as 0xFFFFFFFF 0xFFFFFFFF>;

Then, unexpectedly, the difference in ‘ls /sys/kernel/debug/70019000.iommu/masters/’ is:

# diff -u ls-tx1-with-iommu.txt ls-tx1-without-iommu.txt
--- ls-tx1-with-iommu.txt       2017-12-04 11:40:52.392434254 +0000
+++ ls-tx1-without-iommu.txt    2017-12-04 11:44:34.291117256 +0000
@@ -37,7 +37,6 @@
 sdhci-tegra.3
 serial8250
 smmu_test
-snd-soc-dummy
 sound
 tegra-carveouts
 tegradc.1

I suspect our TX1/no-IOMMU test results are invalid then.
How can the TX1 PCIe be removed from the IOMMU?

When we applied your change to the TX2 device tree, this line was missing from dmesg:

[    0.240076] iommu: Adding device 10003000.pcie-controller to group 52

and also the IOMMU groups were reduced from 0…66 to 0…55, because IOMMU rules are no longer applied to any device in our PCI tree.

So this is a proper TX1 test with the SMMU disabled. Since nobody has replied, I found a way myself, with this change:

tx1/Linux_for_Tegra_R28.1/sources/hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi

pcie-controller@1003000 {

-       iommus = <&smmu TEGRA_SWGROUP_AFI>;

}

Then I confirmed it by the absence of PCIe among the masters:

ls /sys/kernel/debug/*.iommu/masters/ >current
diff ls-tx1-with-iommu.txt current

1d0
< 1003000.pcie-controller
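
The same check can be scripted so it is easy to repeat after every device tree change. This is a sketch only: `pcie_masters` is a made-up helper, the default path pattern is the debugfs location used above, and matching on the substring "pcie-controller" is an assumption based on the entry name in the diff.

```python
import glob
import os


def pcie_masters(pattern="/sys/kernel/debug/*.iommu/masters"):
    """List masters entries whose name contains 'pcie-controller'."""
    found = []
    for directory in sorted(glob.glob(pattern)):
        for name in sorted(os.listdir(directory)):
            if "pcie-controller" in name:
                found.append(os.path.join(directory, name))
    return found


if __name__ == "__main__":
    hits = pcie_masters()
    if hits:
        print("PCIe is still behind the IOMMU:", ", ".join(hits))
    else:
        print("No PCIe controller listed among the IOMMU masters")
```

Reading debugfs normally requires root, and an empty result is only meaningful if debugfs is mounted at /sys/kernel/debug.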

The DMA addresses are also no longer 0x80xx.xxxx based: they show up as 0xFDxx.xxxx for the kernel-allocated DMA program descriptors, and user buffers are seen as a mix of 32-bit and 64-bit addresses, 0xAxxxx.xxxx or 0x1.4xxx.xxxx.

Crash #1

[  819.844999] nvme 0000:03:00.0: Failed status: 3, reset controller
[  819.845120] nvme 0000:03:00.0: Cancelling I/O 319 QID 4
[  819.845135] nvme 0000:03:00.0: Cancelling I/O 320 QID 4
[  819.845143] nvme 0000:03:00.0: Cancelling I/O 321 QID 4
[  819.845150] nvme 0000:03:00.0: Cancelling I/O 322 QID 4
[  819.845156] nvme 0000:03:00.0: Cancelling I/O 323 QID 4
[  819.845164] nvme 0000:03:00.0: Cancelling I/O 689 QID 3
[  819.845173] nvme 0000:03:00.0: Cancelling I/O 692 QID 3
[  819.845182] nvme 0000:03:00.0: Cancelling I/O 693 QID 3
[  819.845191] nvme 0000:03:00.0: Cancelling I/O 694 QID 3
[  819.845201] nvme 0000:03:00.0: Cancelling I/O 695 QID 3
[  819.845209] nvme 0000:03:00.0: Cancelling I/O 696 QID 3
[  819.845217] nvme 0000:03:00.0: Cancelling I/O 697 QID 3
[  819.845225] nvme 0000:03:00.0: Cancelling I/O 698 QID 3
[  819.845234] nvme 0000:03:00.0: Cancelling I/O 699 QID 3
[  819.845243] nvme 0000:03:00.0: Cancelling I/O 700 QID 3
[  819.845252] nvme 0000:03:00.0: Cancelling I/O 701 QID 3
[  819.845261] nvme 0000:03:00.0: Cancelling I/O 702 QID 3
[  819.845268] nvme 0000:03:00.0: Cancelling I/O 703 QID 3
[  819.845278] nvme 0000:03:00.0: Cancelling I/O 704 QID 3
[  819.845287] nvme 0000:03:00.0: Cancelling I/O 705 QID 3
[  819.845295] nvme 0000:03:00.0: Cancelling I/O 706 QID 3
[  819.845305] nvme 0000:03:00.0: Cancelling I/O 707 QID 3
[  819.845313] nvme 0000:03:00.0: Cancelling I/O 708 QID 3
[  819.845322] nvme 0000:03:00.0: Cancelling I/O 709 QID 3
[  819.845331] nvme 0000:03:00.0: Cancelling I/O 710 QID 3
[  819.845340] nvme 0000:03:00.0: Cancelling I/O 711 QID 3
[  819.845350] nvme 0000:03:00.0: Cancelling I/O 712 QID 3
[  819.845358] nvme 0000:03:00.0: Cancelling I/O 713 QID 3
[  819.845367] nvme 0000:03:00.0: Cancelling I/O 714 QID 3
[  820.054804] ------------[ cut here ]------------
[  820.054812] WARNING: at /mnt/work/nvidia.com/tx1/Linux_for_Tegra_R28.1/sources/kernel/kernel-4.4/lib/percpu-refcount.c:324
[  820.054815] Modules linked in: nvme spi_xilinx spi_bitbang imx183(O) sdma(O) videobuf2_dma_sg bcmdhd cfg80211 ahci_tegra libahci_platform libahci [last unloaded: nvme]

[  820.054845] CPU: 2 PID: 174 Comm: kworker/2:3 Tainted: G           O    4.4.38+ #4
[  820.054848] Hardware name: jetson_tx1 (DT)
[  820.054870] Workqueue: events nvme_probe_work [nvme]
[  820.054875] task: ffffffc0f3dd0c80 ti: ffffffc0f3e14000 task.ti: ffffffc0f3e14000
[  820.054884] PC is at percpu_ref_reinit+0x30/0x118
[  820.054891] LR is at blk_mq_unfreeze_queue+0x58/0x7c
[  820.054894] pc : [<ffffffc000383ebc>] lr : [<ffffffc0003553cc>] pstate: 00000145
[  820.054896] sp : ffffffc0f3e17cb0
[  820.054899] x29: ffffffc0f3e17cb0 x28: 0000000000000000
[  820.054904] x27: 0000000000000000 x26: 0000000000000000
[  820.054909] x25: 0000000000000000 x24: ffffffc0012cfd84
[  820.054913] x23: ffffffc0ffe6d800 x22: 0000000000000000
[  820.054917] x21: 0000000000000001 x20: ffffffc0d83f4550
[  820.054921] x19: ffffffc0f3daa5b0 x18: 0000000000000a03
[  820.054925] x17: 0000007f865aef18 x16: ffffffc0001d87b8
[  820.054929] x15: 003b9aca00000000 x14: 0ffffffffffffffe
[  820.054932] x13: 0000000000000038 x12: 0000000000000038
[  820.054936] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[  820.054940] x9 : 0000000000000000 x8 : ffffffc07dd64000
[  820.054944] x7 : 0000000000000000 x6 : 000000000000003f
[  820.054947] x5 : 0000000000000040 x4 : 0000000000000000
[  820.054951] x3 : ffffffc0f3daa584 x2 : 0000000000000000
[  820.054954] x1 : 0000000000000000 x0 : 0000000000000000

[  820.054961] ---[ end trace 45cb48c7fe5fcc48 ]---
[  820.054964] Call trace:
[  820.069744] [<ffffffc000383ebc>] percpu_ref_reinit+0x30/0x118
[  820.069749] [<ffffffc0003553cc>] blk_mq_unfreeze_queue+0x58/0x7c
[  820.069766] [<ffffffbffc03b8fc>] nvme_unfreeze_queues+0x3c/0x6c [nvme]
[  820.069779] [<ffffffbffc03e0ac>] nvme_probe_work+0x194/0x280 [nvme]
[  820.069788] [<ffffffc0000bdacc>] process_one_work+0x258/0x434
[  820.069793] [<ffffffc0000be130>] worker_thread+0x160/0x288
[  820.069799] [<ffffffc0000c3f28>] kthread+0x100/0x108
[  820.069806] [<ffffffc000084790>] ret_from_fork+0x10/0x40
[  820.113501] mc-err: (0) csr_afir: EMEM address decode error
[  820.113510] mc-err:   status = 0x2000000e; addr = 0x00000000
[  820.113516] mc-err:   secure: no, access-type: read, SMMU fault: none
[  820.113543] mc-err: (0) csr_afir: EMEM address decode error
[  820.113548] mc-err:   status = 0x2000000e; addr = 0x00000000
[  820.113551] mc-err:   secure: no, access-type: read, SMMU fault: none
[  820.113566] mc-err: (0) csr_afir: EMEM address decode error
[  820.113570] mc-err:   status = 0x2000000e; addr = 0x00000000
[  820.113574] mc-err:   secure: no, access-type: read, SMMU fault: none
[  820.113586] mc-err: (0) csr_afir: EMEM address decode error
[  820.113590] mc-err:   status = 0x2000000e; addr = 0x00000000
[  820.113594] mc-err:   secure: no, access-type: read, SMMU fault: none
[  820.113606] mc-err: Too many MC errors; throttling prints
[  820.820999] nvme 0000:03:00.0: Failed status: 3, reset controller
[  820.821127] nvme 0000:03:00.0: Cancelling I/O 319 QID 4
[  820.821144] nvme 0000:03:00.0: Cancelling I/O 320 QID 4
[  820.821151] nvme 0000:03:00.0: Cancelling I/O 321 QID 4
[  820.821158] nvme 0000:03:00.0: Cancelling I/O 322 QID 4
[  820.821164] nvme 0000:03:00.0: Cancelling I/O 323 QID 4
[  820.821172] nvme 0000:03:00.0: Cancelling I/O 689 QID 3
[  820.821182] nvme 0000:03:00.0: Cancelling I/O 692 QID 3
[  820.821191] nvme 0000:03:00.0: Cancelling I/O 693 QID 3
[  820.821200] nvme 0000:03:00.0: Cancelling I/O 694 QID 3
[  820.821210] nvme 0000:03:00.0: Cancelling I/O 695 QID 3
[  820.821218] nvme 0000:03:00.0: Cancelling I/O 696 QID 3
[  820.821227] nvme 0000:03:00.0: Cancelling I/O 697 QID 3
[  820.821234] nvme 0000:03:00.0: Cancelling I/O 698 QID 3
[  820.821243] nvme 0000:03:00.0: Cancelling I/O 699 QID 3
[  820.821253] nvme 0000:03:00.0: Cancelling I/O 700 QID 3
[  820.821261] nvme 0000:03:00.0: Cancelling I/O 701 QID 3
[  820.821270] nvme 0000:03:00.0: Cancelling I/O 702 QID 3
[  820.821276] nvme 0000:03:00.0: Cancelling I/O 703 QID 3
[  820.821285] nvme 0000:03:00.0: Cancelling I/O 704 QID 3
[  820.821295] nvme 0000:03:00.0: Cancelling I/O 705 QID 3
[  820.821302] nvme 0000:03:00.0: Cancelling I/O 706 QID 3
[  820.821312] nvme 0000:03:00.0: Cancelling I/O 707 QID 3
[  820.821320] nvme 0000:03:00.0: Cancelling I/O 708 QID 3
[  820.821328] nvme 0000:03:00.0: Cancelling I/O 709 QID 3
[  820.821336] nvme 0000:03:00.0: Cancelling I/O 710 QID 3
[  820.821346] nvme 0000:03:00.0: Cancelling I/O 711 QID 3
[  820.821355] nvme 0000:03:00.0: Cancelling I/O 712 QID 3
[  820.821363] nvme 0000:03:00.0: Cancelling I/O 713 QID 3
[  820.821372] nvme 0000:03:00.0: Cancelling I/O 714 QID 3
[  820.821380] nvme 0000:03:00.0: Cancelling I/O 1 QID 0
[  820.821408] nvme 0000:03:00.0: Identify Controller failed (-4)
[  820.821476] nvme 0000:03:00.0: Device failed to resume
[  820.821575] blk_update_request: I/O error, dev nvme0n1, sector 381681704

Crash #2

[  197.795969] nvme 0000:03:00.0: Failed status: 3, reset controller
[  197.796092] nvme 0000:03:00.0: Cancelling I/O 896 QID 4
[  197.796106] nvme 0000:03:00.0: Cancelling I/O 897 QID 4
[  197.796115] nvme 0000:03:00.0: Cancelling I/O 929 QID 4
[  197.796127] nvme 0000:03:00.0: Cancelling I/O 930 QID 4
:
:
[  197.798881] nvme 0000:03:00.0: Cancelling I/O 661 QID 1
[  197.798893] nvme 0000:03:00.0: Cancelling I/O 662 QID 1
[  198.807952] nvme 0000:03:00.0: Failed status: 3, reset controller
[  198.808034] nvme 0000:03:00.0: Cancelling I/O 1 QID 0
[  198.808146] nvme 0000:03:00.0: Device failed to resume
[  198.808252] Unable to handle kernel paging request at virtual address ffffff800001c01c
[  198.808257] pgd = ffffffc0013a1000
[  198.808260] [ffffff800001c01c] *pgd=000000017a1d4003, *pud=000000017a1d4003, *pmd=000000017a1d5003, *pte=0000000000000000
[  198.808271] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[  198.857474] Modules linked in: nvme spi_xilinx spi_bitbang imx183(O) sdma(O) videobuf2_dma_sg bcmdhd cfg80211 ahci_tegra libahci_platform libahci [last unloaded: nvme]
[  199.054273] CPU: 2 PID: 2092 Comm: nvme0 Tainted: G           O    4.4.38+ #4
[  199.122211] Hardware name: jetson_tx1 (DT)
[  199.171161] task: ffffffc0da2e1900 ti: ffffffc08dcd0000 task.ti: ffffffc08dcd0000
[  199.260758] PC is at nvme_dev_remove+0x1c/0x74 [nvme]
[  199.321164] LR is at nvme_remove+0x78/0x10c [nvme]
[  199.378442] pc : [<ffffffbffc004cbc>] lr : [<ffffffbffc004d8c>] pstate: 20000145
[  199.466971] sp : ffffffc08dcd3cb0
[  199.506573] x29: ffffffc08dcd3cb0 x28: 0000000000000000
[  199.582754] x27: 0000000000000000 x26: 0000000000000000
[  199.646340] x25: 0000000000000000 x24: 0000000000000000
[  199.709876] x23: ffffffbffc00392c x22: ffffffc0de096000
[  199.773396] x21: ffffffc0f3d84890 x20: ffffffc0de096000
[  199.836923] x19: ffffffc0de096000 x18: 000000000485b480
[  199.900504] x17: 0000000000000007 x16: 0000000000000007
[  199.964003] x15: 000000000000000e x14: 0000000000000001
[  200.027544] x13: ffffffffffffff00 x12: 0000000000000000
[  200.091071] x11: 0000000000000000 x10: 0101010101010101
[  200.154601] x9 : 1f1f1f1f644c554d x8 : 7f7f7f7f7f7f7f7f
[  200.218211] x7 : fefefefefefefeff x6 : 80f0e3e9e5ade3ef
[  200.281672] x5 : ffffffc00022ef60 x4 : 0000000000000001
[  200.345224] x3 : ffffffc0d9dd22d0 x2 : 0000000000000000
[  200.408769] x1 : 0000000000000001 x0 : ffffff800001c01c

[  200.487041] Process nvme0 (pid: 2092, stack limit = 0xffffffc08dcd0020)
[  200.556456] Call trace:
[  200.585626] [<ffffffbffc004cbc>] nvme_dev_remove+0x1c/0x74 [nvme]
[  200.658532] [<ffffffbffc004d8c>] nvme_remove+0x78/0x10c [nvme]
[  200.728364] [<ffffffc0003d0c18>] pci_device_remove+0x40/0x108
[  200.797052] [<ffffffc0005f2078>] __device_release_driver+0x80/0xd8
[  200.871012] [<ffffffc0005f20f4>] device_release_driver+0x24/0x38
[  200.942878] [<ffffffc0003c9a20>] pci_stop_dev+0x38/0x50
[  201.005369] [<ffffffc0003c9a88>] pci_stop_bus_device+0x50/0x60
[  201.075155] [<ffffffc0003c9c50>] pci_stop_and_remove_bus_device+0x14/0x28
[  201.156398] [<ffffffc0003c9cf4>] pci_stop_and_remove_bus_device_locked+0x1c/0x2c
[  201.244937] [<ffffffbffc003950>] nvme_remove_dead_ctrl+0x24/0x58 [nvme]
[  201.324180] [<ffffffc0000c3f28>] kthread+0x100/0x108
[  201.383459] [<ffffffc000084790>] ret_from_fork+0x10/0x40
[  201.447002] ---[ end trace 095acc50607f6f3c ]---

Hi danieel,

What module are we targeting now? In the beginning, you were on TX2 and now TX1.

If I understand correctly, the following is the current status. Please confirm.

TX1 (k3.10) -> good
TX1 (k4.4) -> fail
TX2 (k4.4) -> fail

Hi WayneWWW, we have built it around TX2 since we need more GPU power for the ISP functionality, and the compression would also run faster. Both use CUDA. The further plan is to use at least the HW encoder for video previews, and TX2 is better at that. Since all of this is memory-bound, the TX2 is a much better choice.

We still have access to 4 TX1 units from the previous version, so we can compare the behavior, and in the worst case they could be used as a fallback. But the 3.10 kernel is ancient and hard to support; we had to backport/rewrite a lot of stuff around V4L2.

TX1 3.10 = stable, has no PCIe in iommu/masters *
TX1 4.4 = no/no (+smmu: fails on itself, -smmu: nvme reset, mc error)
TX2 4.4 = no/no (+smmu: iova exceptions, -smmu: nvme reset)

The issue is not in the NVMe driver, since an older M.2 SSD with the AHCI driver fails as well.
The issue is not in our PCIe switch hardware, since I can just pull out the SD card with 4.4, reboot from the internal 3.10, and it works correctly on the same platform.

We see completion timeouts on our FPGA board - that is, reads from system memory are no longer answered by the Tegra host. This might be the primary reason why all PCIe devices (camera and SSD) stop working. The time to crash is random, between 2 seconds and 6 minutes, and if I run a couple of dd commands it can be taken down in about 1-2 minutes.
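The exact dd invocation is not given here, so the following is only a sketch of what "a couple of dd commands" might look like as a repeatable stress load, written in Python. It assumes a write load (the opening post reported faults only during NVMe writes). The target paths, the 4096 MiB size, the `oflag=direct` choice and the helper names `dd_write_command` and `run_parallel` are all illustrative assumptions, not taken from the thread.

```python
import subprocess


def dd_write_command(target, mib, block_mib=1):
    """Build the argv for one sequential dd write of `mib` MiB to `target`."""
    return [
        "dd",
        "if=/dev/zero",
        f"of={target}",
        f"bs={block_mib}M",
        f"count={mib // block_mib}",
        "oflag=direct",  # assumption: bypass the page cache so writes hit the SSD directly
    ]


def run_parallel(targets, mib):
    """Start one dd per target at the same time and return their exit codes."""
    procs = [subprocess.Popen(dd_write_command(t, mib)) for t in targets]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical mount point of the NVMe SSD; adjust before use.
    targets = [f"/mnt/nvme/stress_{i}.bin" for i in range(4)]
    # Dry run: only print the commands. Call run_parallel(targets, 4096) to execute them.
    for t in targets:
        print(" ".join(dd_write_command(t, 4096)))
```

Running the script as-is only prints the four command lines; nothing is written until `run_parallel` is called explicitly.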

Are there any NVIDIA facilities in Europe equipped with PCIe bus analyzers, in order to confirm that there is an actual pause in PCIe operation?

The 10GE NIC was put on hold and removed from the testing systems to ease debugging.

Is ASPM enabled here by any chance?
Can you please give the output of ‘sudo lspci -vv’ ?
Also, please try with ‘echo performance > /sys/module/pcie_aspm/parameters/policy’ once.

This does not help. It still crashes, and I verified that the policy is set to performance:

$ sudo cat /sys/module/pcie_aspm/parameters/policy

default [performance] powersave
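The policy file shows which policy is selected; what each link has enabled in its Link Control register is reported on the `LnkCtl:` line of each device in `lspci -vv`. As a rough sketch (not part of the original exchange), the Python below extracts that field per device so it can be compared before and after changing the policy. The function name `aspm_states` and the embedded `SAMPLE` text are illustrative assumptions; the sample lines are copied from the dump that follows, and the regex assumes the short `bus:dev.fn` address form without a PCI domain prefix.

```python
import re

# Device header at column 0, e.g. "00:01.0 PCI bridge: ..."
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+")
# ASPM field of the link-control line, e.g. "LnkCtl: ASPM L1 Enabled; ..."
LNKCTL_RE = re.compile(r"^\s*LnkCtl:\s*ASPM\s+([^;]+);")


def aspm_states(lspci_text):
    """Return {device address: ASPM state string} parsed from `lspci -vv` text."""
    states = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        link = LNKCTL_RE.match(line)
        if link and current is not None:
            states[current] = link.group(1).strip()
    return states


# Two devices excerpted from the dump in this thread.
SAMPLE = """\
00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
02:04.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
                LnkCtl: ASPM Disabled; Disabled- CommClk-
"""

if __name__ == "__main__":
    for address, state in aspm_states(SAMPLE).items():
        print(address, state)
```

On a live system the same function could be fed the captured output of `sudo lspci -vv` instead of `SAMPLE`.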

$ sudo lspci -vv # from TX2 R28.1 kernel 4.4.x

00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 390
        Bus: primary=00, secondary=01, subordinate=08, sec-latency=0
        I/O behind bridge: 00001000-00002fff
        Memory behind bridge: 50100000-507fffff
        Prefetchable memory behind bridge: 0000000058000000-00000000583fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
                Mapping Address Base: 00000000fee00000
        Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: pcieport

01:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 450
        Region 0: Memory at 50700000 (32-bit, non-prefetchable) 
        Bus: primary=01, secondary=02, subordinate=08, sec-latency=0
        I/O behind bridge: 00001000-00002fff
        Memory behind bridge: 50100000-506fffff
        Prefetchable memory behind bridge: 0000000058000000-00000000583fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0000
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Upstream Port, MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM L1 Enabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [138 v1] Power Budgeting <?>
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 451
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        Memory behind bridge: 50100000-501fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0001
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM L1 Enabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk- DLActive+ BWMgmt+ ABWMgmt+
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 25.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=06 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=WRR32 TC/VC=ff
                        Status: NegoPending- InProgress-
                        Port Arbitration Table <?>
        Capabilities: [448 v1] Vendor Specific Information: ID=0000 Rev=0 Len=0cc <?>
        Capabilities: [520 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:04.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 452
        Bus: primary=02, secondary=04, subordinate=04, sec-latency=0
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0002
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #4, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #4, PowerLimit 25.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending+ InProgress-
        Capabilities: [520 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:05.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 453
        Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: 00001000-00001fff
        Memory behind bridge: 50200000-503fffff
        Prefetchable memory behind bridge: 0000000058000000-00000000581fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0003
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #5, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
                        Slot #5, PowerLimit 25.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                        Changed: MRL+ PresDet- LinkState-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending+ InProgress-
        Capabilities: [520 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:06.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 454
        Bus: primary=02, secondary=06, subordinate=06, sec-latency=0
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0004
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #6, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #6, PowerLimit 25.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending+ InProgress-
        Capabilities: [520 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:08.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 455
        Bus: primary=02, secondary=07, subordinate=07, sec-latency=0
        Memory behind bridge: 50400000-504fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0005
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #8, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk- DLActive+ BWMgmt+ ABWMgmt+
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #8, PowerLimit 25.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [520 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:09.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 456
        Bus: primary=02, secondary=08, subordinate=08, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: 50500000-506fffff
        Prefetchable memory behind bridge: 0000000058200000-00000000583fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
                Address: 00000000d931f000  Data: 0006
                Masking: 0000000f  Pending: 00000000
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #9, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
                        Slot #9, PowerLimit 25.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL+ PresDet+ LinkState-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
        Capabilities: [100 v1] Device Serial Number aa-86-00-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending+ InProgress-
        Capabilities: [520 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a804 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 390
        Region 0: Memory at 50100000 (64-bit, non-prefetchable)
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] #19
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
        Kernel driver in use: nvme
        Kernel modules: nvme

07:00.0 Memory controller: Xilinx Corporation Device 7024
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 464
        Region 0: Memory at 50400000 (64-bit, non-prefetchable)
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000d931f000  Data: 000d
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: ********-v4l2