We are facing a couple of issues while communicating with a custom PCIe device that we develop.
The device worked correctly with an earlier version of L4T.
The device acts as a bus master and transfers data to and from L4T system memory.
The system memory is allocated with a normal malloc in user space and then mapped for DMA using pci_map_sg.
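Because a malloc'd buffer is generally not page-aligned, the scatter-gather list has to cover every page the buffer touches, including partial first and last pages. A minimal user-space sketch of that arithmetic, assuming 4 KiB pages; `pages_spanned` and `SKETCH_PAGE_SIZE` are hypothetical names used only for illustration and are not part of our driver:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [addr, addr + len).
 * Hypothetical helper for illustration only. */
size_t pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t first = addr / SKETCH_PAGE_SIZE;             /* page of first byte */
    uintptr_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;  /* page of last byte */

    return (size_t)(last - first + 1);
}
```

For example, a 32-byte buffer that starts 16 bytes before a page boundary touches two pages, so it needs two page entries even though it is far smaller than one page.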
Issue #1: AER (PCIe Advanced Error Reporting) now reports Data Link Layer errors.
Issue #2: We get an "Unhandled context fault" from the IOMMU.
Version of L4T with the issue:
NVIDIA Jetson TX2
L4T 32.2.1 [ JetPack UNKNOWN ]
Board: t186ref
Ubuntu 18.04.2 LTS
Kernel Version: 4.9.140-tegra
Relevant part of the dmesg log, showing the AER messages and the context fault:
[ +0.006369] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0020
[ +0.000107] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0008(Transmitter ID)
[ +0.010532] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00001100/00002000
[ +0.008379] pcieport 0000:00:01.0: [ 8] RELAY_NUM Rollover
[ +0.006115] pcieport 0000:00:01.0: [12] Replay Timer Timeout
[ +0.006207] camera_ipu 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0100(Transmitter ID)
[ +0.011257] camera_ipu 0000:01:00.0: device [1e53:9024] error status/mask=00001100/0000e000
[ +0.008672] camera_ipu 0000:01:00.0: [ 8] RELAY_NUM Rollover
[ +0.006605] camera_ipu 0000:01:00.0: [12] Replay Timer Timeout
[ +0.006368] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0020
[ +0.000203] camera_ipu 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0100(Transmitter ID)
[ +0.010806] camera_ipu 0000:01:00.0: device [1e53:9024] error status/mask=00001100/0000e000
[ +0.008589] camera_ipu 0000:01:00.0: [ 8] RELAY_NUM Rollover
[ +0.006333] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x17ffffc00, fsynr=0x200013, cb=21, sid=17(0x11 - AFI), pgd=22ea9e003, pud=22ea9e003, pmd=23be24003, pte=0
[ +0.000147] mc-err: vpr base=0:0, size=0, ctrl=1, override:(e01a8341, 1dc10c1, 2a800000, 2)
[ +0.000010] mc-err: (255) csw_afiw: MC request violates VPR requirements
[ +0.000006] mc-err: status = 0x00337031; addr = 0x3ffffffc0
[ +0.000005] mc-err: secure: yes, access-type: write
[ +0.000013] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[ +0.000011] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[ +0.000013] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[ +0.000043] mc-err: Too many MC errors; throttling prints
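To make the AER status/mask values in the log easier to read, here is a small user-space sketch that decodes the Correctable Error Status bits. The bit positions follow the PCIe AER register layout; the function names `aer_cor_bit_name` and `aer_cor_decode` are hypothetical and exist only for this illustration. The entry the log prints as "RELAY_NUM Rollover" is the REPLAY_NUM Rollover bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Name of a bit in the PCIe AER Correctable Error Status register.
 * Returns NULL for reserved bits. Hypothetical helper. */
const char *aer_cor_bit_name(unsigned bit)
{
    switch (bit) {
    case 0:  return "Receiver Error";
    case 6:  return "Bad TLP";
    case 7:  return "Bad DLLP";
    case 8:  return "REPLAY_NUM Rollover";
    case 12: return "Replay Timer Timeout";
    case 13: return "Advisory Non-Fatal Error";
    case 14: return "Corrected Internal Error";
    case 15: return "Header Log Overflow";
    default: return NULL;
    }
}

/* Store the positions of all set bits of 'status' in out[] (lowest first)
 * and return how many were found. Hypothetical helper. */
size_t aer_cor_decode(uint32_t status, unsigned out[32])
{
    size_t n = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (status & (UINT32_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}
```

Applied to the status value 00001100 from the log, this yields bits 8 and 12, matching the two lines printed for each device. The root port mask 00002000 is bit 13 (Advisory Non-Fatal Error), and the endpoint mask 0000e000 is bits 13 to 15.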
The PCIe driver worked correctly with the following L4T version:
NVIDIA Jetson TX2
L4T 28.2.1 [ JetPack 3.3 or 3.2.1 ]
Board: t186ref
Ubuntu 16.04 LTS
Kernel Version: 4.4.38
Since bus_to_virt is not supported in the 4.9 kernel, we modified the driver as follows.
This is the only driver change. Could it be the reason for the Unhandled context fault?
 retval = remap_pfn_range(vma, mmap_start,
-        PFN_DOWN(virt_to_phys(bus_to_virt(
-                mem_tmp->buf_list.pa_buffers[i].paddr))) +
+        PFN_DOWN(
+                mem_tmp->buf_list.pa_buffers[i].paddr) +
         mmap_pgoff, mem_tmp->buf_list.pa_buffers[i].bytes,
         vma->vm_page_prot);
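For reference, the arithmetic of the modified expression can be modelled in user space as follows. This is only a sketch assuming 4 KiB pages; `pfn_down`, `remap_pfn`, and `SKETCH_PAGE_SHIFT` are hypothetical stand-ins for the kernel's `PFN_DOWN` macro and for the expression passed to `remap_pfn_range`, not code from our driver.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* User-space model of the kernel's PFN_DOWN(): address -> page frame number. */
uint64_t pfn_down(uint64_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Model of the PFN argument in the modified call: the value stored in
 * paddr is shifted down directly, with no bus-to-CPU address translation,
 * and the page offset is added. */
uint64_t remap_pfn(uint64_t paddr, uint64_t pgoff)
{
    return pfn_down(paddr) + pgoff;
}
```

The result identifies the buffer's pages only if `paddr` holds a CPU physical address. If it holds a DMA address that is translated by the SMMU, the shifted value is an I/O virtual page number and not necessarily a physical page frame.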
Can you please check what we might be doing wrong?