Internal error: Accessing user space memory outside uaccess.h routines (Solved)

We’re trying to get a custom kernel module that interfaces with a PCIe FPGA card using a large reserved memory block working on the Xavier (JetPack 4.1, L4T 31.0.2). The same kernel module currently works on a TX2 (JetPack 3.2, L4T R28.2).

With this addition to the device tree:

reserved-memory {
    fpga-carveout {
        reg = <0x0 0xC0000000 0x0 0x40000000>;
    };
};

any attempt to read from or write to the reserved memory block produces this error:

[   54.264565] Internal error: Accessing user space memory outside uaccess.h routines: 9600000f [#1] PREEMPT SMP
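For reference, the hex value after the message (9600000f) is the exception syndrome value the arm64 kernel prints with its die() output, so it can be decoded by hand. Below is a minimal sketch of that decoding, assuming the standard Armv8-A ESR_ELx field layout; the helper names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical helpers, assuming the Armv8-A ESR_ELx layout:
 * EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
 * For data aborts, ISS bit 6 is WnR and ISS bits [5:0] are the DFSC. */
uint32_t esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3f; }
uint32_t esr_il(uint32_t esr)   { return (esr >> 25) & 0x1; }
uint32_t esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1; }
uint32_t esr_dfsc(uint32_t esr) { return esr & 0x3f; }

/* Classify the fault status code; the low two bits carry the
 * translation table level and are masked off here. */
const char *esr_dfsc_kind(uint32_t dfsc)
{
    switch (dfsc & 0x3c) {
    case 0x00: return "address size fault";
    case 0x04: return "translation fault";
    case 0x08: return "access flag fault";
    case 0x0c: return "permission fault";
    default:   return "other";
    }
}
```

For 0x9600000f this gives EC = 0x25 (data abort taken without a change in exception level, i.e. from kernel mode), WnR = 0 (a read), and DFSC = 0x0f (permission fault, level 3). In other words, the kernel itself hit a permission fault on a read, which matches the wording of the message.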

We’ve previously used this device tree patch to successfully run the same code on the TX2: https://devtalk.nvidia.com/default/topic/1020491/jetson-tx2/iommu-unhandled-context-fault-on-reserved-memory/

What’s required to get an equivalent working configuration for the Xavier?

Please try the following patch after cd’ing into the hardware/nvidia/soc/t19x/ directory:

diff --git a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
index 3a9ee0d..43d1e22 100644
--- a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
+++ b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
@@ -497,9 +497,6 @@
                                         <0 73 0x04>;   /* MSI interrupt */
                interrupt-names = "intr", "msi";

-               iommus = <&smmu TEGRA_SID_PCIE0>;
-               dma-coherent;
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 72 0x04>;
@@ -555,9 +552,6 @@
                                         <0 46 0x04>;   /* MSI interrupt */
                interrupt-names = "intr", "msi";

-               iommus = <&smmu TEGRA_SID_PCIE1>;
-               dma-coherent;
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 45 0x04>;
@@ -614,9 +608,6 @@
                                         <0 48 0x04>;   /* MSI interrupt */
                interrupt-names = "intr", "msi";

-               iommus = <&smmu TEGRA_SID_PCIE2>;
-               dma-coherent;
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 47 0x04>;
@@ -672,9 +663,6 @@
                                         <0 50 0x04>;   /* MSI interrupt */
                interrupt-names = "intr", "msi";

-               iommus = <&smmu TEGRA_SID_PCIE3>;
-               dma-coherent;
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 49 0x04>;
@@ -730,9 +718,6 @@
                                         <0 52 0x04>;   /* MSI interrupt */
                interrupt-names = "intr", "msi";

-               iommus = <&smmu TEGRA_SID_PCIE4>;
-               dma-coherent;
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 51 0x04>;
@@ -792,9 +777,6 @@
                pinctrl-0 = <&pex_rst_c5_out_state>;
                pinctrl-1 = <&clkreq_c5_bi_dir_state>;

-               iommus = <&smmu TEGRA_SID_PCIE5>;
-               dma-coherent;
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 53 0x04>;

Unfortunately, we get the exact same error after applying the patch.

The patch posted in comment #2 of this thread is the Jetson Xavier equivalent of the patch posted in comment #2 of the other thread. So, if the intention is to let the PCIe IP access memory (reserved through the device tree) while bypassing the SMMU, then this patch is still required.
Now, I’m wondering whether the memory is really getting reserved. How did you confirm this?
Also, does this error occur when the PCIe IP tries to access the reserved memory region, or when the CPU (in kernel context) tries to access it?
Looking at the error, it seems some kernel component is trying to access user space memory.

Thank you for the patch. We’ve confirmed that it works now. Details below.

This is the best we have for confirming the reservation:

Before reservation:

dmesg | grep 'Memory: '
[    0.000000] Memory: 16124884K/16523264K available (15358K kernel code, 2942K rwdata, 6716K rodata, 8512K init, 1623K bss, 332844K reserved, 65536K cma-reserved)

After reservation:

[    0.000000] Memory: 15326172K/16523264K available (15358K kernel code, 2942K rwdata, 6716K rodata, 8512K init, 1623K bss, 1131556K reserved, 65536K cma-reserved)

That’s only about 780MB (798712K) of additional reserved memory, short of the full 1GB block. While the TX2 emitted a warning at boot about overlapping reservations with this same block, there is no such warning on the Xavier. We’re not using the entire block yet, and we’ll deal with resizing or relocating (and eventually eliminating) it in the future.
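The arithmetic behind that figure can be checked with a few lines of C. This is only a sketch using the numbers from the two dmesg lines above and the reg property of fpga-carveout; the function names are hypothetical.

```c
#include <stdint.h>

/* Join two 32-bit device tree cells (high cell first) into one
 * 64-bit value, as used when #address-cells and #size-cells are 2. */
uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Size of the carveout in KiB, from reg = <0x0 0xC0000000 0x0 0x40000000>. */
uint64_t carveout_kib(void)
{
    return dt_cells_to_u64(0x0, 0x40000000) / 1024;
}

/* Growth of the "reserved" figure between the two boot logs, in KiB. */
uint64_t reserved_delta_kib(void)
{
    const uint64_t before_kib = 332844;   /* "332844K reserved"  */
    const uint64_t after_kib  = 1131556;  /* "1131556K reserved" */
    return after_kib - before_kib;
}
```

carveout_kib() is 1048576 (1 GiB) and reserved_delta_kib() is 798712, so roughly 244 MiB of the requested block does not show up in the reserved total. The drop in the "available" figure (16124884K to 15326172K) is the same 798712K.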

Also, querying the running device tree confirms that the reserved block is present:

dtc -I fs -O dts -o - /proc/device-tree 2> /dev/null | grep -B25 -A3 fpga-carveout

    reserved-memory {
        ranges;
        #address-cells = <0x2>;
        #size-cells = <0x2>;

        ramoops_carveout {
            compatible = "nvidia,ramoops";
            alignment = <0x0 0x10000>;
            alloc-ranges = <0x0 0x0 0x1 0x0>;
            status = "okay";
            no-map;
            size = <0x0 0x200000>;
            phandle = <0x1b7>;
            linux,phandle = <0x1b7>;
        };

        vpr-carveout {
            reusable;
            alignment = <0x0 0x400000>;
            alloc-ranges = <0x0 0x80000000 0x0 0x70000000>;
            phandle = <0x1b8>;
            linux,phandle = <0x1b8>;
        };

        fpga-carveout {
            reg = <0x0 0xc0000000 0x0 0x40000000>;
        };

Thanks for the hint. This was definitely caused by some bad kernel module code that was trying to access user space memory directly. We were apparently getting away with it on both the TX2 and on a number of x86 systems, but not on the Xavier.

Fixing that bad kernel module code eliminates the error and makes the PCIe transfers through the reserved memory block work correctly.

Thanks again for the help.

Hi,

This is most probably caused by the newer ARM architecture version used in the Xavier (ARMv8.2) compared to the TX2 (ARMv8.0).

ARMv8.1 introduced a new CPU feature called PAN (Privileged Access Never), and the kernel code was updated so that a permission error is raised when the kernel tries to access user space memory directly. I think this is the likely reason for the failures we are seeing.

We fixed it by using copy_from_user() on every pointer that could point to a user space string.