Activate configuration 6 (UPHY lane assignments) on custom TX2i board

Hello,

I have a custom board that has an Ethernet bridge connected to the PEX2 lane.
I have other devices which will be connected to the other lanes (expansion boards), fitting the configuration number 6 in the “UPHY Lane Assignments”.
However, I am unable to force this configuration; all my attempts end with:
[ 0.943015] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration
[ 0.943961] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[ 0.944291] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[ 0.946669] tegra-pcie 10003000.pcie-controller: probing port 1, using 1 lanes
[ 1.370390] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[ 1.773408] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[ 2.177012] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[ 2.179025] tegra-pcie 10003000.pcie-controller: link 0 down, ignoring
[ 2.580594] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[ 2.984127] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[ 3.387658] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[ 3.389670] tegra-pcie 10003000.pcie-controller: link 1 down, ignoring
[ 3.594665] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
[ 3.594929] tegra-pcie 10003000.pcie-controller: PCIE: Disable power rails

As suggested here I have created my configuration files (derived from the different p3489-1000-a00-00 files), and I have set in my custom .conf.common file the ODMDATA value I think is appropriate (0x3090000), but to no avail.
I’ve tried reading the registers with devmem2 but, besides being forced to round down the addresses to 0 to avoid bus errors (I guess this is due to memory alignment), I read the value 0x4D every time.
Can anyone help me out, please?
Best regards,
Rob

Hi,

Some points to clarify:

  1. If you are using a rel-32 based JetPack, reference pages from before 2019 cannot help you, because the rel-28 based software configuration is different from rel-32.
    Also, there are lots of unnecessary steps in the page you posted…

  2. [ 0.943015] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration

This error means you put a wrong value for the PCIe lanes in the DT. Please note that this cannot be an arbitrary number.

Please refer to the adaptation guide v1.9, published on 2019/07/02, if you are using a rel-32 based release.

Dear WayneWWW,

Thanks for your quick reply!
I’ve naively assumed that putting:

pcie-controller@10003000 {
        status = "okay";
        pci@1,0 {
                nvidia,num-lanes = <2>;
                nvidia,disable-clock-request;
                status = "okay";
        };
        pci@2,0 {
                nvidia,num-lanes = <1>;
                status = "okay";
        };
        pci@3,0 {
                nvidia,num-lanes = <0>;
                status = "disabled";
        };
};

in the tegra186-quill-p3489-1000-a00-00-base.dts file would have done the job, but even after commenting out the fragment in tegra186-quill-p3489-1000-a00-plugin-manager.dtsi that seems to overwrite it (fragment-500-e3325-pcie), this is not the case.

Today I’ve started from scratch but still no progress at all.
The adaptation guide, version 13.07.2020, seems not to help me that much (the USB configuration is quite detailed, but I see very little mention of the PCIe configuration – is it because it is the USB configuration which is interfering with the PCIe one?)
In particular, I have the following issues:

  1. I cannot get the recovery USB to work – even though it is wired exactly as in the dev kit. I can flash through it, but I cannot get it as a serial interface (I am currently connected via the debug UART)
  2. I’ve tried following the other forum posts (>=2019), but PCIe keeps giving me the same error in Linux and defaulting to 2x+1x+1x
  3. At each boot I get a set of
    mmc1: CMD CRC or end bit error, int mask 0x40000
    or similar, but this does not happen when the module is plugged in the dev kit

Overall, both (1) and (3) are minor to me, as my priority now is to switch the module to configuration #6 (I also have a video input which I suppose will cause me a lot of trouble, but I don’t want to think about it yet).
Could you please point me in the right direction for the three problems above?
Thanks a lot in advance!
Rob

Hi,

I have to say something here

  1. A non-working recovery USB is a serious problem. It is also a hardware design problem, not a software problem. We have heard lots of users say “my design is the same as the devkit”, but actually it is not. Thus, please treat this issue as high priority.

  2. Actually, your assumption is not correct.
    There are only 2 lane configs that can work for PCIe in the dtb.

→ 4,0,1 and 2,1,1

Thus, when you want to use config 6, what you have to do is write 2,1,1 and disable the status of lane 0.
Then, set the correct ODMDATA.
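
The rule stated above (only the lane triples 4,0,1 and 2,1,1 are accepted) can be sketched as a small shell check. This only illustrates the statement made in this thread and is not taken from the driver source; the function name `is_supported_lane_config` is hypothetical.

```shell
# Return 0 if the (port0, port1, port2) lane counts form one of the two
# combinations named in this thread: 4,0,1 or 2,1,1.
# Hypothetical helper, not part of L4T.
is_supported_lane_config() {
    case "$1,$2,$3" in
        4,0,1|2,1,1) return 0 ;;
        *)           return 1 ;;
    esac
}

# Try the two accepted triples and the 2,1,0 triple from the earlier post.
for cfg in "2 1 1" "4 0 1" "2 1 0"; do
    # Word splitting of $cfg into three arguments is intentional here.
    if is_supported_lane_config $cfg; then
        echo "$cfg: accepted"
    else
        echo "$cfg: rejected"
    fi
done
```

Under this rule a port that is not used keeps its lane count from the accepted triple and is switched off through its status property instead of being given zero lanes.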

Actually, I don’t know how you set ODMDATA if your recovery USB does not work.

The remaining check point is the USB. You have to make sure the unused USB ports are correctly disabled.
We wrote a lot of detail about USB in the adaptation guide because there are indeed not many changes needed for PCIe…

Hi WayneWWW,

  1. For some unknown reason the recovery USB works fine for flashing the board; it’s the console on the recovery USB that does not start up :( I guess it is related to the fact that the board does not set itself in host mode but stays as a device, but I cannot understand why, since this works if I just plug the module back into the dev kit.
    I have checked multiple times and the wiring of the recovery USB is easy, it just comes from the connector, passes through a small ESD protection chip, and then goes in the Jetson… But I’ll check later with an hw engineer.
    I set the ODMDATA to 0x3090000 in my custom board.conf.common file, under the “3489” branch of the “if” in the file.
  2. Very interesting indeed! However I have to take up the USBs, since I use two of them (I have an LTE modem and an expansion port that have to be connected to USB 3). I’ll post the details of the config files as soon as I get to my working place.

Thanks again for your help and have a nice day!
Rob

And actually, it is common practice on the forum to share the full dmesg instead of a partial one…

Hello WayneWWW,

A few words on our custom board first:
We started from the devkit design, dropped what we deemed unnecessary (e.g., the EEPROM with version number), and then added:

  1. an Ethernet bridge on a PCIe interface
  2. an LTE module on a USB3 interface (with an M.2 connector)
  3. exported a PCIe x2 and a USB3 on two M.2 connectors
  4. added an ADV7282A-M IC as analog->MIPI CSI converter (we have to connect an analog camera to the board)

I then did the following steps:

a) Prepared sources and did a first kernel compilation to ensure things were okay

# Unpack archives and get sources
export L4T_RELEASE_PACKAGE=Tegra186_Linux_R32.4.3_aarch64.tbz2
export SAMPLE_FS_PACKAGE=Tegra_Linux_Sample-Root-Filesystem_R32.4.3_aarch64.tbz2
export BOARD=jetson-tx2i

tar xvf ${L4T_RELEASE_PACKAGE}
cd Linux_for_Tegra/rootfs/
sudo tar xvpf ../../${SAMPLE_FS_PACKAGE}

cd ..
sudo ./apply_binaries.sh

./source_sync.sh -t tegra-l4t-r32.4.3

# Compile kernel
export TEGRA_KERNEL_OUT=build
export CROSS_COMPILE=aarch64-linux-gnu-
export KERNEL_SOURCE_DIR=sources/kernel/kernel-4.9
export LOCALVERSION=-tegra
cd sources/kernel/kernel-4.9/
mkdir -p $TEGRA_KERNEL_OUT
make ARCH=arm64 O=$TEGRA_KERNEL_OUT tegra_defconfig
make ARCH=arm64 O=$TEGRA_KERNEL_OUT -j16
# Backup old kernel (if any)
mv ../../../kernel/Image ../../../kernel/Image.old
cp $TEGRA_KERNEL_OUT/arch/arm64/boot/Image ../../../kernel/Image
# Backup old DTBs (if any)
rm -rf ../../../kernel/dtb.old
mv ../../../kernel/dtb ../../../kernel/dtb.old
cp $TEGRA_KERNEL_OUT/arch/arm64/boot/dts ../../../kernel/dtb -R
# Install kernel modules
sudo make ARCH=arm64 O=$TEGRA_KERNEL_OUT modules_install \
     INSTALL_MOD_PATH=../../../Linux_for_Tegra/rootfs/

b) Filled the spreadsheet as attached, obtained the DTSI files, and applied them using

cd Linux_for_Tegra/kernel/pinmux/t186/
python pinmux-dts2cfg.py --pinmux addr_info.txt gpio_addr_info.txt por_val.txt --mandatory_pinmux_file mandatory_pinmux.txt \
       tegra18x-jetson-tx2-default-pinmux.dtsi \
       tegra18x-jetson-tx2-default-gpio-default.dtsi 1.0 \
       > ../../../bootloader/t186ref/BCT/tegra186-mb1-bct-pinmux-quill-p3489-1000-a00.cfg

c) Modified rootfs for automatic user creation (headless setup)

# Prevent OEM setup
cat << EOF | sudo tee -a rootfs/etc/systemd/system/default.target
[Unit]
Requires=multi-user.target
Wants=display-manager.service
EOF

# Generate normal user in rootfs
sudo cp /usr/bin/qemu-aarch64-static rootfs/usr/bin/
sudo chroot rootfs qemu-aarch64-static /bin/bash
adduser rob

# !! Enter user details !!

# Make the user sudo
adduser rob sudo
sed -i 's|^%sudo.*|%sudo\tALL=NOPASSWD: ALL|' /etc/sudoers

exit
sudo rm rootfs/usr/bin/qemu-aarch64-static

d) Modified the DTS files as follows

  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-platforms/mods-display.dtsi:
    disabled all the nvdisplay and sor nodes (no HDMI in our board)
  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-platforms/tegra186-hdmi.dtsi:
    disabled all the sor* and hdmi-display nodes
  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-platforms/tegra186-quill-common.dtsi:
    disabled all the nvdisplay and sor nodes
  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-platforms/tegra186-quill-power-tree-p3489-1000-a00-00.dtsi:
    moved the hdmi power source to battery_reg
  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-plugin-manager/tegra186-odm-data-plugin-manager.dtsi:
    disabled all the nvdisplay and sor nodes, commented fragment “fragement@10” (*)
  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-plugin-manager/tegra186-quill-display-plugin-manager.dtsi:
    disabled all the nvdisplay nodes
  • sources/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-plugin-manager/tegra186-quill-p3489-1000-a00-plugin-manager.dtsi:
    disabled all the nvdisplay and sor nodes
  • sources/hardware/nvidia/platform/t18x/quill/kernel-dts/tegra186-quill-p3489-1000-a00-00-base.dts:
    • commented “board-has-eeprom” (to avoid all the error messages due to failed reads)
    • added a “usb3-1” node in “xusb_padctl@3520000”
    • in “pinctrl@3520000” renamed the usb3-0 node to “usb3-std-A-port1”, then added a “usb3-std-A-port2” node for usb3-1
    • set the “pcie-controller@10003000” for the 2-1-1 configuration
    • disabled the “sor1” node
    • added usb3-1 in “xhci@3530000”

e) Modified the p2771.conf.common board config file at line 123, changing ODMDATA from 0x1090000 to 0x3090000 (in the branch that starts with: elif [ "${bid}" = "3489" ]; then)

f) Recompiled the DTS and flashed it to the board with

export LOCALVERSION=-tegra
export TEGRA_KERNEL_OUT=build
export CROSS_COMPILE=aarch64-linux-gnu-
export KERNEL_SOURCE_DIR=sources/kernel/kernel-4.9
cd $KERNEL_SOURCE_DIR
make ARCH=arm64 O=$TEGRA_KERNEL_OUT dtbs
rm -rf ../../../kernel/dtb.old
mv ../../../kernel/dtb ../../../kernel/dtb.old
cp $TEGRA_KERNEL_OUT/arch/arm64/boot/dts ../../../kernel/dtb -R
cd ~/Linux_for_Tegra/
sudo ./flash.sh -r -k kernel-dtb jetson-tx2i mmcblk0p1

Please find the files mentioned above, as well as the logs, here. Should you need any other dump or file, please do not hesitate to ask.

Despite having set the configuration of the PCIe to 2-1-1 (with the latter disabled), the system still complains about it:

[    0.458243] iommu: Adding device 10003000.pcie-controller to group 49
[    0.458257] arm-smmu: forcing sodev map for 10003000.pcie-controller
[    0.935228] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration
[    0.936196] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[    0.936516] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[    0.938806] tegra-pcie 10003000.pcie-controller: probing port 1, using 1 lanes
[    1.373998] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    1.776440] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    2.178432] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    2.180443] tegra-pcie 10003000.pcie-controller: link 0 down, ignoring
[    2.580439] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[    2.982431] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[    3.384438] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[    3.386433] tegra-pcie 10003000.pcie-controller: link 1 down, ignoring
[    3.590802] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
[    3.591060] tegra-pcie 10003000.pcie-controller: PCIE: Disable power rails

Do you spot any mistake in the steps outlined above?

Thanks a lot in advance and have a nice day!
Rob

(*) Is it normal that in the tegra186-odm-data-plugin-manager.dtsi file, fragments are misspelled as “fragements”? Doesn’t this prevent them from being correctly loaded?

Hi,

I don’t think it is a good idea to describe how and which files you’ve modified in the device tree…

A better way is to share the dts file converted from the dtb, along with the full dmesg.
Not a partial one but the full one. I don’t know why you keep sending me the PCIe error part that you have already pasted…
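
For reference, a dtb is commonly decompiled with `dtc -I dtb -O dts -o out.dts in.dtb`. Once the tree is in text form, the per-port PCIe settings can be pulled out with a short script. The following is a minimal sketch, assuming each `pci@X,Y` node has no child nodes and lists `status` and `nvidia,num-lanes` the way the snippets in this thread do; the function name `pcie_port_summary` is hypothetical.

```shell
# Print "<node> lanes=<n> status=<s>" for every pci@X,Y node found in a
# decompiled DTS file given as $1. Hypothetical helper for illustration.
pcie_port_summary() {
    awk '
        # Start of a PCIe root port node, e.g. "pci@3,0 {"
        /^[ \t]*pci@[0-9]+,[0-9]+ *[{]/ {
            node = $1; lanes = "?"; status = "?"; in_node = 1; next
        }
        # Keep only the value between "<" and ">"
        in_node && /nvidia,num-lanes/ {
            lanes = $0
            sub(/.*</, "", lanes); sub(/>.*/, "", lanes)
        }
        # Keep only the quoted status string
        in_node && /status *=/ {
            status = $0
            sub(/.*= *"/, "", status); sub(/".*/, "", status)
        }
        # First closing brace ends the node (assumes no nested children)
        in_node && /^[ \t]*[}];/ {
            print node, "lanes=" lanes, "status=" status
            in_node = 0
        }
    ' "$1"
}
```

It would be called as `pcie_port_summary decompiled.dts`, where the file name is whatever the decompiled output was saved as.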

Also, this command does not flash the odmdata. You have to do the full flash.

sudo ./flash.sh -r -k kernel-dtb jetson-tx2i mmcblk0p1

Hello,

I had shared them in a shared directory (could not attach them due to restrictions to uploaded files in the forum).
You can find them here: https://drive.switch.ch/index.php/s/r5l6J3QtP6s2XUt
(the link was in the post above, but possibly lost in the hundreds of lines, sorry).
Thanks again,
Rob

I have done that in the meantime – actually, I’ve flashed the board a dozen times since reconfiguring the system… Sorry, I should have been clearer about that…

Hi,

Could you first disable all the USB ports here and just verify that PCIe works?

Also, honestly speaking, what I need to read is the final dts converted from the dtb, not all the dtsi files here.

To check odmdata, please check /proc/device-tree/chosen/plugin-manager/
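
A minimal sketch of that check: list the flags that appear under the `odm-data` subdirectory of the plugin-manager node on the running target. The function name `list_odm_flags` and the optional directory argument (used to try the script on another machine) are assumptions made for this illustration.

```shell
# List the ODMDATA-derived flags exposed in the live device tree.
# $1 optionally overrides the directory, e.g. for testing on a host PC.
list_odm_flags() {
    dir="${1:-/proc/device-tree/chosen/plugin-manager/odm-data}"
    if [ ! -d "$dir" ]; then
        echo "no odm-data directory at $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        if [ ! -e "$f" ]; then
            continue        # empty directory: the glob did not expand
        fi
        name="$(basename "$f")"
        if [ "$name" = "name" ]; then
            continue        # "name" is the node name property, not a flag
        fi
        echo "$name"
    done
}
```

On the target it would be run as `list_odm_flags`; entries such as the `enable-pcie-on-uphy-lane…` and `enable-xusb-on-uphy-lane…` files show which lane assignments the flashed ODMDATA selected.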

Sure, please find the decompiled DTB file in attachment. In the meantime, I’m going to disable the USB and try again.
decompiled_dtb.txt (481.8 KB)

You still have the wrong lane number here.

	pci@3,0 {
		device_type = "pci";
		assigned-addresses = <0x82001800 0x0 0x10004000 0x0 0x1000>;
		reg = <0x1800 0x0 0x0 0x0 0x0>;
		status = "disabled";
		#address-cells = <0x3>;
		#size-cells = <0x2>;
		ranges;
		nvidia,num-lanes = <0x0>;
		nvidia,afi-ctl-offset = <0x19c>;
		nvidia,disable-aspm-states = <0xf>;
	};
/proc/device-tree/chosen/plugin-manager/odm-data$ ll
total 0
drwxr-xr-x 2 root root 0 Sèp 16 03:18 ./
drwxr-xr-x 5 root root 0 Sèp 16 03:18 ../
-r--r--r-- 1 root root 4 Sèp 16 03:18 android-build
-r--r--r-- 1 root root 4 Sèp 16 03:18 disable-pmic-wdt
-r--r--r-- 1 root root 4 Sèp 16 03:18 disable-sdmmc-hwcq
-r--r--r-- 1 root root 4 Sèp 16 03:18 disable-tegra-wdt
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-debug-console
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-denver-wdt
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-pcie-on-uphy-lane2
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-pcie-on-uphy-lane4
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-sata-on-uphy-lane5
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-xusb-on-uphy-lane0
-r--r--r-- 1 root root 4 Sèp 16 03:18 enable-xusb-on-uphy-lane1
-r--r--r-- 1 root root 9 Sèp 16 03:18 name
-r--r--r-- 1 root root 4 Sèp 16 03:18 no-battery
-r--r--r-- 1 root root 4 Sèp 16 03:18 normal-flashed

Ok, then something somewhere in the dts files overwrites the value I gave it… I’ll hunt it down and keep you updated, thanks!

Hi,

pinctrl@3520000 is not in use on rel-32.4.3 anymore. If you read the document carefully, you would know this.

Also, you enable 3 USB ports in your dts.

		usb3 {
			lanes {

				usb3-0 {
					status = "okay";
					#phy-cells = <0x0>;
					nvidia,function = "xusb";
					linux,phandle = <0xad>;
					phandle = <0xad>;
				};

				usb3-1 {
					status = "okay";
					#phy-cells = <0x0>;
					nvidia,function = "xusb";
					linux,phandle = <0xae>;
					phandle = <0xae>;
				};

				usb3-2 {
					status = "okay";
					#phy-cells = <0x0>;
					nvidia,function = "xusb";

Please disable unnecessary ports here.
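
To see at a glance which USB3 lane nodes are still enabled in a decompiled DTS, a short filter can help. This is a rough sketch, assuming the nodes are named `usb3-N` and carry a plain `status = "okay"` line as in the excerpt above; the function name `enabled_usb3_lanes` is hypothetical, and a node name may be reported more than once if it appears in several places in the tree.

```shell
# Print the name of every usb3-N node whose status is "okay" in the
# decompiled DTS file given as $1. Hypothetical helper for illustration.
enabled_usb3_lanes() {
    awk '
        /^[ \t]*usb3-[0-9]+ *[{]/           { node = $1; in_node = 1; next }
        in_node && /status *= *"okay"/      { print node }
        in_node && /^[ \t]*[}];/            { in_node = 0 }
    ' "$1"
}
```

Each name printed by `enabled_usb3_lanes decompiled.dts` is a port to review against the lanes the board actually routes to USB3.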

Dear WayneWWW,

I’ve followed your advice (disabled USB ports, set the proper PCIe config) and indeed the error in dmesg disappeared. The configuration is now 2-1-1 with the last one disabled.

The values I get with devmem2 seem to confirm that the ODMDATA value is correct

Value at address 0x2520284 (0x7fb772b284): 0x0
Value at address 0x2530284 (0x7f8d876284): 0x0
Value at address 0x2540284 (0x7f9020f284): 0x1
Value at address 0x2550284 (0x7fb0e06284): 0x1
Value at address 0x2560284 (0x7f89e2e284): 0x1
Value at address 0x2570284 (0x7fb6fdf284): 0x2
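
The six reads above can be scripted. This is a sketch under one loud assumption: that lane N is read at 0x2520284 + N * 0x10000, a pattern inferred only from the six addresses quoted here and not checked against the register manual. The function names are hypothetical; `devmem2 <address> w` performs a 32-bit word read.

```shell
# Address of the per-lane register for lane $1 (0..5).
# Assumption: 0x10000 stride starting at 0x2520284, inferred from this post.
uphy_lane_addr() {
    printf '0x%x\n' $((0x2520284 + $1 * 0x10000))
}

# Read all six lanes with devmem2. Needs root on the target, so it is
# only defined here and not called.
dump_uphy_lanes() {
    lane=0
    while [ "$lane" -le 5 ]; do
        devmem2 "$(uphy_lane_addr "$lane")" w
        lane=$((lane + 1))
    done
}
```

On the board this would be invoked as `dump_uphy_lanes` from a root shell; what each returned value means has to be taken from the NVIDIA documentation.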

In the dmesg (attached), however, I see that the system tries to enumerate the PCIe and then gives up:

dmesg | grep pcie
[    0.462503] iommu: Adding device 10003000.pcie-controller to group 49
[    0.462519] arm-smmu: forcing sodev map for 10003000.pcie-controller
[    0.939672] tegra-pcie 10003000.pcie-controller: 2x1, 1x1, 1x1 configuration
[    0.940595] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[    0.940920] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[    0.943341] tegra-pcie 10003000.pcie-controller: probing port 1, using 1 lanes
[    1.368966] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    1.775024] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    2.177066] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    2.179080] tegra-pcie 10003000.pcie-controller: link 0 down, ignoring
[    2.579107] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[    2.981149] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[    3.382952] tegra-pcie 10003000.pcie-controller: link 1 down, retrying
[    3.384965] tegra-pcie 10003000.pcie-controller: link 1 down, ignoring
[    3.589820] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
[    3.590088] tegra-pcie 10003000.pcie-controller: PCIE: Disable power rails

While this is ok for the x2 link (we have nothing connected to it, for the moment), the other one has a LAN7431 chip connected to it. The design of this connection has been verified by two hw engineers and by an engineer from Microchip itself, and probing it with an oscilloscope shows that the resets/clocks/power lines are ok. We also see the enumerating signals, but apparently Linux is not satisfied by what it sees.

Is there anything else I should add in the device tree to make it work, or is this a hardware problem?

Thanks and have a nice day!
Rob

decompiled.dts.txt (482.0 KB)
bootlog.txt (19.8 KB)
dmesg.txt (53.2 KB)

Ok, apparently we do have an issue with the connector, the PCIe TX line is broken – we do see the messages on the RX line, but we have no continuity on the TX pins (between the Jetson’s connector and the pins on the LAN7431 IC) (and, obviously, no data passes).
Hw fault then!

I’ll read again, this time thoroughly, the USB configuration part (I just had a quick glance, since I was focused on the PCIe part).

In the meantime, do you have any insight into why I get the following errors (log files from the previous post):
a) in U-Boot output, if I have no SD card inserted, I get
Card did not respond to voltage select!
However, if I insert one, I get a message saying that no card is present…

b) In Linux logs:

[    2.327398] mmc1: CMD CRC or end bit error, int mask 0xc0000
[    2.333549] mmc1: CMD CRC or end bit error, int mask 0x40000
[    2.339683] mmc1: CMD CRC or end bit error, int mask 0x40000
[    2.345812] mmc1: CMD CRC or end bit error, int mask 0x40000
[    2.351963] mmc1: CMD CRC or end bit error, int mask 0x40000
[    2.358162] mmc1: CMD CRC or end bit error, int mask 0xc0000
[    2.364317] mmc1: CMD CRC or end bit error, int mask 0xc0000
[    2.370477] mmc1: CMD CRC or end bit error, int mask 0xc0000
[    2.376626] mmc1: CMD CRC or end bit error, int mask 0xc0000
[    2.382824] mmc1: CMD CRC or end bit error, int mask 0x40000

?
Thanks again!
Rob