Cannot boot up to Desktop when CONFIG_NET_DSA is on

Hi,

We are trying to bring up a Marvell switch on our custom board. We made a change like the following diff in arch/arm64/configs/tegra_defconfig and rebuilt the kernel Image file.

--- a/arch/arm64/configs/tegra_defconfig
+++ b/arch/arm64/configs/tegra_defconfig
@@ -213,6 +213,7 @@ CONFIG_IP6_NF_NAT=m
 CONFIG_IP6_NF_TARGET_MASQUERADE=m
 CONFIG_BRIDGE=y
 CONFIG_BRIDGE_VLAN_FILTERING=y
+CONFIG_NET_DSA=y
 CONFIG_VLAN_8021Q=m
 CONFIG_VLAN_8021Q_GVRP=y
 CONFIG_VLAN_8021Q_MVRP=y
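To confirm that the option actually ended up in the kernel that is running, one can look it up in the running kernel's configuration. The sketch below is a minimal helper written for illustration (the function names are hypothetical). It assumes the kernel exposes its config at /proc/config.gz, which needs CONFIG_IKCONFIG_PROC, and it also accepts a plain .config file.

```python
import gzip
from typing import Optional


def read_config_text(path: str) -> str:
    """Read a kernel config; files ending in .gz (e.g. /proc/config.gz) are decompressed."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt") as f:
            return f.read()
    with open(path, "r") as f:
        return f.read()


def option_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of `option` ("y", "m", or a string/number), or None if unset."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Matching on "NAME=" avoids false hits on longer names that share the prefix.
        if line.startswith(prefix):
            return line[len(prefix):]
    # Absent options and "# CONFIG_FOO is not set" lines both end up here.
    return None
```

On the rebooted board, `option_value(read_config_text("/proc/config.gz"), "CONFIG_NET_DSA")` should return "y" if the newly built Image is the one that booted.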

However, after we replaced the kernel Image file under /boot and rebooted, the system could not boot to the desktop and repeatedly printed the following display errors in dmesg. The Xavier NX Dev Kit shows the same issue with a similar error log.

[   12.286366] tegradc 15200000.nvdisplay: blank - powerdown
[   12.332202] extcon-disp-state external-connection:disp-state: cable 47 state 0
[   12.332205] Extcon AUX1(HDMI) disable
[   12.354408] tegra_nvdisp_handle_pd_disable: Powergated Head1 pd
[   12.354892] tegra_nvdisp_handle_pd_disable: Powergated Head0 pd
[   12.355063] tegradc 15200000.nvdisplay: unblank
[   12.355710] tegra_nvdisp_handle_pd_enable: Unpowergated Head0 pd
[   12.355813] tegra_nvdisp_handle_pd_enable: Unpowergated Head1 pd
[   12.357826] Parent Clock set for DC plld2
[   12.361702] tegradc 15200000.nvdisplay: hdmi: tmds rate:154000K prod-setting:prod_c_hdmi_111m_223m
[   12.362999] tegradc 15200000.nvdisplay: hdmi: get YCC quant from EDID.
[   12.397458] extcon-disp-state external-connection:disp-state: cable 47 state 1
[   12.397462] Extcon AUX1(HDMI) enable
[   12.413193] tegradc 15200000.nvdisplay: sync windows ret = 246

I have attached the dmesg log for reference. Please help identify whether this is a known issue on Xavier NX.
dmesg_dsa_on_boot_failed.txt (111.2 KB)

This could be an issue with how the kernel was installed. When you built your new kernel, did you start from a valid initial configuration? One valid configuration is tegra_defconfig; another is a copy of the original running system’s “/proc/config.gz”. In both cases, however, you must make sure “CONFIG_LOCALVERSION” is set correctly; otherwise you must also install all of the kernel modules.

In more detail, CONFIG_LOCALVERSION is the suffix you see in the output of “uname -r”. Modules are searched for under “/lib/modules/$(uname -r)/kernel”. If the “uname -r” string changes (for example, because CONFIG_LOCALVERSION was omitted), then the modules will not be found.
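A quick way to see whether this lookup will succeed is to compare the running release string with the module trees that are actually installed. This is a small sketch under the assumption that modules live in the standard /lib/modules/<release>/kernel layout described above; the helper names are hypothetical, and the release strings in any output depend entirely on your build.

```python
import os
from pathlib import Path
from typing import List


def running_release() -> str:
    """The same string `uname -r` prints, including any CONFIG_LOCALVERSION suffix."""
    return os.uname().release


def module_dir(release: str, root: str = "/lib/modules") -> Path:
    """Where modules for `release` are expected: <root>/<release>/kernel."""
    return Path(root) / release / "kernel"


def installed_releases(root: str = "/lib/modules") -> List[str]:
    """Release strings that have a module tree installed under `root`."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def check_modules(release: str, root: str = "/lib/modules") -> str:
    """Human-readable verdict on whether modules for `release` can be found."""
    target = module_dir(release, root)
    if target.is_dir():
        return f"ok: {target} exists"
    return f"missing: no {target}; installed releases: {installed_releases(root)}"
```

Running `print(check_modules(running_release()))` after booting the new Image shows immediately whether the release string still points at an installed module tree.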

Another detail: if you start from the same configuration and only add an option, then most likely you don’t need to rebuild all modules. Some configuration changes are more invasive, though, and for those you would want a new “CONFIG_LOCALVERSION” so that you can install new modules for that specific release.

When you build a new feature directly into the kernel (rather than as a module), be sure to watch whether any other features are forcibly enabled. If another feature is forcibly enabled and it is built as a module, then you’d need to install that new module as well.
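One way to spot such side effects is to diff the original configuration (for example, the running system’s /proc/config.gz) against the new .config and list every option whose value changed, highlighting the ones that became modules. The sketch below is an illustrative approach, not an official kernel tool; it assumes an option missing from a config file can be treated the same as “is not set”, and all function names are hypothetical.

```python
import gzip
from typing import Dict, List, Tuple


def load_config(path: str) -> str:
    """Read a .config or a gzip-compressed config such as /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()


def parse_config(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            values[line[2:-len(" is not set")]] = "n"
    return values


def diff_configs(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old_value, new_value)} for every option whose value changed."""
    changed: Dict[str, Tuple[str, str]] = {}
    for name in sorted(set(old) | set(new)):
        # Assumption: an option absent from a file behaves like "is not set".
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            changed[name] = (before, after)
    return changed


def newly_modular(changed: Dict[str, Tuple[str, str]]) -> List[str]:
    """Options that are now '=m': each is a module that must also be installed."""
    return [name for name, (_, after) in changed.items() if after == "m"]
```

For example, `diff_configs(parse_config(load_config("/proc/config.gz")), parse_config(load_config(".config")))` lists the new CONFIG_NET_DSA entry alongside anything else that changed with it.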

Hello linuxdev,

Thank you for your help. We confirmed that we used the proper settings for tegra_defconfig and CONFIG_LOCALVERSION.

After replacing all the kernel modules with the newly built ones, we can boot the device. It turns out that CONFIG_NET_DSA has some kind of dependency on certain kernel modules. The critical cause of the boot failure was that none of the preinstalled kernel modules could be loaded by the newly built kernel Image file.
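To check whether an installed module was built for the running kernel, one can compare the “vermagic” string stored in the module with the running release; `modinfo -F vermagic <module>.ko` prints the same field. The sketch below is a rough heuristic that scans the raw .ko bytes instead of properly parsing the ELF .modinfo section, and its helper names are hypothetical. Note that if CONFIG_MODVERSIONS is enabled, the kernel skips the release field of vermagic and relies on per-symbol CRCs instead, so a matching release here does not by itself prove that a module will load.

```python
import os
import re
from typing import Optional

# Heuristic: find the NUL-terminated "vermagic=" entry anywhere in the module file.
_VERMAGIC = re.compile(rb"vermagic=([^\x00]*)\x00")


def read_vermagic(ko_bytes: bytes) -> Optional[str]:
    """Return the vermagic string from raw module bytes, or None if not found."""
    match = _VERMAGIC.search(ko_bytes)
    if match is None:
        return None
    return match.group(1).decode("ascii", errors="replace")


def vermagic_release(vermagic: str) -> str:
    """The first space-separated field of vermagic is the kernel release."""
    return vermagic.split(" ", 1)[0]


def release_matches(ko_path: str, release: Optional[str] = None) -> Optional[bool]:
    """Compare a module's vermagic release with `release` (default: the running kernel)."""
    with open(ko_path, "rb") as f:
        magic = read_vermagic(f.read())
    if magic is None:
        return None  # no vermagic entry found; cannot tell
    return vermagic_release(magic) == (release or os.uname().release)
```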

Thank you again for your analysis about this issue.