Linux warm reboot

Hello all,

I am currently trying to do a warm reboot of Linux without powering down the board. When execution reaches the u-boot code, u-boot cannot detect the SDMMC4 to load the kernel Image again. We get the following messages:

Card doesn't support part_switch
MMC partition switch failed
MMC partition switch failedtegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0
MMC: no card present
switch to partitions #0, OK
mmc0 is current device
** No partition table - mmc 0 **
starting USB...
USB0:   USB EHCI 1.10
scanning bus 0 for devices... 1 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
       scanning usb for ethernet devices... 0 Ethernet Device(s) found

USB device 0: unknown device
No ethernet found.
missing environment variable: pxeuuid
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00000000
No ethernet found.

After some investigation, we found that the Linux kernel disables the power domain corresponding to the SDMMC4 during its first boot; therefore, u-boot cannot detect the SDMMC4 when we perform the warm reboot. We followed the procedure in the NVIDIA Technical Reference Manual (32.7.2.4 SDMMC4 Initialization Sequence) to reset the SDMMC, but it seems that this operation is not enough to re-enable the corresponding power domain.

So I would like to know whether there is a way to disable the power domain management in Linux in order to prevent the SDMMC4 power domain from being switched off. Otherwise, could you point us to the procedure for enabling the SDMMC4 power domain?

Are you using a custom board or the Jetson TX1?
Which codeline?

We don't see this locally. Your error log says that no partition table is present. Are you checking the right sdmmc in u-boot?

mmc0 is current device
** No partition table - mmc 0 **
starting USB…

regards
Bibek

Hello,

I fear I was not clear. The first boot (u-boot + Linux) executes as expected by using the eMMC, but when I try to do a warm reboot by jumping to u-boot entry-point, the SDMMC4 is not seen anymore.

I am using the Jetson TX1 along with Jetpack 2.2.1.

Regards,
Pierre

Hi Pierre,

Can you explain what you mean by “do a warm reboot by jumping to u-boot entry-point”?

By warm boot, I mean running the reboot command from the target console.
Please share the complete boot log, which will show exactly what you are trying to do.

regards
Bibek

Hi Pierre,

Has this issue been clarified and resolved?
Any further update?

Thanks

Hello, we are currently running Linux on top of a monitor layer following the ARM Trusted Firmware model. One feature of this monitor is rebooting Linux without performing a hardware reset. The monitor layer has to restore the context (e.g., peripherals, CPU state, software binary, etc.) to allow Linux to reboot, then it jumps back to the entry point of u-boot.

If we use the Linux release provided by NVIDIA, u-boot stops its execution at the beginning:
TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1

After some investigation, we found that the issue is due to the SDMMC4 configuration, which is probably changed during the first boot of Linux and cannot be used again unless the monitor layer restores its context. We tried to implement the SDMMC4 reset sequence by following the Technical Reference Manual, but it made no difference.

In the current version, we boot Linux successfully over NFS by removing all instances of SDMMC4 from the device tree. U-boot executes without crashing but is no longer able to detect the SDMMC card, as seen in the following log:

TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
Card doesn't support part_switch
MMC partition switch failed
MMC partition switch failedtegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0 
MMC: no card present
switch to partitions #0, OK
mmc0 is current device
** No partition table - mmc 0 **
starting USB...
USB0:   USB EHCI 1.10
scanning bus 0 for devices... 1 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
       scanning usb for ethernet devices... 0 Ethernet Device(s) found

USB device 0: unknown device
No ethernet found.
missing environment variable: pxeuuid
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00000000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/0000000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/000000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/0000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/0
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm-tegra210
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default
No ethernet found.
Config file not found
No ethernet found.
Tegra210 (P2371-2180) #

Hi Pierre_Lucas,

According to the log, it seems that the step you describe as “The monitor layer has to restore the context” is not restoring the clocks, and therefore sdmmc is not recognized.
Sdmmc needs this clock: pll_c4_out2

If you could share what you restore and how, we can give you more information and suggestions.

Thanks

Hello,

Regarding the context restoring of the clock for SDMMC, the monitor layer performs the following steps:

  • Write 0x00041401 to CLK_RST_CONTROLLER_PLLDP_BASE_0
  • Write 0x00041401 to CLK_RST_CONTROLLER_PLLDP_BASE_0
  • Write 0x40000000 to CLK_RST_CONTROLLER_PLLDP_MISC_0
  • Write 0x10000000 to CLK_RST_CONTROLLER_PLLDP_SS_CFG_0
  • Write 0x00000000 to CLK_RST_CONTROLLER_PLLDP_SS_CTRL1_0 and CLK_RST_CONTROLLER_PLLDP_SS_CTRL2_0
  • Write 0x000C2302 to CLK_RST_CONTROLLER_PLLC4_BASE_0

This configuration has been extracted from the NVIDIA Technical Reference Manual.
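
Expressed as code, the sequence above looks roughly like the following sketch. The register names and values are the ones listed in the steps; everything else (the `car_reg` enum, the `car_write_fn` hook and the `sim_write` backend) is a hypothetical scaffold for illustration. Register offsets are deliberately left out and must be taken from the TRM. The first write appears twice because the list gives it twice.

```c
#include <stddef.h>
#include <stdint.h>

/* Symbolic register indices. The real offsets come from the TRM and are
 * intentionally not hard-coded in this sketch. */
enum car_reg {
    PLLDP_BASE,
    PLLDP_MISC,
    PLLDP_SS_CFG,
    PLLDP_SS_CTRL1,
    PLLDP_SS_CTRL2,
    PLLC4_BASE,
    CAR_REG_COUNT
};

struct car_write {
    enum car_reg reg;
    uint32_t value;
};

/* The restore sequence, in the order given in the post. */
static const struct car_write sdmmc_clk_restore[] = {
    { PLLDP_BASE,     0x00041401 },
    { PLLDP_BASE,     0x00041401 },  /* listed twice in the post */
    { PLLDP_MISC,     0x40000000 },
    { PLLDP_SS_CFG,   0x10000000 },
    { PLLDP_SS_CTRL1, 0x00000000 },
    { PLLDP_SS_CTRL2, 0x00000000 },
    { PLLC4_BASE,     0x000C2302 },
};

/* Hypothetical hook: the monitor would supply its own MMIO write here. */
typedef void (*car_write_fn)(enum car_reg reg, uint32_t value, void *ctx);

/* Replays the table through the supplied hook; returns the write count. */
static size_t restore_sdmmc_clocks(car_write_fn write, void *ctx)
{
    size_t n = sizeof(sdmmc_clk_restore) / sizeof(sdmmc_clk_restore[0]);

    for (size_t i = 0; i < n; i++)
        write(sdmmc_clk_restore[i].reg, sdmmc_clk_restore[i].value, ctx);
    return n;
}

/* Simulated backend for host-side testing: ctx is a uint32_t array
 * with CAR_REG_COUNT entries. */
static void sim_write(enum car_reg reg, uint32_t value, void *ctx)
{
    ((uint32_t *)ctx)[reg] = value;
}
```

Keeping the sequence in a table makes it easy to compare against a register dump taken after a cold boot, which is one way to spot registers that are missing from the restore.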

However, other issues will probably occur once this problem is solved. Would it be possible to have a list of the peripherals which might need to be reset?

Thanks

Hi Pierre, there is a clock sourcing figure on page 56 of the TRM, where you can find the peripherals related to PLLDP and PLLC4.

Also, make sure that the required regulators are ON.
Looking at the issue, it seems that even in the case of a false reboot, the Linux kernel is shutting down the controllers, thus killing the power and clock. As a result, the IOs are not responding. For example, sdmmc, xusb (ethernet is on xusb), etc.

Either make sure that the shutdown is not performed as part of the reboot, or make sure the HW is initialized properly, as is done in a proper warm boot.
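
As a rough illustration of the second option, a monitor could check that a controller's clock is enabled and its reset is deasserted before handing control back to u-boot. The sketch below only decodes bits from register values that the caller has already read. It assumes the usual Tegra convention of 32 peripherals per enable/reset register, and it assumes clock ID 15 for SDMMC4 as in the upstream `tegra210-car.h` binding; both should be verified against the TRM. All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SDMMC4 clock ID as in the upstream tegra210-car.h binding.
 * Verify against the TRM before relying on it. */
#define SDMMC4_CLK_ID 15u

/* Index of the enable/reset register holding this ID
 * (0 for the first register, 1 for the second, and so on). */
static inline unsigned clk_bank(unsigned id)
{
    return id / 32u;
}

/* Bit mask of this ID within its register. */
static inline uint32_t clk_mask(unsigned id)
{
    return UINT32_C(1) << (id % 32u);
}

/* True when the clock-enable bit is set and the reset bit is clear.
 * clk_enb and rst_dev are the values read from the registers that
 * correspond to clk_bank(id). */
static bool periph_is_accessible(uint32_t clk_enb, uint32_t rst_dev,
                                 unsigned id)
{
    uint32_t m = clk_mask(id);

    return (clk_enb & m) != 0 && (rst_dev & m) == 0;
}
```

If this check fails for SDMMC4 at the point where the monitor jumps to u-boot, register accesses to the controller are unlikely to respond, whatever the PLL configuration.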

regards
Bibek

Hello,

First of all, thanks for taking the time.

@Trumany, after checking the clock sourcing figure on page 56, it seems that PLLC4 uses PLLP. I restored the registers CLK_RST_CONTROLLER_PLLP_BASE_0 to CLK_RST_CONTROLLER_PLLP_MISC_0 as I previously did with PLLDP. However, I don’t see any improvement (same log from u-boot). I guess that I will also need to verify whether the SDMMC4 needs to be powered on again.

@bbasu, I don’t know how to ensure that the SDMMC4 is correctly powered on. On page 430, the TRM defines the different SoC rail partitions, but I was not able to find the partition used by the SDMMC4. Could you point me to the correct sequence to properly reset the SDMMC4 partition?
From my investigation, it seems that Linux is shutting down the SDMMC4 power/clocks during its boot. Indeed, I am able to restart Linux (although it fails quickly) by removing all instances of SDMMC4 from the DTB and booting over NFS.

Hi Pierre,

I believe sdmmc initialization is failing in u-boot and execution has not reached the kernel. If you want to know everything that needs to be done for sdmmc initialization, check the sdmmc node and the pinmux drive node in the kernel DT. The regulators are vddio_sdmmc and vddio_sd_slot. Check the dtsi files to power on these sw regulators. The clocks are "pll_p" and "pll_c4_out2".

sdhci@700b0600 {
	compatible = "nvidia,tegra210-sdhci";
	reg = <0x00000000 0x00000003 0x0000001f 0x00000250>;
	interrupts = <0x00000000 0x00000008 0x00000003>;
	iommus = <0x00000046 0x00002099>;
	nvidia,runtime-pm-type = <0x00000001>;
	status = "okay";
	vddio_sdmmc-supply = <0x00000056>;
	vddio_sd_slot-supply = <0x0000005b>;
	tap-delay = <0x00000000>;
	trim-delay = <0x00000008>;
	nvidia,is-ddr-tap-delay;
	nvidia,ddr-tap-delay = <0x00000000>;
	mmc-ocr-mask = <0x00000000>;
	dqs-trim-delay = <0x00000028>;
	dqs-trim-delay-hs533 = <0x00000018>;
	max-clk-limit = <0x0bebc200>;
	bus-width = <0x00000008>;
	built-in;
	calib-3v3-offsets = <0x00000505>;
	calib-1v8-offsets = <0x00000505>;
	compad-vref-3v3 = <0x00000007>;
	compad-vref-1v8 = <0x00000007>;
	nvidia,en-io-trim-volt;
	nvidia,is-emmc;
	nvidia,enable-cq;
	pll_source = "pll_p", "pll_c4_out2";
	uhs-mask = <0x00000000>;
	power-off-rail;
};

regards
Bibek

Thank you for your answers, I will take a look at it next week and come back to you then.

Regards,
Pierre

Hello all,

I finally had time to investigate more on the issue.

In the Linux kernel source code, I found code in drivers/mmc/host/sdhci-tegra.c which I think is related to the regulators vddio_sdmmc and vddio_sd_slot.

case CONFIG_REG_DIS:
		if (tegra_host->is_rail_enabled) {
			if (tegra_host->vdd_io_reg) {
				vddio_prev = regulator_get_voltage(
						tegra_host->vdd_io_reg);
				if (vddio_prev > SDHOST_LOW_VOLT_MAX)
					tegra_sdhci_signal_voltage_switch(
						sdhci, MMC_SIGNAL_VOLTAGE_180);
			}
			if (tegra_host->vdd_io_reg)
				rc = regulator_disable(tegra_host->vdd_io_reg);
			if (tegra_host->vdd_slot_reg)
				rc = regulator_disable(
					tegra_host->vdd_slot_reg);
			tegra_host->is_rail_enabled = false;
		}
	break;

This code is called by the function tegra_sdhci_ios_config_exit(). I commented out all the code in this function (it was also calling tegra_sdhci_set_clock(sdhci, 0), which I assumed turns off the clocks).

Even so, u-boot crashes at reboot. I looked with a debugger: the memory region starting at 0x700b0600 (which is SDMMC4) cannot be read, and any access causes the board to crash.

The only idea that comes to mind is that even though I prevented the clock from being disabled, maybe the source of this clock is disabled later on.

My question is: do you have any other ideas about what could prevent me from reading the SDMMC4 memory region?
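
One way to check such a register from a running Linux system, before the reboot is attempted, is to map it through /dev/mem. The sketch below is a minimal example and makes several assumptions: that the kernel exposes /dev/mem for this address range, that the process runs as root, and that the peripheral is clocked and out of reset (otherwise the access may hang the board, as observed with the debugger). The helper names are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Start of the page that contains the physical address. */
static off_t page_base(off_t phys, long page_size)
{
    return phys & ~((off_t)page_size - 1);
}

/* Offset of the physical address within its page. */
static size_t page_offset(off_t phys, long page_size)
{
    return (size_t)(phys & ((off_t)page_size - 1));
}

/* Reads one 32-bit register at a physical address via /dev/mem.
 * Returns 0 on success and -1 on failure. */
static int read_phys_reg32(off_t phys, uint32_t *out)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED,
                     fd, page_base(phys, page_size));
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    *out = *(volatile uint32_t *)((uint8_t *)map +
                                  page_offset(phys, page_size));
    munmap(map, (size_t)page_size);
    return 0;
}
```

For example, `read_phys_reg32(0x700b0600, &value)` would read the first word of the region mentioned above. Comparing the result after a normal boot with the result just before the jump to u-boot shows whether the region stops responding at some point in between.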

After the system boots, read the register once to confirm that you are reading the right register.
I guess you should just stub the remove and shutdown functions and check once:

    .remove         = sdhci_tegra_remove,
    .shutdown       = sdhci_tegra_shutdown,
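
The idea of stubbing these callbacks can be illustrated with a small host-side mock. This is not the kernel code: `struct mock_device`, `struct mock_driver` and all functions below are hypothetical stand-ins that only show the effect of replacing callbacks that power a device down with ones that do nothing. In the kernel, the equivalent change would be made to the bodies of sdhci_tegra_remove and sdhci_tegra_shutdown.

```c
/* Mock of a device with a single piece of state. */
struct mock_device {
    int powered;  /* 1 while clocks and rails are on */
};

/* Mock of the two driver callbacks named in the post. */
struct mock_driver {
    int  (*remove)(struct mock_device *dev);
    void (*shutdown)(struct mock_device *dev);
};

/* Stand-ins for the original callbacks: both power the device down. */
static int real_remove(struct mock_device *dev)
{
    dev->powered = 0;
    return 0;
}

static void real_shutdown(struct mock_device *dev)
{
    dev->powered = 0;
}

/* Stubs: report success but leave the hardware state untouched. */
static int stub_remove(struct mock_device *dev)
{
    (void)dev;
    return 0;
}

static void stub_shutdown(struct mock_device *dev)
{
    (void)dev;
}

static const struct mock_driver original_driver = {
    .remove   = real_remove,
    .shutdown = real_shutdown,
};

static const struct mock_driver stubbed_driver = {
    .remove   = stub_remove,
    .shutdown = stub_shutdown,
};
```

With the stubbed callbacks, a reboot path that calls `shutdown` leaves `powered` at 1, which is the state the monitor needs to find when it jumps back to u-boot.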