mmc2: Timeout waiting for hardware interrupt. (SOLVED with issues)

Was this mmc2 issue ever addressed? https://devtalk.nvidia.com/default/topic/913042/jetson-tk1/ubuntu-automatic-login-ok-but-crash-on-mmc2-timeout-waiting-for-hardware-interrupt-/

I am getting this exact same error right before my system becomes unresponsive, since my OS is running on the internal mmc.

130|shell@armv7_cortex_a:/ $ [   37.878633] mmc2: Timeout waiting for hardware interrupt.
[   37.884023] sdhci: ================== REGISTER DUMP (mmc2)==================
[   37.891059] sdhci: Sys addr[0x000]: 0x00000000 | Version[0x0fe]:  0x00000303
[   37.898095] sdhci: Blk size[0x004]: 0x00000000 | Blk cnt[0x006]:  0x00000000
[   37.905128] sdhci: Argument[0x008]: 0x00000c00 | Trn mode[0x00c]: 0x00000000
[   37.912159] sdhci: Present[0x024]:  0x01fb00f1 | Host ctl[0x028]: 0x00000001
[   37.919191] sdhci: Power[0x029]:    0x0000000f | Blk gap[0x02a]:  0x00000000
[   37.926222] sdhci: Wake-up[0x02b]:  0x00000000 | Clock[0x02c]:    0x00000405
[   37.933253] sdhci: Timeout[0x02e]:  0x00000000 | Int stat[0x030]: 0x00000000
[   37.940284] sdhci: Int enab[0x034]: 0x00ff0003 | Sig enab[0x038]: 0x00fc0003
[   37.947315] sdhci: AC12 err[0x03c]: 0x00000000 | Slot int[0x0fc]: 0x00000000
[   37.954346] sdhci: Caps[0x040]:     0x376fd080 | Caps_1[0x044]:   0x10002f73
[   37.961378] sdhci: Cmd[0x00e]:      0x0000341a | Max curr[0x048]: 0x00000000
[   37.968406] sdhci: Host ctl2[0x03e]: 0x00003000
[   37.972924] sdhci: ADMA Err[0x054]: 0x00000000 | ADMA Ptr[0x058]: 0x00000000
[   37.979953] sdhci: Tap value: 0 | Trim value: 2
[   37.984469] sdhci: SDMMC Interrupt status: 0x00000000
[   37.989503] sdhci: =========================================================
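
One thing that can be read straight out of the dump above is which command was in flight when the interrupt never arrived. Here is a minimal sketch in C, assuming the standard SDHCI Command register layout and the SDIO CMD52 argument layout. The struct and function names are made up for illustration and are not taken from the kernel.

```c
#include <stdint.h>

/* Fields of the 16-bit SDHCI Command register (offset 0x00e). */
struct sdhci_cmd {
    unsigned index;        /* bits 13:8, command index */
    unsigned type;         /* bits 7:6, 0 = normal command */
    unsigned data_present; /* bit 5 */
    unsigned index_check;  /* bit 4 */
    unsigned crc_check;    /* bit 3 */
    unsigned resp_type;    /* bits 1:0, 2 = 48-bit response */
};

static struct sdhci_cmd sdhci_decode_cmd(uint16_t reg)
{
    struct sdhci_cmd c;

    c.index        = (reg >> 8) & 0x3f;
    c.type         = (reg >> 6) & 0x3;
    c.data_present = (reg >> 5) & 0x1;
    c.index_check  = (reg >> 4) & 0x1;
    c.crc_check    = (reg >> 3) & 0x1;
    c.resp_type    = reg & 0x3;
    return c;
}

/* Fields of an SDIO CMD52 (IO_RW_DIRECT) argument. */
struct sdio_cmd52_arg {
    unsigned write;    /* bit 31, 1 = write, 0 = read */
    unsigned function; /* bits 30:28, SDIO function number */
    unsigned reg_addr; /* bits 25:9, register address */
    unsigned data;     /* bits 7:0, write data */
};

static struct sdio_cmd52_arg sdio_decode_cmd52_arg(uint32_t arg)
{
    struct sdio_cmd52_arg a;

    a.write    = (arg >> 31) & 0x1;
    a.function = (arg >> 28) & 0x7;
    a.reg_addr = (arg >> 9) & 0x1ffff;
    a.data     = arg & 0xff;
    return a;
}
```

For the values in the dump, sdhci_decode_cmd(0x341a) gives command index 52 and sdio_decode_cmd52_arg(0x00000c00) gives a read of function 0, register 0x06. CMD52 is the SDIO IO_RW_DIRECT command, so this looks like the kernel probing the slot for an SDIO card rather than talking to an eMMC or SD card. That alone does not say which device (if any) is actually wired to the controller.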

Full log on pastebin: http://pastebin.com/6Peqtm45

I was thinking it was my DTB file because of all the dt errors but I am not totally sure.

I am using kernel 3.10.67, modified to run on the Jetson TK1 (Android), built with GCC arm-linux-gnueabi-4.9-linaro.

I used the standard L4T flashing method to flash.

Okay so digging into the forums it looks like my board id is not getting passed from u-boot to kernel. What is the process to write board_id to EEPROM?

eeprom write ?

This post ends with the same questions: https://devtalk.nvidia.com/default/topic/883906/custom-tegra-k1-board-mmc-problem-/

Okay, digging in a bit more: this issue is definitely due to the fact that I am using upstream U-Boot 2015.04-00030, which I have made compatible with Android fastboot and boota. The upstream build fails when I add the serial EEPROM flags:

/* The following are used to retrieve the board id from an eeprom */
#define CONFIG_SERIAL_EEPROM
#define EEPROM_I2C_BUS         1
#define EEPROM_I2C_ADDRESS     0x56
#define EEPROM_SERIAL_OFFSET   0x04
#define NUM_SERIAL_ID_BYTES    8

compilation error:

LD      u-boot
board/nvidia/common/built-in.o: In function `get_board_serial':
/home/matt/Apalis-TK1/u-boot/board/nvidia/common/../../nvidia/common/board.c:290: undefined reference to `i2c_set_bus_num'
/home/matt/Apalis-TK1/u-boot/board/nvidia/common/../../nvidia/common/board.c:292: undefined reference to `i2c_read'

Why don’t you uncomment the BOARDID line in the jetson-tk1.conf in the linux_for_tegra folder, where the flasher is located? That should probably work around the problem.

That is one of the very first things I tried, without success. You can see this exact same error reported all over the NXP community too, but no one seems to have addressed it. The error looks like the one you get with a bad removable SD card, so the code must be in the board-sdhci file or in the tegra mmc sdhci driver. I’ve examined the code and I can’t see any reference to an mmc2 card. The only thing I can think of is that the bcmdhd driver uses an sdio mmc interface for wifi on some tegra boards, so the code is trying to load this and halts the system. I really am stumped on this error…

I’m also struggling with the tegra’s sdhci interface now to enable a wifi, so I can feel you…
It’s really annoying that there’s no support at all for the tk1.

If I think or find something about your issue I will come back.

Also, I had other problems with sdmmc3, which is used by default on the Jetson and is hardcoded in the kernel. I had the same timeouts (for another reason, of course), so what I did was move the sdmmc3 pins to the unused section in arch/arm/boot/dts/tegra124-pinmux.dtsi

NICE! I will have to try that! Where would the unused section be? Could you elaborate a bit more?

I see this in tegra124-soc-base.dtsi:

sdhci@700b0600 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0600 0x0 0x200>;
		interrupts = < 0 31 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC4A>;
		status = "disabled";
	};
	sdhci@700b0400 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0400 0x0 0x200>;
		interrupts = < 0 19 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC3A>;
		status = "disabled";
	};
	sdhci@700b0200 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0200 0x0 0x200>;
		interrupts = < 0 15 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC2A>;
		status = "disabled";
	};
	sdhci@700b0000 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0000 0x0 0x200>;
		interrupts = < 0 14 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC1A>;
		status = "disabled";
	};

so move MMC2:

sdhci@700b0200 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0200 0x0 0x200>;
		interrupts = < 0 15 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC2A>;
		status = "disabled";
	};

To the unused section?

If I remember right, I just moved the following pins in the arch/arm/boot/dts/tegra124-pinmux.dtsi file

sdmmc3_clk_pa6
sdmmc3_cmd_pa7
sdmmc3_dat0_pb7
sdmmc3_dat1_pb6
sdmmc3_dat2_pb5
sdmmc3_dat3_pb4
sdmmc3_clk_lb_out_pee4
sdmmc3_clk_lb_in_pee5
sdmmc3_cd_n_pv2

into the pinmux_unused_lowpower: unused_lowpower {} section.

Thanks dimtass! You have been a big help and gave me some great ideas! Hopefully I can report back success!

And yes, sdhci@700b0200 should have status = "disabled".

Just noticed this also in the upstream 3.10.67 kernel:

Kernel 3.10.67

sdhci@700b0600 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0600 0x0 0x200>;
		interrupts = < 0 31 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC4A>;
		status = "disabled";
	};
	sdhci@700b0400 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0400 0x0 0x200>;
		interrupts = < 0 19 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC3A>;
		status = "disabled";
	};
	sdhci@700b0200 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0200 0x0 0x200>;
		interrupts = < 0 15 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC2A>;
		status = "disabled";
	};
	sdhci@700b0000 {
		compatible = "nvidia,tegra124-sdhci";
		power-domains = <&mc_clk_pd>;
		reg = <0x0 0x700b0000 0x0 0x200>;
		interrupts = < 0 14 0x04 >;
		iommus = <&smmu TEGRA_SWGROUP_SDMMC1A>;
		status = "disabled";
	};

Kernel 3.10.33/40

sdhci@78000000 {
		compatible = "nvidia,tegra114-sdhci", "nvidia,tegra30-sdhci";
		reg = <0x78000000 0x200>;
		interrupts = <0 14 0x04>;
		clocks = <&tegra_car 14>;
		nvidia,memory-clients = <14>;
		status = "disable";
	};

	sdhci@78000200 {
		compatible = "nvidia,tegra114-sdhci", "nvidia,tegra30-sdhci";
		reg = <0x78000200 0x200>;
		interrupts = <0 15 0x04>;
		clocks = <&tegra_car 9>;
		nvidia,memory-clients = <14>;
		status = "disable";
	};

	sdhci@78000400 {
		compatible = "nvidia,tegra114-sdhci", "nvidia,tegra30-sdhci";
		reg = <0x78000400 0x200>;
		interrupts = <0 19 0x04>;
		clocks = <&tegra_car 69>;
		nvidia,memory-clients = <14>;
		status = "disable";
	};

	sdhci@78000600 {
		compatible = "nvidia,tegra114-sdhci", "nvidia,tegra30-sdhci";
		reg = <0x78000600 0x200>;
		interrupts = <0 31 0x04>;
		clocks = <&tegra_car 15>;
		nvidia,memory-clients = <14>;
		status = "disable";
	};

As you can see, the interrupts and other settings differ, so this could be an issue also… not sure though, I’m no DTB expert ;)
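
To check what really differs between the two listings, the nodes can be lined up by their offset within the SDMMC block. Below is a small C sketch that does this with the base addresses and interrupt numbers copied from the snippets above. The struct, array and function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One sdhci node: base address and the second cell of "interrupts". */
struct sdhci_node {
    uint32_t base;
    unsigned irq;
};

/* Values copied from the 3.10.67 listing. */
static const struct sdhci_node nodes_3_10_67[] = {
    { 0x700b0600, 31 }, { 0x700b0400, 19 },
    { 0x700b0200, 15 }, { 0x700b0000, 14 },
};

/* Values copied from the 3.10.33/40 listing. */
static const struct sdhci_node nodes_3_10_33[] = {
    { 0x78000000, 14 }, { 0x78000200, 15 },
    { 0x78000400, 19 }, { 0x78000600, 31 },
};

/*
 * Match nodes by their low 12 address bits (0x000, 0x200, 0x400, 0x600)
 * and count those whose interrupt differs or that have no counterpart.
 */
static int count_irq_mismatches(const struct sdhci_node *a,
                                const struct sdhci_node *b, size_t n)
{
    int mismatches = 0;

    for (size_t i = 0; i < n; i++) {
        int found = 0;

        for (size_t j = 0; j < n; j++) {
            if ((a[i].base & 0xfffu) != (b[j].base & 0xfffu))
                continue;
            found = 1;
            if (a[i].irq != b[j].irq)
                mismatches++;
        }
        if (!found)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_irq_mismatches(nodes_3_10_67, nodes_3_10_33, 4) finds no mismatches, so going by these snippets the interrupt numbers seem to be the same per controller and only the listing order is reversed. What does differ is the base address, the compatible strings, the number of cells in reg, and the clocks versus power-domains/iommus properties.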

Perfect, I found the pinmux_unused_lowpower: unused_lowpower {} section in a different file. Going to try a build here in a few…

Which device would correspond to mmc2? I don’t see an sdmmc2, only sdmmc1, sdmmc3 and sdmmc4.

mmc2 is the device name that the kernel assigns to the interface. So if you have only enabled sdmmc3 and sdmmc4, then probably mmc0=sdmmc4 and mmc1=sdmmc3. I think that in mmc(x), x is usually the .id of the platform_device in the board file, but I’m not really sure about that.

Thank you immensely for the input and advice dimtass. Trial and error will tell from here on out…

To find which sdhci-tegra interface is assigned to each mmc device, just check the kernel output.
For example, in my case I have the following assignments:

[    5.302607] mmc0: SDHCI controller on sdhci-tegra.3 [sdhci-tegra.3] using ADMA
...
[    5.393608] mmc1: SDHCI controller on sdhci-tegra.0 [sdhci-tegra.0] using ADMA
...
[    6.022615] mmc2: SDHCI controller on sdhci-tegra.1 [sdhci-tegra.1] using ADMA
...

Which means that sdhci-tegra.3 -> mmc0, sdhci-tegra.0 -> mmc1 and sdhci-tegra.1 -> mmc2.
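
If the log is long, the same lookup can be scripted. This is a rough C sketch that pulls the mapping out of one log line, assuming the line has exactly the "mmcN: SDHCI controller on sdhci-tegra.M" shape shown above. The function name is made up, and only the first "mmc" occurrence on the line is examined.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns the sdhci-tegra index bound to host "mmc<mmc>" if this log line
 * announces that binding, or -1 if the line is about something else.
 */
static int sdhci_ctrl_for_mmc(const char *line, int mmc)
{
    const char *p = strstr(line, "mmc");
    int host, ctrl;

    if (p == NULL)
        return -1;
    if (sscanf(p, "mmc%d: SDHCI controller on sdhci-tegra.%d",
               &host, &ctrl) != 2)
        return -1;
    return host == mmc ? ctrl : -1;
}
```

Feeding it the third line of the example with mmc = 2 returns 1, i.e. sdhci-tegra.1. Other lines, such as the timeout message itself, return -1.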

Therefore, in my case, if mmc2 times out I have to move the sdhci-tegra.1 gpios to the unused pins and, of course, disable the corresponding sdhci@78000x00 node in the devicetree.

Okay so I noticed all references to sdmmc1 were already set to pinmux_unused_lowpower: unused_lowpower so I moved them to the enabled section just for a test. And in my soc dts includes they are disabled like this:

sdhci@700b0600 {
		status = "disabled";
	};
	sdhci@700b0400 {
		status = "disabled";
	};
	sdhci@700b0200 {
		status = "disabled";
	};
	sdhci@700b0000 {
		status = "disabled";
	};

Then you have the opposite problem: sdmmc1 is enabled in the board-ardbeg-sdhci.c file (somewhere near the end of the file) while its pins are set as unused.
In that case the mmc times out because the controller is registered in the board file but its pins are disabled. Either leave the gpios as unused and remove the line in board-ardbeg-sdhci.c that registers sdmmc1, or move the gpios to the non-unused part of the devicetree file. I think the first approach is better.
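
The rule of thumb from this thread can be written down as a tiny check. The C sketch below is only a model of the reasoning, not kernel code. It assumes a controller gets probed if either the devicetree node is enabled or the board file registers it, and the function and parameter names are invented.

```c
#include <stdbool.h>

/*
 * A controller that gets probed while its pins sit in the unused/low-power
 * group has nothing to talk to, so commands are expected to time out.
 */
static bool sdmmc_may_timeout(bool pins_muxed, bool dt_enabled,
                              bool board_registered)
{
    bool probed = dt_enabled || board_registered;

    return probed && !pins_muxed;
}
```

For the case described here (pins unused, devicetree status "disabled", but still registered in the board file), sdmmc_may_timeout(false, false, true) is true. With the registration removed as well, it becomes false.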

I didn’t have a kernel tree at hand, so I couldn’t check the exact code for my previous answer, but I’ve now found the file online: in arch/arm/mach-tegra/board-ardbeg-sdhci.c, the function ardbeg_sdhci_init() ends with some calls that register the sdhci platform devices, with a function call like

platform_device_register(&tegra_sdhci_device1)

This means that you’ll also have to remove the call that registers the sdmmc device that times out in your case. To find it, look up the struct that is passed to the function. You probably have a tegra_sdhci_device1 struct with .id = 1.

Now, if you actually remove the platform_device_register(&tegra_sdhci_devicex) line, I think the kernel may not compile, as the struct is then declared but not used, so better use something like this:

#if 0
platform_device_register(&tegra_sdhci_device1);
#endif
#endif
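
A slightly tidier variant of the same idea is to guard both the struct and the call with one switch, so nothing is left declared but unused. The sketch below compiles outside the kernel tree only because it stubs out struct platform_device and platform_device_register(). BOARD_USE_SDMMC1, tegra_sdhci_device3 and ardbeg_sdhci_init_sketch() are hypothetical names for illustration, not something from the real board file. Note that it uses #if with a value; #ifdef expects a macro name, so it cannot be followed by a literal 0.

```c
/* Stand-ins for the kernel definitions, so the sketch builds on its own. */
struct platform_device {
    const char *name;
    int id;
};

static int registered_count;

static int platform_device_register(struct platform_device *pdev)
{
    (void)pdev;
    registered_count++;
    return 0;
}

/* Hypothetical switch: set to 1 to bring the sdmmc1 controller back. */
#define BOARD_USE_SDMMC1 0

#if BOARD_USE_SDMMC1
static struct platform_device tegra_sdhci_device1 = { "sdhci-tegra", 1 };
#endif
/* Stand-in for whichever controller stays enabled on the board. */
static struct platform_device tegra_sdhci_device3 = { "sdhci-tegra", 3 };

/* Registers the enabled controllers and returns how many were registered. */
static int ardbeg_sdhci_init_sketch(void)
{
    registered_count = 0;
#if BOARD_USE_SDMMC1
    platform_device_register(&tegra_sdhci_device1);
#endif
    platform_device_register(&tegra_sdhci_device3);
    return registered_count;
}
```

With BOARD_USE_SDMMC1 set to 0, only one device is registered and the sdmmc1 struct is not compiled at all.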