mmc1 timeout on TX1 [OS jetpack 2.3.1]

Hello all,

As the title suggests, I am seeing this error intermittently. It does not always show up, but when it does, it is always among the last few messages I can see in /var/log/kern.log before a manual system reboot. BTW, the OS uses extlinux to boot.
Here are the error messages:

Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.272152] sdhci: ================== REGISTER DUMP (mmc1)==================
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.279250] sdhci: Sys addr[0x000]: 0x00000000 | Version[0x0fe]:  0x00000303
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.286322] sdhci: Blk size[0x004]: 0x00007080 | Blk cnt[0x006]:  0x00000000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.293392] sdhci: Argument[0x008]: 0x12003e00 | Trn mode[0x00c]: 0x00000000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.300463] sdhci: Present[0x024]:  0x01fb00f0 | Host ctl[0x028]: 0x00000013
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.307530] sdhci: Power[0x029]:    0x0000000d | Blk gap[0x02a]:  0x00000000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.314596] sdhci: Wake-up[0x02b]:  0x00000000 | Clock[0x02c]:    0x00000007
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.321661] sdhci: Timeout[0x02e]:  0x0000000e | Int stat[0x030]: 0x00000000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.328726] sdhci: Int enab[0x034]: 0x02ff000b | Sig enab[0x038]: 0x02fc000b
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.335791] sdhci: AC12 err[0x03c]: 0x00000000 | Slot int[0x0fc]: 0x00000000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.342861] sdhci: Caps[0x040]:     0x376cd08c | Caps_1[0x044]:   0x10006f73
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.349927] sdhci: Cmd[0x00e]:      0x0000341a | Max curr[0x048]: 0x00000000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.356987] sdhci: Host ctl2[0x03e]: 0x0000300b
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.361543] sdhci: ADMA Err[0x054]: 0x00000000 | ADMA Ptr[0x058]: 0xfdd00010
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368613] mmc1: tuning_window[0]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368639] mmc1: tuning_window[1]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368665] mmc1: tuning_window[2]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368688] mmc1: tuning_window[3]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368711] mmc1: tuning_window[4]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368734] mmc1: tuning_window[5]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368756] mmc1: tuning_window[6]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368779] mmc1: tuning_window[7]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368799] sdhci: Tap value: 48 | Trim value: 8
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368819] sdhci: SDMMC Interrupt status: 0x00040000
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368833] sdhci: =========================================================
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.375896] mmc1: Command CRC error, intmask: 60001 Interface clock = 204000000Hz
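For anyone trying to read the last line of the dump, here is a minimal sketch that breaks the intmask into named bits. It assumes the value is printed in hexadecimal and that the bits follow the standard SDHCI interrupt status layout (the masks mirror the SDHCI_INT_* definitions in the Linux sdhci driver). The table and the decode_intmask name are illustrative only and are not part of any NVIDIA tool.

```python
# Assumed mapping: standard SDHCI interrupt status bits, as defined by the
# SDHCI_INT_* masks in the Linux sdhci driver.
SDHCI_INT_BITS = {
    0x00000001: "command complete",
    0x00000002: "transfer complete",
    0x00000008: "DMA interrupt",
    0x00000100: "card interrupt",
    0x00008000: "error summary",
    0x00010000: "command timeout error",
    0x00020000: "command CRC error",
    0x00040000: "command end bit error",
    0x00080000: "command index error",
    0x00100000: "data timeout error",
    0x00200000: "data CRC error",
    0x00400000: "data end bit error",
    0x01000000: "auto CMD12 error",
    0x02000000: "ADMA error",
}


def decode_intmask(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for mask, name in sorted(SDHCI_INT_BITS.items()) if value & mask]
```

With the value from the log, decode_intmask(0x60001) reports command complete, command CRC error, and command end bit error. The same function applied to the "SDMMC Interrupt status" value 0x00040000 reports command end bit error.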

I am aware of posts like https://devtalk.nvidia.com/default/topic/954715/jetson-tk1/mmc2-timeout-waiting-for-hardware-interrupt-solved-with-issues-/ and https://devtalk.nvidia.com/default/topic/883906/custom-tegra-k1-board-mmc-problem-/ , but I still don’t know how to address this issue, especially when it does not always happen. LOL. And, I am a newbie on this topic.

A full log is pasted in https://pastebin.com/MB6TQWAz

Here is a grep I did from the full log:

~$ grep -i 'mmc1' one_log.txt 
Aug 14 16:38:21 tegra-ubuntu kernel: [    0.269090] vddio-sdmmc1: 1800 <--> 3300 mV at 3300 mV with ramp delay 100000 uV/us ; Rail ON
Aug 14 16:38:21 tegra-ubuntu kernel: [    0.331485] tegra210-pmc-iopower pmc-iopower.29: Rail iopower-sdmmc1 is having voltages: 1800000:3300000
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.471365] mmc1: sdhci_tegra_probe line=5737 runtime pm type=mmc rtpm coupled with clock gate, disable-clock-gate=0
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.487997] mmc1: no vqmmc regulator found
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.494048] mmc1: no vmmc regulator found
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.531185] mmc1: SDHCI controller on sdhci-tegra.1 [sdhci-tegra.1] using ADMA
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.758235] mmc1: queuing unknown CIS tuple 0x80 (5 bytes)
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952066] mmc1: tap value and tuning window after hw tuning completion ...
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952076] mmc1: tuning_window[0]: 0x8fffffff
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952084] mmc1: tuning_window[1]: 0xffffffff
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952091] mmc1: tuning_window[2]: 0xffffffc7
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952097] mmc1: tuning_window[3]: 0x7fffe3ff
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952104] mmc1: tuning_window[4]: 0x0
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952111] mmc1: tuning_window[5]: 0x0
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952118] mmc1: tuning_window[6]: 0x0
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.952124] mmc1: tuning_window[7]: 0x0
Aug 14 16:38:21 tegra-ubuntu kernel: [    1.963690] mmc1: new ultra high speed SDR104 SDIO card at address 0001
Aug 14 16:38:21 tegra-ubuntu kernel: [    2.438497]    sdmmc1_ddr
Aug 14 16:38:21 tegra-ubuntu NetworkManager[548]: <info>  [1502728701.4441] rfkill1: found WiFi radio killswitch (at /sys/devices/platform/sdhci-tegra.1/mmc_host/mmc1/mmc1:0001/mmc1:0001:2/ieee80211/phy0/rfkill1) (driver bcmsdh_sdmmc)
Aug 14 16:38:21 tegra-ubuntu NetworkManager[548]: <info>  [1502728701.4452] rfkill2: found WiFi radio killswitch (at /sys/devices/platform/sdhci-tegra.1/mmc_host/mmc1/mmc1:0001/mmc1:0001:2/rfkill/rfkill2) (driver bcmsdh_sdmmc)
Aug 14 16:38:21 tegra-ubuntu NetworkManager[548]: <info>  [1502728701.7048] devices added (path: /sys/devices/platform/sdhci-tegra.1/mmc_host/mmc1/mmc1:0001/mmc1:0001:2/net/wlan0, iface: wlan0)
Aug 14 16:38:21 tegra-ubuntu NetworkManager[548]: <info>  [1502728701.7049] device added (path: /sys/devices/platform/sdhci-tegra.1/mmc_host/mmc1/mmc1:0001/mmc1:0001:2/net/wlan0, iface: wlan0): no ifupdown configuration found.
Aug 14 16:38:22 tegra-ubuntu kernel: [    5.539028] mmc1: queuing unknown CIS tuple 0x80 (5 bytes)
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.272152] sdhci: ================== REGISTER DUMP (mmc1)==================
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368613] mmc1: tuning_window[0]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368639] mmc1: tuning_window[1]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368665] mmc1: tuning_window[2]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368688] mmc1: tuning_window[3]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368711] mmc1: tuning_window[4]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368734] mmc1: tuning_window[5]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368756] mmc1: tuning_window[6]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.368779] mmc1: tuning_window[7]: 0x0
Aug 14 17:07:03 tegra-ubuntu kernel: [ 1726.375896] mmc1: Command CRC error, intmask: 60001 Interface clock = 204000000Hz
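The grep output above shows non-zero tuning windows right after boot and all-zero windows in the failure dump. A small sketch for comparing the two numerically follows. It only counts the set bits in each tuning_window value; what a set bit means (for example, a passing tap) is an assumption that would need confirmation from NVIDIA. The function name is hypothetical.

```python
import re

# Matches lines such as "mmc1: tuning_window[0]: 0x8fffffff".
WINDOW = re.compile(r"(mmc\d+): tuning_window\[(\d+)\]: (0x[0-9a-fA-F]+)")


def tuning_bits_set(lines, host="mmc1"):
    """Sum the set bits over every tuning_window line for the given host."""
    total = 0
    for line in lines:
        match = WINDOW.search(line)
        if match and match.group(1) == host:
            total += bin(int(match.group(3), 16)).count("1")
    return total
```

Fed the eight boot-time lines from the log, this returns 118. Fed the eight lines from the failure dump, it returns 0.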

After flashing JetPack 2.3.1 to the TX1, I recompiled the kernel following the instructions on this page: http://www.jetsonhacks.com/2016/09/28/build-tx1-kernel-and-modules-nvidia-jetson-tx1/ in order to enable SocketCAN support. I later added gpio-to-sfio under gpio@6000d000 in the device tree in order to use spi0 for CAN traffic.
Here is what I did:

default {
        gpio-to-sfio = <0x10 0x11 0x12 0x13>;
};

My TX1 works perfectly, except that it sometimes shuts itself down and never comes back unless I manually boot it up again.
Any help is appreciated, thanks!

@lanfker
I don’t think this message causes the system reboot. Could you try running jetson_clocks.sh to get the fan working, to rule out a thermal issue? Also, figuring out the repo step would be better for debugging.

Hi ShaneCCC,
I believe the module is not overheated. I ran the module indoors, and it can shut itself down automatically one or two minutes after a fresh boot.

Can I ask what ‘repo step’ is in your reply?

This last log line tends to blame the SD card (not definitive):

mmc1: Command CRC error, intmask: 60001 Interface clock = 204000000Hz

I’ve had success with most SD cards when used for data, but when used for a rootfs I’ve more or less succeeded only with name-brand cards, e.g., SanDisk or Samsung, but all ADATA failed for rootfs. Your SD seems to not be used as a root file system, but I am curious if you still get failures with no SD card installed (or with a different name brand SD card)?

Hi linuxdev,

First, thanks for replying.

I do not have an SD card plugged into my TX1 board. Is this causing the timeout problem? If it is, my guess is that the system should report the timeout quite quickly, not after thousands of seconds.

I had assumed this was from the SD card controller:

mmc1: Command CRC error

If someone from NVIDIA can confirm that this is or is not the SD card controller then there is a clue. I guess a big question is how to reproduce this, but regardless, after a boot, what do you get from “dmesg | egrep -i mmc1”? I’d be very curious what mmc1 is if it is not the SD controller.

I have one ‘grep -i ‘mmc1’ one_log.txt’ in my post. ‘one_log.txt’ is a full log from start to crash. Does it suffice?
If not, I will post “dmesg | egrep -i mmc1” once I get back to work tomorrow. Thanks!

If you don’t have an SD card I think someone still needs to confirm that mmc1 refers to the SD controller. It was mentioned earlier to see what happens when you run “sudo ~/jetson_clocks.sh”…if this changes things or not.

@linuxdev
It’s the WiFi SDIO module.

@lanfker
Could you remove the lib/firmware/brcm/. to check whether the mmc1 message is still there?

Hi linuxdev, this is the ‘egrep’ from dmesg

ubuntu@tegra-ubuntu:~$ dmesg | egrep -i mmc1
[    0.269153] vddio-sdmmc1: 1800 <--> 3300 mV at 3300 mV with ramp delay 100000 uV/us ; Rail ON
[    0.331366] tegra210-pmc-iopower pmc-iopower.29: Rail iopower-sdmmc1 is having voltages: 1800000:3300000
[    1.491341] mmc1: sdhci_tegra_probe line=5737 runtime pm type=mmc rtpm coupled with clock gate, disable-clock-gate=0
[    1.508221] mmc1: no vqmmc regulator found
[    1.514237] mmc1: no vmmc regulator found
[    1.551122] mmc1: SDHCI controller on sdhci-tegra.1 [sdhci-tegra.1] using ADMA
[    1.777900] mmc1: queuing unknown CIS tuple 0x80 (5 bytes)
[    1.971656] mmc1: tap value and tuning window after hw tuning completion ...
[    1.971668] mmc1: tuning_window[0]: 0xcfffffff
[    1.971676] mmc1: tuning_window[1]: 0xffffffff
[    1.971684] mmc1: tuning_window[2]: 0xffffffc7
[    1.971692] mmc1: tuning_window[3]: 0x7ffff3ff
[    1.971699] mmc1: tuning_window[4]: 0x0
[    1.971706] mmc1: tuning_window[5]: 0x0
[    1.971713] mmc1: tuning_window[6]: 0x0
[    1.971720] mmc1: tuning_window[7]: 0x0
[    1.975203] mmc1: new ultra high speed SDR104 SDIO card at address 0001
[    2.493115]    sdmmc1_ddr
[    5.768847] mmc1: queuing unknown CIS tuple 0x80 (5 bytes)
ubuntu@tegra-ubuntu:~$

Hi ShaneCCC,

I have followed your instructions and moved everything under lib/firmware/brcm/ to my home folder for backup. If this is caused by the WiFi firmware, does the latest Tegra Ubuntu OS still have the issue, or has it been solved? My work on the TX1 needs both Ethernet and WiFi.

Since the error occurs quite rarely, I will have to spend time observing logs before I can confirm whether the error output is gone. I do not know how to reproduce this error.
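One way to avoid watching the logs by hand is a small script that scans kern.log for the two markers seen in this thread: the sdhci register dump header and the "Command CRC error" line. This is only a sketch that assumes the log format shown above. The find_mmc_errors name and the pattern list are hypothetical.

```python
import re

# Patterns taken from the log lines quoted in this thread.
ERROR_PATTERNS = (
    re.compile(r"sdhci: =+ REGISTER DUMP \((mmc\d+)\)"),
    re.compile(r"(mmc\d+): Command CRC error"),
)
# Kernel uptime stamp such as "[ 1726.272152]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_mmc_errors(lines, host="mmc1"):
    """Return (uptime_seconds_or_None, line) for each error line about host."""
    hits = []
    for line in lines:
        for pattern in ERROR_PATTERNS:
            match = pattern.search(line)
            if match and match.group(1) == host:
                stamp = TIMESTAMP.search(line)
                uptime = float(stamp.group(1)) if stamp else None
                hits.append((uptime, line.strip()))
                break
    return hits
```

It could be run against an open file, for example find_mmc_errors(open("/var/log/kern.log")), after each unexpected shutdown to see whether the mmc1 messages appeared shortly before it.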

@lanfker
Does the timeout message still show up, and does WiFi stop working, after removing the bin file?

Hi lanfker,

Have you identified the cause and resolved the problem?
Are there any results you can share?

Thanks

Hi, Kayccc,
I did not solve the problem due to its low occurrence, and much of my time was allocated to other tasks. I will keep monitoring this issue till I solve it. If I manage to get this solved, I will certainly share my solution.