Intermittent sdcard error

Hi
My Nano has become quite unstable lately. It fails with output like this:

[ 2267.285731] mmc0: Data CRC error
[ 2267.288959] sdhci: =========== REGISTER DUMP (mmc0)===========
[ 2267.294781] sdhci: Sys addr: 0x00000008 | Version:  0x00000303
[ 2267.300602] sdhci: Blk size: 0x00007200 | Blk cnt:  0x00000007
[ 2267.306426] sdhci: Argument: 0x0148dae0 | Trn mode: 0x0000003b
[ 2267.312250] sdhci: Present:  0x01fb0008 | Host ctl: 0x00000017
[ 2267.318072] sdhci: Power:    0x00000001 | Blk gap:  0x00000000
[ 2267.323895] sdhci: Wake-up:  0x00000000 | Clock:    0x00000007
[ 2267.329722] sdhci: Timeout:  0x0000000e | Int stat: 0x00001000
[ 2267.335548] sdhci: Int enab: 0x02ff100b | Sig enab: 0x02fc100b
[ 2267.341374] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[ 2267.347203] sdhci: Caps:     0x376cd08c | Caps_1:   0x10006f73
[ 2267.353031] sdhci: Cmd:      0x0000123a | Max curr: 0x00000000
[ 2267.358861] sdhci: Host ctl2: 0x0000308b
[ 2267.362791] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000ffefe410
[ 2267.369342] sdhci: ===========================================
[ 2280.034801] tegra-i2c 7000d000.i2c: pio timed out addr: 0x3c tlen:28 rlen:4
[ 2280.042173] tegra-i2c 7000d000.i2c: --- register dump for debugging ----
[ 2280.050262] tegra-i2c 7000d000.i2c: I2C_CNFG - 0x22c00
[ 2280.056157] tegra-i2c 7000d000.i2c: I2C_PACKET_TRANSFER_STATUS - 0x1010001
[ 2280.063412] tegra-i2c 7000d000.i2c: I2C_FIFO_CONTROL - 0xe0
[ 2280.069617] tegra-i2c 7000d000.i2c: I2C_FIFO_STATUS - 0x800081
[ 2280.075887] tegra-i2c 7000d000.i2c: I2C_INT_MASK - 0x7d
[ 2280.081332] tegra-i2c 7000d000.i2c: I2C_INT_STATUS - 0xc3
[ 2280.087066] tegra-i2c 7000d000.i2c: i2c transfer timed out addr: 0x3c
[ 2280.093992] max77620-thermal max77620-thermal: Failed to read STATLBT: -110
[ 2288.294669] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2288.294722] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2288.294797] 	0-...: (1 GPs behind) idle=2e7/2/0 softirq=308297/308299 fqs=2533 
[ 2288.294846] 	(detected by 3, t=5252 jiffies, g=42925, c=42924, q=5)
[ 2288.319607] 	0-...: (1 GPs behind) idle=2e7/2/0 softirq=308298/308299 fqs=2119 
[ 2288.326960] 	(detected by 1, t=5260 jiffies, g=126700, c=126699, q=283)
[ 2292.558520] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 2292.565857] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.253-tegra #1
[ 2292.572440] Hardware name: NVIDIA Jetson Nano Developer Kit (DT)
[ 2292.578494] Call trace:
[ 2292.581039] [<ffffff800808ba40>] dump_backtrace+0x0/0x198
[ 2292.586512] [<ffffff800808c004>] show_stack+0x24/0x30
[ 2292.591636] [<ffffff8008f6121c>] dump_stack+0xa0/0xc4
[ 2292.596754] [<ffffff8008f5e2c0>] panic+0x12c/0x2a8
[ 2292.601627] [<ffffff8008181994>] watchdog_check_hardlockup_other_cpu+0x11c/0x120
[ 2292.609085] [<ffffff8008180b08>] watchdog_timer_fn+0x98/0x2c0
[ 2292.614892] [<ffffff8008138f10>] __hrtimer_run_queues+0xd8/0x360
[ 2292.620954] [<ffffff8008139860>] hrtimer_interrupt+0xa8/0x1e0
[ 2292.626758] [<ffffff8008bfa838>] tegra210_timer_isr+0x38/0x48
[ 2292.632563] [<ffffff8008121940>] __handle_irq_event_percpu+0x68/0x288
[ 2292.639059] [<ffffff8008121b88>] handle_irq_event_percpu+0x28/0x60
[ 2292.645294] [<ffffff8008121c10>] handle_irq_event+0x50/0x80
[ 2292.650930] [<ffffff8008125aa4>] handle_fasteoi_irq+0xd4/0x1c0
[ 2292.656817] [<ffffff80081208f4>] generic_handle_irq+0x34/0x50
[ 2292.662618] [<ffffff8008120fe0>] __handle_domain_irq+0x68/0xc0
[ 2292.668504] [<ffffff8008080d44>] gic_handle_irq+0x5c/0xb0
[ 2292.673958] [<ffffff8008082c28>] el1_irq+0xe8/0x194
[ 2292.678906] [<ffffff8008ba09e0>] cpuidle_enter_state+0xb8/0x380
[ 2292.684892] [<ffffff8008ba0d1c>] cpuidle_enter+0x34/0x48
[ 2292.690261] [<ffffff800811139c>] call_cpuidle+0x44/0x70
[ 2292.695539] [<ffffff8008111718>] cpu_startup_entry+0x1b0/0x200
[ 2292.701436] [<ffffff8008091cf8>] secondary_start_kernel+0x190/0x1f8
[ 2292.707749] [<0000000084f6e1a8>] 0x84f6e1a8
[ 2292.711996] SMP: stopping secondary CPUs
[ 2292.716283] Kernel Offset: disabled
[ 2292.719837] Memory Limit: none
[ 2292.824501] Rebooting in 5 seconds..
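
Because the failure is intermittent, it can help to count how often the CRC error appears in the kernel log between reboots. The following is a minimal sketch, assuming the message keeps the exact wording seen above (`mmcN: Data CRC error`); the function name `count_crc_errors` is made up for illustration.

```shell
# Count SD-card CRC error lines in kernel log text read from stdin.
# The pattern is copied from the log above; other kernel versions may
# word the message differently (assumption).
count_crc_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is neutralised with "|| true".
    grep -c -E 'mmc[0-9]+: Data CRC error' || true
}
```

On a running system this could be used as `dmesg | count_crc_errors` (reading the kernel log may require root).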

When it works, I can dd the whole card and every partition without any error:

root@jet:/home/user# for f in /dev/mmcblk0*; do dd bs=10M if=$f of=/dev/null; done
12211+1 records in
12211+1 records out
128043712512 bytes (128 GB, 119 GiB) copied, 1413,09 s, 90,6 MB/s
4000+0 records in
4000+0 records out
41943040000 bytes (42 GB, 39 GiB) copied, 463,201 s, 90,6 MB/s
0+1 records in
0+1 records out
458752 bytes (459 kB, 448 KiB) copied, 0,00852792 s, 53,8 MB/s
0+1 records in
0+1 records out
786432 bytes (786 kB, 768 KiB) copied, 0,0150839 s, 52,1 MB/s
0+1 records in
0+1 records out
65536 bytes (66 kB, 64 KiB) copied, 0,00211823 s, 30,9 MB/s
0+1 records in
0+1 records out
196608 bytes (197 kB, 192 KiB) copied, 0,00364618 s, 53,9 MB/s
0+1 records in
0+1 records out
131072 bytes (131 kB, 128 KiB) copied, 0,0024137 s, 54,3 MB/s
0+1 records in
0+1 records out
131072 bytes (131 kB, 128 KiB) copied, 0,00296715 s, 44,2 MB/s
0+1 records in
0+1 records out
458752 bytes (459 kB, 448 KiB) copied, 0,0069534 s, 66,0 MB/s
0+1 records in
0+1 records out
589824 bytes (590 kB, 576 KiB) copied, 0,00907845 s, 65,0 MB/s
0+1 records in
0+1 records out
65536 bytes (66 kB, 64 KiB) copied, 0,00181348 s, 36,1 MB/s
0+1 records in
0+1 records out
196608 bytes (197 kB, 192 KiB) copied, 0,00366681 s, 53,6 MB/s
0+1 records in
0+1 records out
393216 bytes (393 kB, 384 KiB) copied, 0,00600041 s, 65,5 MB/s
0+1 records in
0+1 records out
65536 bytes (66 kB, 64 KiB) copied, 0,00153623 s, 42,7 MB/s
0+1 records in
0+1 records out
458752 bytes (459 kB, 448 KiB) copied, 0,00681788 s, 67,3 MB/s
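
The read loop above can be wrapped so that failures are reported and counted instead of being read off the dd output by eye. This is only a sketch: the function name `read_all` is hypothetical, and a clean pass does not rule out an intermittent fault, it only shows that this particular pass succeeded.

```shell
# Read each given block device (or regular file) from start to end
# and report how many of them could not be read completely.
read_all() {
    failed=0
    for dev in "$@"; do
        # dd statistics go to stderr; they are discarded here so that
        # only the pass/fail summary is printed.
        if ! dd bs=10M if="$dev" of=/dev/null 2>/dev/null; then
            echo "read failed: $dev"
            failed=$((failed + 1))
        fi
    done
    echo "failed=$failed"
    [ "$failed" -eq 0 ]
}
```

With the devices from the post, this would be called as `read_all /dev/mmcblk0*` as root.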

The card is a Samsung Endurance 128GB.
The board is powered through the barrel jack from a modified 2.5A Raspberry Pi power supply (I have shortened the cable and replaced the USB connector with a jack); the supply voltage is usually above 5V:

cat /sys/bus/i2c/drivers/ina3221x/6-0040/iio\:device0/in_voltage0_input
5216
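
A single reading says little about short dips under load, so the rail can be sampled repeatedly and summarised. A minimal sketch, assuming the sysfs value is in millivolts (5216 would then be about 5.2V, which matches "above 5V") and that the path from the post is valid on the board in question; `sample_rail` and `min_max` are illustrative names.

```shell
# Path copied from the post; it may differ on other boards or releases.
RAIL=${RAIL:-/sys/bus/i2c/drivers/ina3221x/6-0040/iio:device0/in_voltage0_input}

# Print N readings of the rail, one per line, INTERVAL seconds apart.
sample_rail() {
    n=${1:-60}
    interval=${2:-1}
    i=0
    while [ "$i" -lt "$n" ]; do
        cat "$RAIL"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Read numeric samples on stdin and print the lowest and highest value.
min_max() {
    awk 'NR == 1 { min = $1 + 0; max = $1 + 0 }
         { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
         END { if (NR) print "min=" min " max=" max }'
}
```

For example, `sample_rail 120 0.5 | min_max` would summarise about one minute of readings; running it during heavy sdcard I/O is the interesting case.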

I’ve tried two other cards (generic UHS 64GB) with similar results

Do you have any ideas?

This looks like a hardware defect. Do you have another Jetson Nano to validate with?

No, I have only one Jetson board. The errors seem to disappear when the sdcard is booted through USB, so maybe the carrier board is fine.
(There is another quirk, but I don’t believe it is related: the USB adapter is no longer recognized after a shutdown, and to get it recognized again it has to be reinserted while the board is powered.)
console_UsbReinsertRequired.log (23.1 KB)

The "mmc0: Data CRC error" message and the sdhci register dump come from the sdcard interface, so booting from USB probably just sidesteps the error.

It does not indicate that your sdcard slot is fine.

I’m using this case [https://www.seeedstudio.com/Jetson-Nano-B01-Metal-Armour-Case-with-PWM-Adjustment-Fan-p-4557.html] and the sdcard is connected through an adapter. I will take the board out this evening to see what happens when the sdcard is inserted directly into the module.

Yeah, it looks like the culprit was a bad electrical connection.
The latching mechanism of the module’s sdcard slot no longer works, and the sdcard adapter ended up being held in place only by the case.
For now I have secured the adapter to the module with some tape and the errors no longer show up. Is this covered under warranty?

Sorry, I am not responsible for warranty matters, so I cannot answer that.

You can try filing an RMA request.

Just an update:
The RMA was approved, but the carrier selected by NVIDIA would not collect the board from me in order to return it to NVIDIA.
Using another carrier at my own cost does not make sense, as the shipping cost would be about $60, more than half the price of a new board.
I will make a few more attempts to contact the carrier, but to be honest this is becoming a time sink.

Update 2:
The RMA process is broken and the RMA team is unable to fix it. They insist on using FedEx to return my board, while FedEx remains nonexistent or inactive in my country.
I’ve wasted far too many hours listening to automated phone messages.
If I had known beforehand how the warranty process was going to go, I would have run from it in an instant.
