I am seeing eMMC corruption and instability across multiple Jetson AGX units running from the built-in eMMC (no SSD/NVMe attached). I am trying to understand whether this behavior is expected given typical eMMC endurance, or whether there may be an underlying issue worth investigating.
The units have been used in long-running deployments with continuous application activity, logging, and background processes. Over time, several devices became unstable and now either reboot or panic during write-heavy operations (package installation, sync, dpkg, etc.) or no longer boot at all.
Write volume measurement
I collected disk write statistics over a ~24-hour period using cumulative write counters.
Example (one device):
- Start disk writes: 20.668 MB
- End disk writes: 8644.556 MB
- Total writes over ~24 hours: 8623.888 MB
This corresponds to approximately 8.6 GB of host writes per day.
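For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of one way to do it from the kernel's cumulative block-layer counters. It assumes the eMMC shows up as `mmcblk0` and uses the "write sectors" column of `/sys/block/<dev>/stat`, which is counted in 512-byte units; the function names are my own and this is not necessarily the exact tool behind the numbers above.

```python
# Sketch: derive host writes over an interval from cumulative block counters.
# Assumption: the eMMC block device is named "mmcblk0".
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block/*/stat counts sectors in 512-byte units


def sectors_written(stat_line: str) -> int:
    """Return the 'write sectors' field (7th column) of a block stat line."""
    return int(stat_line.split()[6])


def written_mb(start_sectors: int, end_sectors: int) -> float:
    """Convert a sector-count delta to decimal megabytes (10^6 bytes)."""
    return (end_sectors - start_sectors) * SECTOR_BYTES / 1e6


def read_counter(device: str = "mmcblk0") -> int:
    """Read the current cumulative write counter for a block device."""
    stat_path = Path(f"/sys/block/{device}/stat")
    return sectors_written(stat_path.read_text())
```

Calling `read_counter()` at the start and end of the window and passing both values to `written_mb()` gives the host-side write volume. The counter resets at boot, and it only reflects writes issued by the host, not the extra NAND writes caused by write amplification inside the eMMC.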
eMMC lifetime estimation
Using a simple endurance estimation model:
Expected lifetime (years) =
((Disk capacity × P/E cycles) / (Daily writes × Write amplification)) / 365
Assumptions:
- Disk capacity: 32 GB
- P/E cycles: 1000
- Write amplification factor: variable (1–4)
Results:
| Write Amplification | Estimated Lifetime (years) |
|---|---|
| 1 | ~10.17 |
| 2 | ~5.08 |
| 3 | ~3.39 |
| 4 | ~2.54 |
Even with moderate write amplification (2–3×), the expected lifetime should still be multiple years, yet I am observing eMMC corruption and boot issues much earlier.
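For transparency, the table can be reproduced with the short sketch below. It only encodes the formula and assumptions stated above (32 GB capacity, 1000 P/E cycles, 8623.888 MB/day taken as decimal GB); the function name is mine, and the model ignores effects such as uneven wear across partitions.

```python
# Sketch of the simple endurance model used for the table above.
def lifetime_years(capacity_gb: float, pe_cycles: int,
                   daily_writes_gb: float, waf: float) -> float:
    """Estimated lifetime in years for a given write amplification factor."""
    total_nand_writes_gb = capacity_gb * pe_cycles    # total write budget
    nand_writes_per_day_gb = daily_writes_gb * waf    # host writes x WAF
    return total_nand_writes_gb / nand_writes_per_day_gb / 365


DAILY_WRITES_GB = 8623.888 / 1000  # measured host writes, MB -> GB

for waf in (1, 2, 3, 4):
    years = lifetime_years(32, 1000, DAILY_WRITES_GB, waf)
    print(f"WAF {waf}: ~{years:.2f} years")
```

The result scales linearly with capacity and P/E cycles and inversely with daily writes and write amplification, so an error in any single assumption shifts the estimate proportionally.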
Observed behavior
Across affected units, symptoms include:
- Sudden reboots or kernel panics during write-intensive operations
- MMC controller errors reported in kernel logs (e.g., “RED error” events)
- Failures occurring even after disabling CMDQ / blk-mq and reducing MMC features
- Inconsistent ability to read eMMC health information (some units allow reading EXT_CSD, others reboot before tools can run)
On at least one unit, mmc-utils reports:
- LIFE_TIME_EST_TYP_A: 0x01
- LIFE_TIME_EST_TYP_B: 0x09
- PRE_EOL_INFO: 0x01
Other units show similar instability but different failure characteristics.
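To make the EXT_CSD values from that unit easier to interpret, here is a small decoding sketch based on my reading of the JEDEC eMMC 5.x field definitions (life time estimates in 10% steps, pre-EOL status as normal/warning/urgent). The helper names are hypothetical, and what "type A" and "type B" memory map to is vendor-specific. If this reading is right, 0x09 for type B would mean 80%-90% of the estimated life for that memory type is already used, while PRE_EOL_INFO still reports normal.

```python
# Sketch: decode eMMC 5.x health fields (assumed JEDEC encoding, see lead-in).
def decode_life_time_est(value: int) -> str:
    """Decode DEVICE_LIFE_TIME_EST_TYP_A/B into a human-readable range."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        # Each step represents a 10% band of estimated life time used.
        return f"{(value - 1) * 10}%-{value * 10}% of device life time used"
    if value == 0x0B:
        return "exceeded maximum estimated device life time"
    return "reserved"


PRE_EOL_INFO = {
    0x00: "not defined",
    0x01: "normal",
    0x02: "warning (about 80% of reserved blocks consumed)",
    0x03: "urgent",
}


def decode_pre_eol(value: int) -> str:
    """Decode PRE_EOL_INFO; unknown values are reported as reserved."""
    return PRE_EOL_INFO.get(value, "reserved")


# Values reported by mmc-utils on the unit described above.
print("TYP_A:", decode_life_time_est(0x01))
print("TYP_B:", decode_life_time_est(0x09))
print("PRE_EOL:", decode_pre_eol(0x01))
```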
I would like clarification on:
- Whether the internal eMMC used on Jetson AGX is intended to sustain this amount of writes over multi-year deployments.
- Typical write amplification factors assumed by NVIDIA for AGX eMMC endurance calculations.
- Whether these failure signatures are known behaviors of late-life eMMC on Tegra194.
- Recommended mitigations or design guidance beyond “use external NVMe,” especially for deployed systems.