mdadm RAID over USB on DGX Spark: after reboot disks fall back to USB2 (480 Mbps) → defensive workaround + open question
I want to document a recurring post-reboot issue with NVMe drives in USB enclosures on DGX Spark. I also want to share the practical workaround I implemented to avoid data corruption, in case it helps others or someone has a cleaner solution.
🧩 Observed problem
This system uses an mdadm RAID array (/dev/md0) mounted at /mnt/raid-modelos, built from two NVMe drives in USB enclosures.
- On cold boot or normal hot-plug:
  - Devices negotiate correctly at USB 3.x (≥ 5000 Mbps)
  - RAID assembles and mounts without issues
- Intermittently, after some reboots:
  - The same devices enumerate as USB2 (480 Mbps)
  - Performance collapses
  - Assembling/mounting the RAID in this state is unsafe (timeouts, resets, corruption risk)
This looks like a USB enumeration / power / timing issue during boot, not thermal throttling and not an mdadm problem per se.
🛡️ Implemented solution (defensive)
I decided never to mount the RAID unless both disks negotiate at least USB 3 speed (≥ 5000 Mbps).
High-level behavior:
- The RAID is only assembled and mounted if all member devices report ≥ 5000 Mbps
- If, after reboot, devices show up as USB2:
  - ❌ RAID is not assembled
  - ❌ Nothing is mounted
  - ✅ System remains in a safe, non-corrupting state
Everything is automated via udev + systemd.
🔧 How it works (summary)
- udev detects the USB devices by stable by-id / serial
- udev triggers a systemd service
- A small control script:
  - checks the actual link speed via /sys/.../speed
  - if speed ≥ threshold:
    - mdadm --assemble
    - mount /mnt/raid-modelos
  - otherwise:
    - leaves the array stopped and unmounted
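The control-script steps above can be sketched in POSIX sh roughly as follows. This is a minimal illustration, not the actual /usr/local/sbin/raid-modelos: the function names, the SYSFS_ROOT and DRY_RUN variables, and passing the member devices as arguments are all assumptions. Each member (a /dev/disk/by-id link or a bare name like sda1) is mapped to the USB device it sits on by walking up its sysfs path until an ancestor has a `speed` attribute. `mdadm --assemble` and `mount` run only when every member is at or above MIN_USB_SPEED_MBPS.

```shell
#!/bin/sh
# Sketch of an "ensure" step: assemble + mount only if every member disk
# negotiated at least MIN_USB_SPEED_MBPS. Function names, SYSFS_ROOT and
# DRY_RUN are hypothetical, not taken from the real raid-modelos script.

MD_DEV=${MD_DEV:-/dev/md0}
MOUNTPOINT=${MOUNTPOINT:-/mnt/raid-modelos}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Print the sysfs "speed" (Mbps) of the USB device a block device sits on,
# by walking up its sysfs path to the first ancestor that has a speed file.
member_speed() {
    case "$1" in
        */*) name=$(basename "$(readlink -f "$1")") ;;  # /dev/disk/by-id/... link
        *)   name=$1 ;;                                 # bare name such as sda1
    esac
    p=$(readlink -f "${SYSFS_ROOT:-/sys}/class/block/$name") || return 1
    while [ "$p" != "/" ]; do
        if [ -r "$p/speed" ]; then
            cat "$p/speed"
            return 0
        fi
        p=$(dirname "$p")
    done
    return 1
}

is_mounted() {
    findmnt -rn "$MOUNTPOINT" >/dev/null 2>&1
}

# Usage: raid_ensure MEMBER...
raid_ensure() {
    [ $# -gt 0 ] || { echo "usage: raid_ensure MEMBER..." >&2; return 1; }
    min=${MIN_USB_SPEED_MBPS:-5000}
    case "$min" in ''|*[!0-9]*) echo "bad MIN_USB_SPEED_MBPS: $min" >&2; return 1 ;; esac
    if is_mounted; then
        echo "already mounted: $MOUNTPOINT"
        return 0
    fi
    for m in "$@"; do
        speed=$(member_speed "$m") || { echo "no USB speed found for $m" >&2; return 1; }
        s=${speed%%.*}                 # sysfs can report "1.5"; keep the integer part
        case "$s" in ''|*[!0-9]*) echo "unreadable speed for $m: $speed" >&2; return 1 ;; esac
        if [ "$s" -lt "$min" ]; then
            echo "refusing: $m at $speed Mbps < $min Mbps" >&2
            return 1                   # array stays stopped, nothing mounted
        fi
    done
    run mdadm --assemble "$MD_DEV" "$@" || return 1
    run mount "$MOUNTPOINT"
}
```

Example dry run, with placeholder member paths that should be replaced by the by-id paths listed in mdadm.conf: `DRY_RUN=1 raid_ensure /dev/disk/by-id/<member1> /dev/disk/by-id/<member2>`. Calling `mount` with only the mountpoint relies on the noauto fstab entry. Because the gate compares integers, MIN_USB_SPEED_MBPS=0 lets a 480 Mbps link through, which is the behavior described for that knob.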
🛠️ Daily commands
sudo raid-modelos status # RAID + USB speed + mount status
sudo raid-modelos ensure # safe mount (USB >= threshold)
sudo raid-modelos start # forced mount (not recommended)
sudo raid-modelos stop && sync
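For `stop`, the part that matters most is ordering. Unmount first, and stop the array only if the unmount succeeded, so the md device is never pulled out from under a mounted filesystem. Here is a minimal sketch under assumptions similar to those of the ensure sketch: the function names are hypothetical, DRY_RUN prints commands instead of running them, and /proc/mdstat serves as the "is the array present" check.

```shell
#!/bin/sh
# Sketch of a safe "stop": unmount first, and stop the md array only if the
# unmount worked. Names and the DRY_RUN switch are hypothetical.

MD_DEV=${MD_DEV:-/dev/md0}
MOUNTPOINT=${MOUNTPOINT:-/mnt/raid-modelos}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

is_mounted() {
    findmnt -rn "$MOUNTPOINT" >/dev/null 2>&1
}

# True if /proc/mdstat lists the array (active or inactive), e.g. "md0 : active raid1 ...".
array_present() {
    grep -q "^${MD_DEV#/dev/} :" /proc/mdstat 2>/dev/null
}

raid_stop() {
    if is_mounted; then
        if ! run umount "$MOUNTPOINT"; then
            echo "umount failed; leaving $MD_DEV running" >&2
            return 1
        fi
    fi
    if array_present; then
        run mdadm --stop "$MD_DEV" || return 1
    fi
}
```

The sketch leaves `sync` to the caller, matching the `raid-modelos stop && sync` usage.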
Quick diagnostics:
lsusb -t
cat /proc/mdstat
mdadm --detail /dev/md0
findmnt /mnt/raid-modelos
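Besides `lsusb -t`, sysfs exposes the negotiated speed of each device directly through the `speed`, `product` and `serial` attributes under /sys/bus/usb/devices. The hedged sketch below lists those attributes and flags anything below the threshold. The function name, the SYSFS_ROOT override and the output layout are assumptions for illustration.

```shell
#!/bin/sh
# List USB devices with their negotiated speed (sysfs "speed", in Mbps),
# product and serial, flagging anything below MIN_USB_SPEED_MBPS.
# SYSFS_ROOT and the output layout are illustrative assumptions.
list_usb_speeds() {
    min=${MIN_USB_SPEED_MBPS:-5000}
    for d in "${SYSFS_ROOT:-/sys}"/bus/usb/devices/*; do
        dev=$(basename "$d")
        case "$dev" in usb*) continue ;; esac   # root hubs: a USB 2 bus always shows 480
        [ -r "$d/speed" ] || continue           # interfaces like 2-1:1.0 have no speed file
        speed=$(cat "$d/speed")
        product=$(cat "$d/product" 2>/dev/null || echo "?")
        serial=$(cat "$d/serial" 2>/dev/null || echo "?")
        s=${speed%%.*}                          # "1.5" -> "1" for the comparison
        case "$s" in
            ''|*[!0-9]*) flag="?" ;;
            *) if [ "$s" -lt "$min" ]; then flag=SLOW; else flag=ok; fi ;;
        esac
        printf '%-10s %6s Mbps  %-4s  %s [%s]\n' "$dev" "$speed" "$flag" "$product" "$serial"
    done
}
```

The sketch skips root hubs (usb1, usb2, etc.) on purpose. A USB 2 root hub always reports 480 Mbps and would otherwise be flagged as SLOW on every run.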
📁 Key files involved
- Main control script: /usr/local/sbin/raid-modelos
- Config (USB speed threshold, by-id, waits): /etc/raid-modelos.conf
- udev rule: /etc/udev/rules.d/99-raid-modelos.rules
- systemd service: /etc/systemd/system/raid-modelos-ensure.service
- mdadm config: /etc/mdadm/mdadm.conf
- fstab entry with noauto, so nothing mounts the RAID at boot outside the speed check
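The udev rule and service could look roughly like the sketch below. It assumes matching on the udev ID_SERIAL property and uses systemd's documented `TAG+="systemd"` / `ENV{SYSTEMD_WANTS}` mechanism to start the service when a disk appears. SERIALS, install_units and the DESTDIR-style argument are hypothetical names. The serial values must come from `udevadm info` run against the real enclosures.

```shell
#!/bin/sh
# Hypothetical installer for the udev rule and the oneshot service.
# SERIALS must hold the real ID_SERIAL values (see: udevadm info /dev/sdX).
SERIALS=${SERIALS:-}

install_units() {
    destdir=${1:-}                       # empty = install into / for real
    rules="$destdir/etc/udev/rules.d/99-raid-modelos.rules"
    unit="$destdir/etc/systemd/system/raid-modelos-ensure.service"
    mkdir -p "$(dirname "$rules")" "$(dirname "$unit")" || return 1

    : > "$rules" || return 1
    for s in $SERIALS; do                # unquoted on purpose: space-separated list
        # When a matching whole disk appears, systemd is asked to start the service.
        printf 'ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="%s", TAG+="systemd", ENV{SYSTEMD_WANTS}+="raid-modelos-ensure.service"\n' "$s" >> "$rules"
    done

    # Type=oneshot without RemainAfterExit: the unit goes inactive after each
    # run, so a later disk "add" event can start it again.
    cat > "$unit" <<'EOF'
[Unit]
Description=Assemble and mount raid-modelos only at USB 3 speed

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/raid-modelos ensure
EOF
}
```

Because the service is a oneshot that doesn't stay active, it can run once per disk arrival. If the first run happens while only one member is present, a gate like the one described above refuses, and the second disk's arrival triggers another attempt. After installing, `udevadm control --reload` and `systemctl daemon-reload` pick up the new files.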
⚙️ Useful tuning knobs
MIN_USB_SPEED_MBPS=5000 # default
MIN_USB_SPEED_MBPS=0 # allow mount even at 480 Mbps (NOT recommended)
WAIT_PARTITIONS_SEC=...
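Since the knobs are plain KEY=value lines, /etc/raid-modelos.conf can be sourced as shell, with defaults filled in afterwards. The sketch below assumes that format. load_config, wait_for_paths, the CONF variable and the 30 s fallback are all hypothetical, because the post leaves the WAIT_PARTITIONS_SEC value unspecified.

```shell
#!/bin/sh
# Hypothetical config handling for /etc/raid-modelos.conf plus a wait loop
# for member partitions to show up.
CONF=${CONF:-/etc/raid-modelos.conf}

load_config() {
    # KEY=value lines are valid shell, so the file can simply be sourced.
    if [ -r "$CONF" ]; then
        . "$CONF" || return 1
    fi
    : "${MIN_USB_SPEED_MBPS:=5000}"   # assigns only if unset or empty, so 0 is kept
    : "${WAIT_PARTITIONS_SEC:=30}"    # hypothetical fallback, not the author's value
}

# Wait up to WAIT_PARTITIONS_SEC seconds for every given path to exist.
wait_for_paths() {
    left=${WAIT_PARTITIONS_SEC:-0}
    while :; do
        missing=
        for p in "$@"; do
            [ -e "$p" ] || missing="$missing $p"
        done
        if [ -z "$missing" ]; then
            return 0
        fi
        if [ "$left" -le 0 ]; then
            echo "timed out waiting for:$missing" >&2
            return 1
        fi
        sleep 1
        left=$((left - 1))
    done
}
```

The `: "${VAR:=default}"` idiom only assigns when the variable is unset or empty. An explicit `MIN_USB_SPEED_MBPS=0` in the config therefore survives the defaulting step.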
❓ Open question
Has anyone found a better, more root-cause solution that keeps USB-C / USB4 / Thunderbolt devices from falling back to USB2 (480 Mbps) after a reboot?
I’d be especially interested in experience with:
- kernel / xHCI parameters
- power-management quirks
- USB enclosure firmware differences
- USB4 vs TB3/TB4 behavior
- cleaner ways to delay USB enumeration during boot
The current approach is safe and works well, but it’s clearly a defensive workaround rather than a true fix.
Any insight or alternative approach would be very welcome.