DGX Spark USB ports are USB 4 (40 Gbps) :O So why do they fall back to USB2?

mdadm RAID over USB on DGX Spark: after reboot disks fall back to USB2 (480 Mbps) → defensive workaround + open question

I want to document a recurring post-reboot issue with NVMe drives in USB enclosures on DGX Spark, along with the practical workaround I ended up implementing to avoid data corruption, in case it helps others or someone has a cleaner solution.


🧩 Observed problem

This system uses an mdadm RAID array (/dev/md0) mounted at /mnt/raid-modelos, built from two NVMe drives in USB enclosures.

  • On cold boot or normal hot-plug:

    • Devices negotiate correctly at USB 3.x (≥ 5000 Mbps)

    • RAID assembles and mounts without issues

  • After some reboots, intermittently:

    • The same devices enumerate as USB2 (480 Mbps)

    • Performance collapses

    • Assembling/mounting the RAID in this state is unsafe (timeouts, resets, corruption risk)

This looks like a USB enumeration / power / timing issue during boot, not thermal throttling and not an mdadm problem per se.


🛡️ Implemented solution (defensive)

I decided to never mount the RAID unless both disks negotiate at least USB3 speed.

High-level behavior:

  • The RAID is only assembled and mounted if all member devices report ≥ 5000 Mbps

  • If, after reboot, devices show up as USB2:

    • ❌ RAID is not assembled

    • ❌ Nothing is mounted

    • ✅ System remains in a safe, non-corrupting state

Everything is automated via udev + systemd.


🔧 How it works (summary)

  1. udev matches the USB devices by a stable identifier (by-id path / serial)

  2. udev triggers a systemd service

  3. A small control script:

    • checks actual link speed via /sys/.../speed

    • if speed ≥ threshold:

      • mdadm --assemble

      • mount /mnt/raid-modelos

    • otherwise:

      • leaves the array stopped and unmounted
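
For anyone who wants to build something similar, here is a minimal sketch of the decision step described above. It is not the actual raid-modelos script: the function names speeds_ok and ensure_raid are hypothetical, and it assumes the link speeds (in Mbps) have already been read and are passed in as arguments.

```shell
# Sketch only: speeds_ok and ensure_raid are hypothetical names,
# not taken from the real raid-modelos script.

MIN_USB_SPEED_MBPS="${MIN_USB_SPEED_MBPS:-5000}"

# speeds_ok MIN SPEED...
# Succeeds only if at least one speed is given and every speed is a
# whole number >= MIN. Anything else (empty, "1.5", garbage) is unsafe.
speeds_ok() {
    local min="$1"; shift
    [ "$#" -gt 0 ] || return 1
    local s
    for s in "$@"; do
        case "$s" in
            ''|*[!0-9]*) return 1 ;;
        esac
        [ "$s" -ge "$min" ] || return 1
    done
    return 0
}

# ensure_raid SPEED...
# Assembles and mounts only when every member link is fast enough;
# otherwise leaves the array untouched and returns non-zero.
ensure_raid() {
    if speeds_ok "$MIN_USB_SPEED_MBPS" "$@"; then
        mdadm --assemble /dev/md0 && mount /mnt/raid-modelos
    else
        echo "USB link below ${MIN_USB_SPEED_MBPS} Mbps, leaving /dev/md0 stopped" >&2
        return 1
    fi
}
```

Treating a missing or non-numeric speed as a failure keeps the check fail-closed: if the speed cannot be read, nothing is assembled.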

🛠️ Daily commands

sudo raid-modelos status   # RAID + USB speed + mount status
sudo raid-modelos ensure   # safe mount (USB >= threshold)
sudo raid-modelos start    # forced mount (not recommended)
sudo raid-modelos stop && sync

Quick diagnostics:

lsusb -t
cat /proc/mdstat
mdadm --detail /dev/md0
findmnt /mnt/raid-modelos
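
To see the negotiated speed of the USB device behind a given disk, something like the sketch below may help. The helper name usb_speed_of and its optional second argument (an alternate sysfs root, handy for testing) are my own assumptions. It relies on the USB device directory in sysfs exposing a speed attribute in Mbps, and simply walks up from the block device until it finds one.

```shell
# Sketch only: usb_speed_of is a hypothetical helper name.
# usb_speed_of BLOCKDEV [SYSFS_ROOT]
# Prints the speed (Mbps) of the first ancestor in sysfs that has a
# readable "speed" attribute; returns non-zero if none is found.
usb_speed_of() {
    local name dir
    # Resolve by-id symlinks to the kernel name (e.g. sda).
    name="$(basename "$(readlink -f "$1")")"
    dir="$(readlink -f "${2:-/sys}/class/block/$name")" || return 1
    while [ -n "$dir" ] && [ "$dir" != "/" ]; do
        if [ -r "$dir/speed" ]; then
            cat "$dir/speed"
            return 0
        fi
        dir="$(dirname "$dir")"
    done
    return 1
}
```

Calling usb_speed_of /dev/sda should print a value such as 5000 or 10000 on a healthy link and 480 after the fallback.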


📁 Key files involved

  • Main control script:
    /usr/local/sbin/raid-modelos

  • Config (USB speed threshold, by-id, waits):
    /etc/raid-modelos.conf

  • udev rule:
    /etc/udev/rules.d/99-raid-modelos.rules

  • systemd service:
    /etc/systemd/system/raid-modelos-ensure.service

  • mdadm config:
    /etc/mdadm/mdadm.conf

  • /etc/fstab entry with noauto, so the filesystem is never mounted automatically at boot


⚙️ Useful tuning knobs

MIN_USB_SPEED_MBPS=5000   # default
MIN_USB_SPEED_MBPS=0      # allow mount even at 480 Mbps (NOT recommended)
WAIT_PARTITIONS_SEC=...
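
As an illustration of what a wait knob like WAIT_PARTITIONS_SEC could drive, here is a small polling sketch. This is an assumption about its purpose, not the real implementation: wait_for_paths is a hypothetical helper that waits up to a given number of seconds for all listed device paths to appear.

```shell
# Sketch only: wait_for_paths is a hypothetical helper name.
# wait_for_paths TIMEOUT_SEC PATH...
# Polls once per second until every PATH exists. Returns 0 as soon as
# all are present, or 1 once TIMEOUT_SEC seconds have been waited.
wait_for_paths() {
    local timeout="$1"; shift
    local waited=0 missing p
    while :; do
        missing=0
        for p in "$@"; do
            [ -e "$p" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

A control script could call it as wait_for_paths "$WAIT_PARTITIONS_SEC" followed by the by-id paths of the member partitions, and skip assembly if it returns non-zero.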


❓ Open question

Has anyone found a better, root-cause fix that prevents USB-C / USB4 / Thunderbolt devices from falling back to USB2 (480 Mbps) after a reboot?

I’d be especially interested in experience with:

  • kernel / xHCI parameters

  • power-management quirks

  • USB enclosure firmware differences

  • USB4 vs TB3/TB4 behavior

  • cleaner ways to delay USB enumeration during boot

The current approach is safe and works well, but it’s clearly a defensive workaround rather than a true fix.
Any insight or alternative approach would be very welcome.