SSD Expert Streaming

This project enables a MacBook Pro with 48 GB of RAM to run Qwen3.5-397B-A17B at 4.4+ tokens/second.
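A back-of-envelope check of that figure. The 4-bit quantization and the fraction of active weights cached in RAM below are illustrative assumptions, not details from the project:

```python
# Rough upper bound on decode speed when part of the active expert weights
# must be streamed from SSD for every token. All inputs here are assumed
# values for illustration, not measurements.

def tokens_per_second(ssd_gb_s, active_params_b, bits_per_param, cached_fraction):
    """Tokens/s bound by reading the uncached active parameters per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_param / 8
    streamed_bytes = bytes_per_token * (1 - cached_fraction)
    return ssd_gb_s * 1e9 / streamed_bytes

# 17B active params at 4-bit, with half the active weights resident in RAM,
# streaming the rest over a 17.5 GB/s SSD:
print(round(tokens_per_second(17.5, 17, 4, 0.5), 1))  # -> 4.1
```

Under those assumptions the arithmetic lands in the same ballpark as the quoted 4.4 tokens/second, which suggests the number is plausible rather than confirming the project's exact method.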

The M3 Max can do 17.5 GB/s. Is the internal SSD throughput on the DGX Spark fast enough to implement this approach?

I believe the DGX Spark external USB is 20GB/s, but throughput needs to be tested.

Update 1: PCIe 5.0 x4 allows for a theoretical bandwidth of up to 16 GB/s, with practical drive performance pushing over 14 GB/s, so I think this approach might be worth a try.
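For what it's worth, the 16 GB/s figure is the raw x4 link rate; accounting for PCIe 5.0's 128b/130b line coding gives a slightly lower payload ceiling, consistent with drives topping out a bit above 14 GB/s:

```python
# PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding, so the usable
# payload rate per lane is 32e9 * (128/130) / 8 bytes/s.
per_lane = 32e9 * 128 / 130 / 8   # ~3.94 GB/s per lane after encoding
x4 = 4 * per_lane                 # ~15.75 GB/s for an x4 link
print(round(x4 / 1e9, 2))         # -> 15.75
```

Protocol overhead (TLP headers, flow control) takes another slice on top of the line coding, which is why real drives land in the low-to-mid 14 GB/s range.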

I believe the DGX Spark external USB is 20GB/s, but throughput needs to be tested.

The external USB ports on the DGX Spark are 20Gbit/sec, not 20GByte/sec. Best-case scenario, you’ll get around ~2GB/sec for large sequential reads from an external device.
It’s been tested multiple times, and there are a few posts here in the forum about external USB drive speeds.
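A quick sketch of that conversion (assuming the ports are USB 3.2 Gen 2x2 with 128b/132b line coding):

```python
# 20 Gbit/s divided by 8 gives bytes, then 128b/132b line coding trims the
# payload rate. Real-world protocol overhead lowers it further, which is
# why measured large sequential reads land around ~2 GB/s.
raw_gbit = 20
payload_gb_s = raw_gbit / 8 * (128 / 132)
print(round(payload_gb_s, 2))  # -> 2.42, before protocol overhead
```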

Update 1: PCIe 5.0 x4 allows for a theoretical bandwidth of up to 16 GB/s, with practical drive performance pushing over 14 GB/s, so I think this approach might be worth a try.

It’s certainly worth a try, but chances are you won’t see anywhere near 14GB/sec from the internal NVMe drive.
Here are results of a quick test on my FE DGX Spark with the Samsung 4TB drive:

dgx-spark:~$ sudo hdparm -tT --direct /dev/nvme0n1

/dev/nvme0n1:
Timing O_DIRECT cached reads:   21198 MB in  2.00 seconds = 10613.40 MB/sec
Timing O_DIRECT disk reads: 19930 MB in  3.00 seconds = 6643.07 MB/sec

And here are the results for the same test from a HP ZGX Nano G1N with a 4TB Corsair MP700 Micro:

zgx-spark:~$  sudo hdparm -tT --direct /dev/nvme0n1

/dev/nvme0n1:
Timing O_DIRECT cached reads:   4840 MB in  2.00 seconds = 2420.44 MB/sec
Timing O_DIRECT disk reads: 10552 MB in  3.00 seconds = 3517.18 MB/sec
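As a crude, portable complement to hdparm, a self-contained script like this can give a feel for sequential read throughput. Note that it reads through the page cache (no O_DIRECT), so it can be optimistic versus the raw device; hdparm --direct or fio with a file much larger than RAM are better for real measurements:

```python
# Write a temporary file, then time a sequential read in large chunks.
# Sizes here are tiny so the example runs quickly; use many GB for a
# meaningful test, and expect page-cache effects without O_DIRECT.
import os
import tempfile
import time

SIZE_MB = 64                 # keep small for the example
CHUNK = 8 * 1024 * 1024      # 8 MiB reads, similar to streaming large weights

fd, path = tempfile.mkstemp()
os.close(fd)
buf = os.urandom(CHUNK)
with open(path, "wb") as f:
    for _ in range(SIZE_MB * 1024 * 1024 // CHUNK):
        f.write(buf)

start = time.perf_counter()
read = 0
with open(path, "rb") as f:
    while chunk := f.read(CHUNK):
        read += len(chunk)
elapsed = time.perf_counter() - start
os.remove(path)
print(f"{read / elapsed / 1e6:.0f} MB/s over {read // (1024 * 1024)} MiB")
```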

Just keep an eye on RAM utilization while loading the experts, so you don’t get into a situation where you’re using Linux swap; the heavy writing could accelerate wear of your NVMe drive.
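One way to watch swap from a script on Linux is to parse /proc/meminfo. The sketch below runs against a sample string so it is self-contained; on the Spark you’d read the real file instead:

```python
# Parse /proc/meminfo-style text and report swap usage. The sample string
# below is made-up data so the snippet runs anywhere; on a Linux box,
# replace it with open("/proc/meminfo").read().

def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields, values in kB."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = int(rest.split()[0])
    return fields

sample = (
    "MemTotal: 125000000 kB\n"
    "MemAvailable: 8000000 kB\n"
    "SwapTotal: 2000000 kB\n"
    "SwapFree: 1500000 kB"
)
m = parse_meminfo(sample)
swap_used_kb = m["SwapTotal"] - m["SwapFree"]
print(f"swap used: {swap_used_kb // 1024} MiB")  # -> swap used: 488 MiB
```

Any nonzero, growing swap usage while loading experts would be the signal to stop and reduce the resident working set.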


Thanks for the feedback. I will do further investigation.

They are using that project to run a 400B LLM on an iPhone: Anemll (@anemll): "Running 400B model on iPhone! 0.6 t/s Credit @danveloper @alexintosh @danpacary @anemll" | XCancel

It will be interesting to see if it can be duplicated on the DGX Spark; converting the code base to CUDA will need some work.