DGX Spark vs AMD Strix Halo

eugr · October 23, 2025, 4:58pm

I know that many people are cross-shopping DGX Spark and AMD Strix Halo systems (Framework Desktop, etc.) for a low-power solution that can do some AI/LLM stuff.

There are a lot of reviews on the Web and YouTube, but most people making those don’t work with AI, and specifically LLMs, for a living, so we see them doing silly things like testing with Ollama (especially on an AMD device!).

Since I’ve got both of these, I figured that it would be useful to share my initial impressions of both. I’ve had my Strix Halo system (GMKTek Evo x2 128GB) for about a week and DGX Spark for just a couple of days, so it’s definitely a work in progress.

I made a post on Reddit, but figured that cross-posting here may help some folks.

Hardware

DGX Spark is probably the most minimalist mini-PC I’ve ever used.

It has absolutely no LEDs, not even on the LAN port, and the on/off switch is a plain button, so unless you ping it over the network or hook up a display, good luck guessing whether this thing is on.
All ports are in the back: there is no DisplayPort, only a single HDMI port, a USB-C port (power only), 3x USB-C 3.2 Gen 2 ports, a 10G Ethernet port and 2x QSFP ports.

The air intake is in the front and exhaust is in the back. It is quiet for the most part, but the fan is quite audible when it’s on (but quieter than my GMKTek).

It has a single 4TB PCIe 5.0 x4 M.2 2242 SSD - SAMSUNG MZALC4T0HBL1-00B07. I couldn’t find it for sale anywhere in the 2242 form factor, only the 2280 version, and DGX Spark only takes 2242 drives. I wish they had gone with standard 2280 - a weird decision, given that it’s a mini-PC, not a laptop or tablet. Who cares if the motherboard is an inch longer!

The performance seems good, and gives me 4240.64 MB/sec vs 3118.53 MB/sec on my GMKTek (as measured by hdparm).

It is user replaceable, but there is only one slot, accessible from the bottom of the device. You need to take the magnetic plate off and there are some access screws underneath.

The unit is made of metal and gets quite hot under high load, but not unbearably hot like some reviews mentioned. It cools down quickly, though (metal!).

The CPU is a 20-core ARM design with 10 performance and 10 efficiency cores. I didn’t benchmark it, but other reviews show CPU performance similar to Strix Halo.

Initial Setup

DGX Spark comes with DGX OS pre-installed (more on this later). You can set it up interactively using keyboard/mouse/display or in headless mode via WiFi hotspot that it creates.

I tried to set it up by connecting my trusted Logitech keyboard/trackpad combo that I use to set up pretty much all my server boxes, but once it booted up, it displayed “Connect the keyboard” message and didn’t let me proceed any further. Trackpad portion worked, and volume keys on the keyboard also worked! I rebooted, and was able to enter BIOS (by pressing Esc) just fine, and the keyboard was fully functioning there!

BTW, it has AMI BIOS, but doesn’t expose anything interesting other than networking and boot options.

Booting into DGX OS resulted in the same problem. After some googling, I figured that it shipped with a borked kernel that broke Logitech unified setups, so I decided to proceed in a headless mode.

Connected to the Wi-Fi hotspot from my Mac (the hotspot SSID/password are printed on a sticker on top of the quick start guide) and was able to continue the setup there, which was pretty smooth, other than the Mac spamming me with a “connect to internet” popup every minute or so. It then proceeded to update firmware and OS packages, which took about 30 minutes, but eventually finished, and after that my Logitech keyboard worked just fine.

Linux Experience

DGX Spark runs DGX OS 7.2.3 which is based on Ubuntu 24.04.3 LTS, but uses NVidia’s custom kernel, and an older one than mainline Ubuntu LTS uses.
So instead of 6.14.x you get 6.11.0-1016-nvidia.

It comes with CUDA 13.0 development kit and NVidia drivers (580.95.05) pre-installed.
It also has NVidia’s container toolkit that includes docker, and GPU passthrough works well.

Other than that, it’s a standard Ubuntu Desktop installation, with GNOME and everything.

SSHd is enabled by default, so after headless install you can connect to it immediately without any extra configuration.

RDP remote desktop doesn’t work currently - it connects, but display output is broken.

I tried to boot from Fedora 43 Beta Live USB, and it worked, sort of. First, you need to disable Secure Boot in BIOS. Then, it boots only in “basic graphics mode”, because built-in nvidia drivers don’t recognize the chipset. It also throws other errors complaining about chipset, processor cores, etc.

I think I’ll try to install it to an external SSD and see if NVidia standard drivers will recognize the chip. There is hope:

 ==============
 PLATFORM INFO:
 ==============
 IOMMU: Pass-through or enabled
 Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
 Cuda Driver Version Installed:  13000
 Platform: NVIDIA_DGX_Spark, Arch: aarch64(Linux 6.11.0-1016-nvidia)
 Platform verification succeeded

As for Strix Halo, it’s an x86 PC, so you can run any distro you want. I chose Fedora 43 Beta, currently running with kernel 6.17.3-300.fc43.x86_64. Smooth sailing, up-to-date packages.

Llama.cpp experience

DGX Spark

You need to build it from source as there is no CUDA ARM build, but compiling llama.cpp was very straightforward - CUDA toolkit is already installed, just need to install development tools and it compiles just like on any other system with NVidia GPU. Just follow the instructions, no surprises.

However, when I ran the benchmarks, I ran into two issues.

- The model loading was VERY slow. It took 1 minute 40 seconds to load gpt-oss-120b. For comparison, it takes 22 seconds to load on Strix Halo (both from cold, memory cache flushed).
- I wasn’t getting the same results as ggerganov in this thread. While PP was pretty impressive for such a small system, TG was matching or even slightly worse than my Strix Halo setup with ROCm.
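To put the load times in perspective, here is a rough back-of-the-envelope sketch that estimates the fastest possible cold load if the SSD were the only bottleneck. It uses the hdparm throughput figures from the Hardware section and the 59.02 GiB model size that llama-bench reports. It assumes hdparm's "MB" means 10^6 bytes and that sequential read speed is the relevant number, so treat the result as an approximation only.

```python
GIB = 2**30  # bytes per GiB, the unit llama-bench uses for model size


def min_load_seconds(model_gib: float, read_mb_per_s: float) -> float:
    """Lower bound on cold load time if disk reads were the only cost.

    Assumption: hdparm's "MB" is 10**6 bytes.
    """
    return model_gib * GIB / (read_mb_per_s * 1e6)


MODEL_GIB = 59.02  # gpt-oss-120b MXFP4, as reported by llama-bench

# name -> (hdparm throughput in MB/s, observed cold load time in seconds)
SYSTEMS = {
    "DGX Spark": (4240.64, 100.0),  # 1 minute 40 seconds observed
    "Strix Halo": (3118.53, 22.0),
}

for name, (throughput, observed) in SYSTEMS.items():
    bound = min_load_seconds(MODEL_GIB, throughput)
    print(
        f"{name}: disk-bound minimum {bound:.1f} s, "
        f"observed {observed:.0f} s ({observed / bound:.1f}x the minimum)"
    )
```

Under these assumptions Strix Halo loads close to what the disk allows, while the initial Spark load is several times slower than its disk-bound minimum, which points at something other than raw SSD speed.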

For instance, here are my Strix Halo numbers, compiled with ROCm 7.10.0a20251017, llama.cpp build 03792ad9 (6816), HIP only, no rocWMMA:

build/bin/llama-bench -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -d 0,4096,8192,16384,32768 -p 2048 -n 32 -ub 2048

model	size	params	backend	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048	999.59 ± 4.31
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32	47.49 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d4096	824.37 ± 1.16
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d4096	44.23 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d8192	703.42 ± 1.54
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d8192	42.52 ± 0.04
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d16384	514.89 ± 3.86
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d16384	39.71 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d32768	348.59 ± 2.11
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d32768	35.39 ± 0.01

The same command on Spark gave me this:

model	size	params	backend	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048	1816.00 ± 11.21
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32	44.74 ± 0.99
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d4096	1763.75 ± 6.43
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d4096	42.69 ± 0.93
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d8192	1695.29 ± 11.56
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d8192	40.91 ± 0.35
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d16384	1512.65 ± 6.35
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d16384	38.61 ± 0.03
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d32768	1250.55 ± 5.21
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d32768	34.66 ± 0.02

I tried enabling Unified Memory switch (GGML_CUDA_ENABLE_UNIFIED_MEMORY=1) - it improved model loading, but resulted in even worse performance.

I reached out to ggerganov, and he suggested disabling mmap. I thought I tried it, but apparently not.
Well, that fixed it. Model loading improved too - now taking 56 seconds from cold and 23 seconds when it’s still in cache.
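For anyone who wants to repeat the comparison, here is a small sketch that assembles the same llama-bench invocation with mmap switched on or off. The flags are the ones used in the commands in this post (`--mmap 0` is how llama-bench disables mmap); the helper name `llama_bench_argv` is made up for illustration, and the binary path assumes you run it from the llama.cpp source directory.

```python
import os
import shlex


def llama_bench_argv(model_path: str, use_mmap: bool, ubatch: int = 2048) -> list[str]:
    """Build the llama-bench command line used for the tables in this post."""
    depths = [0, 4096, 8192, 16384, 32768]
    return [
        "build/bin/llama-bench",
        "-m", model_path,
        "-fa", "1",                                # flash attention on
        "-d", ",".join(str(d) for d in depths),    # context depths to test
        "-p", "2048",                              # prompt tokens (PP)
        "-n", "32",                                # generated tokens (TG)
        "-ub", str(ubatch),                        # micro-batch size
        "--mmap", "1" if use_mmap else "0",        # 0 disables mmap
    ]


MODEL = os.path.expanduser(
    "~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf"
)

# Print both variants so they can be pasted into a shell one after the other.
for use_mmap in (True, False):
    print(shlex.join(llama_bench_argv(MODEL, use_mmap)))
```

The list can also be passed directly to `subprocess.run` if you prefer to script the runs.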

Updated numbers:

model	size	params	backend	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048	1939.32 ± 4.03
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32	56.33 ± 0.26
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d4096	1832.04 ± 5.58
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d4096	52.63 ± 0.12
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d8192	1738.07 ± 5.93
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d8192	48.60 ± 0.20
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d16384	1525.71 ± 12.34
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d16384	45.01 ± 0.09
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	pp2048 @ d32768	1242.35 ± 5.64
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	CUDA	tg32 @ d32768	39.10 ± 0.09

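As you can see, TG is much better (roughly 13-26% faster depending on context depth), while PP improved by up to about 7% at short context and is essentially unchanged at 32K.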
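The size of the improvement is easier to see as percentages. The sketch below takes the mean t/s values from the two Spark tables above (with and without mmap, error bars dropped) and prints the relative change at each context depth. The variable names are only for illustration.

```python
DEPTHS = [0, 4096, 8192, 16384, 32768]

# Mean t/s copied from the two DGX Spark tables (CUDA backend).
WITH_MMAP = {
    "pp2048": [1816.00, 1763.75, 1695.29, 1512.65, 1250.55],
    "tg32": [44.74, 42.69, 40.91, 38.61, 34.66],
}
NO_MMAP = {
    "pp2048": [1939.32, 1832.04, 1738.07, 1525.71, 1242.35],
    "tg32": [56.33, 52.63, 48.60, 45.01, 39.10],
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for test in ("pp2048", "tg32"):
    for depth, before, after in zip(DEPTHS, WITH_MMAP[test], NO_MMAP[test]):
        change = percent_change(before, after)
        print(f"{test} @ d{depth}: {before:8.2f} -> {after:8.2f} t/s ({change:+.1f}%)")
```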

As for Strix Halo, mmap/no-mmap doesn’t make any difference there.

Strix Halo

On Strix Halo, llama.cpp experience is… well, a bit turbulent.

You can download a pre-built version for Vulkan, and it works, but the performance is a mixed bag. TG is pretty good, but PP is not great.

build/bin/llama-bench -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -d 0,4096,8192,16384,32768 -p 2048 -n 32 --mmap 0 -ngl 999 -ub 1024

NOTE: Vulkan likes batch size of 1024 the most, unlike ROCm that likes 2048 better.

model	size	params	backend	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	pp2048	526.54 ± 4.90
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	tg32	52.64 ± 0.08
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	pp2048 @ d4096	438.85 ± 0.76
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	tg32 @ d4096	48.21 ± 0.03
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	pp2048 @ d8192	356.28 ± 4.47
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	tg32 @ d8192	45.90 ± 0.23
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	pp2048 @ d16384	210.17 ± 2.53
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	tg32 @ d16384	42.64 ± 0.07
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	pp2048 @ d32768	138.79 ± 9.47
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	Vulkan	tg32 @ d32768	36.18 ± 0.02

I tried toolboxes from kyuz0, and some of them were better, but I still felt that I could squeeze more juice out of it. All of them suffered from significant performance degradation when the context was filling up.

Then I tried to compile my own using the latest ROCm build from TheRock (on that date).

I also built rocWMMA as recommended by kyuz0 (more on that later).

Llama.cpp compiled without major issues - I had to configure the paths properly, but other than that, it just worked.
The PP increased dramatically, but TG decreased.

model	size	params	backend	ngl	n_ubatch	fa	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	pp2048	1030.71 ± 2.26
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	tg32	47.84 ± 0.02
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	pp2048 @ d4096	802.36 ± 6.96
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	tg32 @ d4096	39.09 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	pp2048 @ d8192	615.27 ± 2.18
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	tg32 @ d8192	33.34 ± 0.05
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	pp2048 @ d16384	409.25 ± 0.67
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	tg32 @ d16384	25.86 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	pp2048 @ d32768	228.04 ± 0.44
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	999	2048	1	tg32 @ d32768	18.07 ± 0.03

But the biggest issue is significant performance degradation with long context, much more than you’d expect.
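One way to quantify the degradation is to express each result as a fraction of the zero-depth throughput. This sketch does that for the Vulkan table and the ROCm + rocWMMA table above, using the mean t/s values with error bars dropped. The labels and the helper name `retention` are mine.

```python
DEPTHS = [0, 4096, 8192, 16384, 32768]

# Mean t/s copied from the Strix Halo tables above.
RESULTS = {
    "Vulkan": {
        "pp2048": [526.54, 438.85, 356.28, 210.17, 138.79],
        "tg32": [52.64, 48.21, 45.90, 42.64, 36.18],
    },
    "ROCm + rocWMMA": {
        "pp2048": [1030.71, 802.36, 615.27, 409.25, 228.04],
        "tg32": [47.84, 39.09, 33.34, 25.86, 18.07],
    },
}


def retention(series: list[float]) -> list[float]:
    """Fraction of the zero-depth throughput kept at each context depth."""
    base = series[0]
    return [value / base for value in series]


for backend, tests in RESULTS.items():
    for test, series in tests.items():
        cells = ", ".join(
            f"d{depth}: {kept:.0%}" for depth, kept in zip(DEPTHS, retention(series))
        )
        print(f"{backend:15s} {test:7s} {cells}")
```

With these numbers the rocWMMA build keeps well under half of its token generation speed at a 32K depth, while Vulkan keeps roughly two thirds.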

Then I stumbled upon Lemonade SDK and their pre-built llama.cpp. Ran that one, and got much better results across the board. TG was still below Vulkan, but PP was decent and degradation wasn’t as bad:

model	size	params	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	pp2048	999.20 ± 3.44
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	tg32	47.53 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	pp2048 @ d4096	826.63 ± 9.09
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	tg32 @ d4096	44.24 ± 0.03
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	pp2048 @ d8192	702.66 ± 2.15
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	tg32 @ d8192	42.56 ± 0.03
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	pp2048 @ d16384	505.85 ± 1.33
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	tg32 @ d16384	39.82 ± 0.03
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	pp2048 @ d32768	343.06 ± 2.07
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	tg32 @ d32768	35.50 ± 0.02

So I looked at their compilation options and noticed that they build without rocWMMA. So, I did the same and got similar performance too!

model	size	params	backend	test	t/s
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048	1000.93 ± 1.23
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32	47.46 ± 0.02
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d4096	827.34 ± 1.99
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d4096	44.20 ± 0.01
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d8192	701.68 ± 2.36
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d8192	42.39 ± 0.04
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d16384	503.49 ± 0.90
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d16384	39.61 ± 0.02
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	pp2048 @ d32768	344.36 ± 0.80
gpt-oss 120B MXFP4 MoE	59.02 GiB	116.83 B	ROCm	tg32 @ d32768	35.32 ± 0.01
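The exact build options are not listed here, so the following is only a sketch of how the two variants could be configured. It assumes the CMake option names from llama.cpp's HIP build documentation as I remember them (`GGML_HIP`, `AMDGPU_TARGETS`, `GGML_HIP_ROCWMMA_FATTN`) and the gfx1151 target for Strix Halo. Check them against the docs for your checkout, since option names have changed between releases. The helper name is hypothetical.

```python
import shlex


def hip_configure_argv(use_rocwmma: bool, gpu_target: str = "gfx1151") -> list[str]:
    """Build a cmake configure command for a llama.cpp HIP (ROCm) build.

    Assumption: option names follow llama.cpp's HIP build docs and may
    differ in your version.
    """
    rocwmma = "ON" if use_rocwmma else "OFF"
    return [
        "cmake", "-S", ".", "-B", "build",
        "-DCMAKE_BUILD_TYPE=Release",
        "-DGGML_HIP=ON",                        # enable the HIP/ROCm backend
        f"-DAMDGPU_TARGETS={gpu_target}",       # GPU architecture to compile for
        f"-DGGML_HIP_ROCWMMA_FATTN={rocwmma}",  # rocWMMA flash attention kernels
    ]


# Print both variants: with rocWMMA, and without it (the faster one in my tests).
for use_rocwmma in (True, False):
    print(shlex.join(hip_configure_argv(use_rocwmma)))
```

After configuring, the build itself is the usual `cmake --build build --config Release`. You will likely also need to point CMake at your ROCm install, which is the path configuration mentioned earlier.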

So far that’s the best I could get from Strix Halo. It’s very usable for text generation tasks.

Also, I wanted to touch on multi-modal performance. That’s where Spark shines. I don’t have any specific benchmarks yet, but image processing is much faster on Spark than on Strix Halo, especially in vLLM.

VLLM Experience

Haven’t had a chance to do extensive testing here, but wanted to share some early thoughts.

DGX Spark

First, I tried to just build vLLM from source as usual. The build was successful, but it failed with the following error: ptxas fatal : Value 'sm_121a' is not defined for option 'gpu-name'

I decided not to spend too much time on this for now, and just launched vLLM container that NVidia provides through their Docker repository.
It is built for DGX Spark, so supports it out of the box.

However, it ships vLLM 0.10.1, so I wasn’t able to run Qwen3-VL there.

Now, they put the source code inside the container, but it wasn’t a git repository - probably contains some NVidia-specific patches - I’ll need to see if those could be merged into main vllm code.

So I just checked out vllm main branch and proceeded to build with existing pytorch as usual. This time I was able to run it and launch qwen3-vl models just fine.
Both dense and MOE work. I tried FP4 and AWQ quants - everything works, no need to disable CUDA graphs.

The performance is decent - I still need to run some benchmarks, but image processing is very fast.

Strix Halo

Unlike llama.cpp that just works, vLLM experience on Strix Halo is much more limited.

My goal was to run Qwen3-VL models that are not supported by llama.cpp yet, so I needed to build 0.11.0 or later. There are some existing containers/toolboxes for earlier versions, but I couldn’t use them.

So, I installed the ROCm PyTorch libraries from TheRock, applied some patches from the kyuz0 toolboxes to avoid the amdsmi package crash, installed ROCm FlashAttention, and then just followed the standard vLLM installation instructions for an existing PyTorch.

I was able to run Qwen3-VL dense models with decent (for dense models) speeds, although initialization takes quite some time unless you reduce --max-num-seqs to 1 and set -tp 1.
The image processing is very slow though, much slower than llama.cpp for the same image, but the token generation is about what you’d expect from it.

Again, model loading is faster than on Spark for some reason (I’d expect the other way around, given the faster SSD in Spark and slightly faster memory).

I’m going to rebuild vLLM and re-test/benchmark later.

Some observations:

- FP8 models don’t work - they hang on WARNING 10-22 12:55:04 [fp8_utils.py:785] Using default W8A8 Block FP8 kernel config. Performance might be sub-optimal! Config file not found at /home/eugr/vllm/vllm/vllm/model_executor/layers/quantization/utils/configs/N=6144,K=2560,device_name=Radeon_8060S_Graphics,dtype=fp8_w8a8,block_shape=[128,128].json
- You need to use --enforce-eager, as CUDA graphs crash vLLM. Sometimes it works without it, but mostly it crashes.
- Even with --enforce-eager, there are occasional HIP-related crashes here and there.
- AWQ models work, both 4-bit and 8-bit, but only dense ones. AWQ MOE quants require the Marlin kernel, which is not available for ROCm.
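Putting the workarounds together, a launch command for Strix Halo might look like the sketch below. The flags are the ones mentioned in this post (`--max-num-seqs`, tensor parallel size 1, `--enforce-eager`). The model ID is just a placeholder for whichever dense Qwen3-VL checkpoint you use, and the helper name is made up.

```python
import shlex


def vllm_serve_argv(model: str) -> list[str]:
    """Build a conservative `vllm serve` command for Strix Halo."""
    return [
        "vllm", "serve", model,
        "--max-num-seqs", "1",          # fewer concurrent sequences, faster init
        "--tensor-parallel-size", "1",  # single GPU (same as -tp 1)
        "--enforce-eager",              # skip graph capture, which tends to crash
    ]


# Placeholder model ID; substitute the dense (non-MoE) model you actually run.
print(shlex.join(vllm_serve_argv("Qwen/Qwen3-VL-8B-Instruct")))
```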

Conclusion / TL;DR

Summary of my initial impressions:

DGX Spark is an interesting beast for sure.
- Limited extensibility - no USB-4, only one M.2 slot, and it’s 2242.
- But has 200Gbps network interface.
It’s a first generation of such devices, so there are some annoying bugs and incompatibilities.
Inference wise, the token generation is nearly identical to Strix Halo both in llama.cpp and vllm, but prompt processing is 2-5x higher than Strix Halo.
- Strix Halo performance in prompt processing degrades much faster with context.
- Image processing takes longer, especially with vLLM.
- Model loading into unified RAM is slower on DGX Spark for some reason, both in llama.cpp and vLLM.
Even though vLLM included gfx1151 in the supported configurations, it still requires some hacks to compile it.
- And even then, the experience is suboptimal. Initialization time is slow, it crashes, FP8 doesn’t work, AWQ for MOE doesn’t work.
If you are an AI developer who uses transformers/PyTorch or you need vLLM - you are better off with DGX Spark (or just a normal GPU build).
If you want a power-efficient inference server that can run gpt-oss and similar MOE at decent speeds, and don’t need to process images often, Strix Halo is the way to go.
If you want a general purpose machine, Strix Halo wins too.
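For reference, here is a sketch that computes the Spark-to-Strix ratios directly from the llama.cpp tables in this post. It uses the Spark numbers with mmap disabled and two Strix Halo baselines: the ROCm build without rocWMMA and the pre-built Vulkan binary. Which baseline counts as the fair one is a judgment call, so both are shown.

```python
DEPTHS = [0, 4096, 8192, 16384, 32768]

# Mean t/s copied from the tables in the llama.cpp section.
SPARK = {
    "pp2048": [1939.32, 1832.04, 1738.07, 1525.71, 1242.35],
    "tg32": [56.33, 52.63, 48.60, 45.01, 39.10],
}
STRIX = {
    "ROCm (no rocWMMA)": {
        "pp2048": [1000.93, 827.34, 701.68, 503.49, 344.36],
        "tg32": [47.46, 44.20, 42.39, 39.61, 35.32],
    },
    "Vulkan": {
        "pp2048": [526.54, 438.85, 356.28, 210.17, 138.79],
        "tg32": [52.64, 48.21, 45.90, 42.64, 36.18],
    },
}


def ratios(spark: list[float], strix: list[float]) -> list[float]:
    """Spark throughput divided by Strix Halo throughput at each depth."""
    return [s / x for s, x in zip(spark, strix)]


for backend, tests in STRIX.items():
    for test in ("pp2048", "tg32"):
        cells = ", ".join(
            f"d{depth}: {r:.2f}x"
            for depth, r in zip(DEPTHS, ratios(SPARK[test], tests[test]))
        )
        print(f"Spark vs Strix {backend:18s} {test:7s} {cells}")
```

Against the ROCm build, the prompt processing advantage grows from about 2x at zero depth to about 3.6x at 32K, while token generation stays within roughly 10-20%.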

elsaco · October 23, 2025, 5:58pm

DGX Spark vs Strix Halo …that’s 🍎🍊. CUDA or bust!

eugr · October 23, 2025, 6:36pm

AMD has ROCm. Not a “real” CUDA, but PyTorch works.
Of course, CUDA is better supported, although GB10 and Blackwell in general are not exactly trouble-free yet.

pdeaudney · February 18, 2026, 12:41am

This post is ranked pretty highly by Google for “dgx spark vs strix halo”, so I was wondering: does the conclusion still hold true after 3 months of software updates in the DGX Spark ecosystem?

I am particularly interested in the local inference workload only use case.

Thanks

eugr · February 18, 2026, 12:50am

Still true, but at this point I’d recommend DGX Spark over Strix Halo unless money is a concern:

- vLLM support is still bad on Strix Halo, so you are pretty much limited to llama.cpp there.
- Prompt processing speed is much higher on DGX Spark, especially if using vLLM. Like 5x higher, even more on longer contexts.
- Model loading speed has been improved on Spark, unless you use mmap - that one is still not great, but you can use --no-mmap with llama.cpp and --load-format fastsafetensors with vLLM.
- 200G networking is a MAJOR feature of Spark. A two-Spark cluster can deliver almost 2x gains in inference speed with dense models and smaller, but still noticeable, gains for MoE, and you can unlock larger models as a result. You are not limited to 2 Sparks either; some people here have 8x Spark clusters now.
- Strix Halo machines have increased in price, while the OG Spark stays the same and OEM ones can be had for less money.

I still have both, but I use my dual Spark cluster for pretty much anything - Strix Halo machine performs some LLM stuff for my home automation pipelines now.
