| Running GLM-4.7-FP8 (355B MoE) on 4x DGX Spark with SGLang + EAGLE Speculative Decoding | | 38 | 1413 | April 10, 2026 |
| Guide: Gemma 4 31B on DGX Spark via NemoClaw — Dual-Model Setup Guide | | 0 | 40 | April 10, 2026 |
| ONNX Runtime GPU inference on DGX Spark (GX10) — build guide and prebuilt binaries | | 0 | 29 | April 10, 2026 |
| RedHatAI/Qwen3.5-122B-A10B-NVFP4 seems to be the best option for a single Spark | | 73 | 3948 | April 10, 2026 |
| Qwen3.5-122B-A10B NVFP4 Quantized for DGX Spark — 234GB → 75GB, Runs on 128GB | | 44 | 7763 | April 9, 2026 |
| Spark and vllm | | 0 | 54 | April 9, 2026 |
| NemoClaw failed adding Telegram channel | | 2 | 81 | April 9, 2026 |
| New pre-built sglang Docker Images for NVIDIA DGX Spark | | 22 | 1447 | April 9, 2026 |
| DGX Spark Model Manager — Open Source Web UI for Ollama, SGLang & LiteLLM | | 5 | 354 | April 9, 2026 |
| OpenClaw + Ollama hybrid + ClawMobile architecture | | 6 | 140 | April 8, 2026 |
| Multilingual Speech-to-Text STT / ASR with Nvidia parakeet-tdt-0.6b-v3 for the DGX Spark | | 4 | 147 | April 8, 2026 |
| vLLM custom for DGX Spark - STREAM LOADING and automatic KV cache | | 10 | 341 | April 8, 2026 |
| NeuralForge GPU Native Knowledge Intelligence Platform Built on DGX Spark GB10 | | 1 | 100 | April 8, 2026 |
| My DGX Spark Hangs ... is this normal? | | 2 | 94 | April 7, 2026 |
| DGX Spark GB10 / vLLM 0.19.1: TurboQuant KV cache integration results on Qwen3.5 and Nemotron, including gather-free Triton decode and CUDA WPH decode | | 5 | 760 | April 7, 2026 |
| Marlin Fix: NVFP4 Actually Works on SM121 (DGX Spark) | | 2 | 669 | April 7, 2026 |
| HOW-TO: setup-dgx-spark docker inference - A "Sane" Inference Stack for GB10 (Need Contributors!) | | 31 | 1363 | April 7, 2026 |
| Sparkrun - central command with tab completion for launching inference on Spark Clusters | | 60 | 1622 | April 6, 2026 |
| Qwen3.5-397B-A17B + DGX Spark (duo) | | 55 | 4231 | April 6, 2026 |
| Gemma4 Benchmarks on double DGX Sparks Ray Cluster and single DGX | | 2 | 425 | April 6, 2026 |
| Trinity-Large-Thinking should fit in 2 Sparks | | 3 | 308 | April 6, 2026 |
| Bf16 LoRA Fine-Tuning of Qwen3.5-35B-A3B on DGX Spark — No Quantization Required | | 5 | 505 | April 6, 2026 |
| Took the plunge and bought a 2nd Spark! More use cases? | | 7 | 669 | April 4, 2026 |
| Nv-monitor: Add RDMA/InfiniBand metrics (data export only) - testers needed | | 8 | 161 | April 4, 2026 |
| NVML Support for DGX Spark Grace Blackwell Unified Memory - Community Solution | | 7 | 548 | April 4, 2026 |
| NVidia GB10 Specific Model Guide / Containers | | 2 | 183 | April 3, 2026 |
| Multi-Node DGX Spark Cluster (4×) — K3s, SGLang/vLLM, ConnectX-7 SR-IOV, Full Benchmark Matrix | | 0 | 106 | April 3, 2026 |
| Introducing the Atlas Inference Server and Engine | | 146 | 5183 | April 2, 2026 |
| Implementation Guide: DGX Spark with Qwen3.5-35B-A3B via llama.cpp for Claude Code | | 3 | 655 | April 2, 2026 |
| DGX Spark: 13 → 49 tok/s with Qwen3.5-35B — Native SM121 Kernel Build Guide | | 13 | 907 | April 1, 2026 |