We have just published a playbook on running speculative decoding across two DGX Spark systems to serve larger models. You can check it out here: Speculative Decoding | DGX Spark
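For readers new to the technique: speculative decoding pairs a small, fast "draft" model with the large "target" model. The draft proposes several tokens cheaply, the target verifies them in one pass, and only the agreeing prefix is kept. Below is a minimal toy sketch of that draft-and-verify loop; the `draft_model` and `target_model` functions are stand-in placeholders, not real LLMs, and nothing here reflects the playbook's actual implementation.

```python
# Toy sketch of the draft-and-verify loop behind speculative decoding.
# A cheap "draft" model proposes k tokens; the expensive "target" model
# checks them and keeps the longest prefix it agrees with.
# Both models are hypothetical stand-ins that emit integer token IDs.

def draft_model(context, k):
    # Placeholder cheap model: proposes the next k tokens.
    return [(context[-1] + 1 + i) % 100 for i in range(k)]

def target_model(context):
    # Placeholder expensive model: returns its single next-token choice.
    return (context[-1] + 1) % 100

def speculative_step(context, k=4):
    """One round: draft k tokens, verify with the target, and accept the
    longest agreeing prefix plus one token chosen by the target."""
    proposal = draft_model(context, k)
    accepted = []
    for tok in proposal:
        expected = target_model(context + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            # Target disagrees: take its correction and end the round.
            accepted.append(expected)
            break
    else:
        # All k draft tokens accepted; the target adds one bonus token.
        accepted.append(target_model(context + accepted))
    return accepted

tokens = [0]
for _ in range(3):
    tokens += speculative_step(tokens, k=4)
print(tokens)  # → [0, 1, 2, ..., 15]: 5 tokens accepted per round
```

In this toy, draft and target always agree, so every round yields k + 1 = 5 tokens for a single target-model verification pass; in practice the speedup depends on how often the draft model's proposals match the target's.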