Deploying SLM/DSLM Workloads on DGX Spark and Possible Optimization Strategies

Hi everyone,

I am a researcher at a university research center.

Our research focuses on Physical AI in extreme communication environments, particularly underwater and polar scenarios.

We are planning to use an NVIDIA DGX Spark system (GB10 Grace Blackwell Superchip with 128GB of unified memory) to run Small Language Models (SLMs) or Domain-Specific Language Models (DSLMs) for tasks such as sensor data interpretation, channel modeling, and adaptive communication decision support.

I would appreciate guidance from the community regarding the following.

  1. Deploying SLM/DSLM on the DGX Platform

What is the recommended approach for deploying and operating SLM/DSLM workloads on DGX Spark?

Specifically, I would like to understand:

  1. Which software stack is commonly used for SLM inference on this platform
    (e.g., TensorRT-LLM, Triton Inference Server, vLLM, NeMo, or other frameworks);
    a rough sketch of the setup I have in mind follows this list

  2. Best practices for utilizing the 128GB unified memory architecture when running models with longer context windows or time-series sensor data

  3. Whether there are recommended container-based pipelines (Docker / Kubernetes / NGC stacks) for running LLM/SLM workloads on DGX systems
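
For reference on item 1, this is roughly the kind of minimal setup I have in mind. It is only a sketch, assuming vLLM's Python API runs on the GB10 (which is part of what I am asking); the model ID, context length, and memory fraction below are placeholders, not recommendations:

```python
from vllm import LLM, SamplingParams

# Sketch only: the model ID below is a placeholder SLM.
llm = LLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
    dtype="bfloat16",
    max_model_len=8192,            # longer context for time-series sensor logs
    gpu_memory_utilization=0.80,   # leave headroom in the 128GB unified pool
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(
    ["Interpret the following underwater acoustic channel measurements: ..."],
    params,
)
print(outputs[0].outputs[0].text)
```

In particular, I am unsure how a knob like gpu_memory_utilization should be set when CPU and GPU share a single 128GB pool (item 2), and whether this would normally run inside an NGC container (item 3).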

  2. Performance Optimization for SLM/DSLM Workloads

If additional performance optimization is required after deployment, what strategies are generally recommended on DGX Spark? For example:

  • Parameter-efficient fine-tuning approaches (e.g., LoRA / QLoRA) for adapting models to our domain (a minimal sketch follows this list)
  • Model compression techniques such as FP4 or INT8 quantization
  • GPU kernel or inference optimization through TensorRT-LLM or CUDA-based approaches
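
To make the first two bullets concrete, here is a minimal QLoRA-style sketch: the base model loaded with 4-bit NF4 weight quantization via bitsandbytes, with LoRA adapters attached via PEFT. I am assuming the standard Hugging Face transformers / peft / bitsandbytes stack is usable on this platform (again, part of my question); the model ID and LoRA hyperparameters are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder SLM

# Load the base model with 4-bit NF4 weight quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach low-rank adapters; only these small matrices are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                  # placeholder rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction should be trainable
```

My open question is whether this bitsandbytes-based path is supported on GB10, or whether the recommended route on Blackwell is NeMo / TensorRT-LLM tooling (e.g., for FP4), per the third bullet.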

Any references to documentation, example projects, or relevant NVIDIA resources would be greatly appreciated.

Also, if there are any existing threads or official guides in this forum that cover similar topics, I would appreciate it if you could point me toward them.

Thank you for reading my post.