DGX Spark RAG on Docker

Hello,

I am sharing my setup for deploying a RAG stack on the DGX Spark.

The stack is composed of Anything LLM as orchestrator, Ollama as LLM provider (with the possibility of configuring other local or cloud providers), and Qdrant DB as vector DB. The web search ability is achieved using SearxNG meta engine.

The deployment is achieved using docker compose.

GitHub - amasu/dgx-spark-rag: Retrieval Augmented Generation containerized for Nvidia DGX Spark · GitHub

Cheers !

Sharing another RAG stack approach for the Spark: I got RAGFlow (v0.24.0) running natively on DGX Spark with GPU-accelerated document processing (DeepDoc / OCR / embeddings).

The main challenges were:

No upstream ARM64 Docker images (solved by patching the Dockerfile at build time)
No prebuilt onnxruntime-gpu wheel for SM_121 / CUDA 13 (solved by compiling from source inside a builder container - takes a while on first run, cached afterwards)

Upstream OCR only supports Chinese PP-OCRv5 (replaced with a three-model cascade: Latin primary, Cyrillic secondary, Chinese tertiary - highest confidence wins per text box)

find_codec() misdetects ISO-8859-1 as GBK, corrupting umlauts (fixed)

Published the full build + deploy automation as a standalone script package:

One-time project, tested on v0.24.0 only. No support, no roadmap - just sharing in case it saves someone else the debugging time.