RAG Blueprint on DGX Spark (ARM64 / GB10): NIMs & Milvus OK, but ingestor-server / rag-server fail with exec format error

Hi everyone,

I’m testing the NVIDIA RAG Blueprint on a DGX Spark (GB10, ARM64) and ran into a consistent architecture issue with the “control plane” containers. The NIMs and Milvus run fine, but the ingestor and RAG server containers exit with exec format error, which looks like an x86_64 vs ARM64 problem.

Environment

  • Hardware: NVIDIA DGX Spark (GB10 Grace Blackwell Superchip, ARM64, 128 GB unified memory)

  • Host OS: DGX OS / Ubuntu 22.04 aarch64

  • Driver:

    • nvidia-smi inside nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04 reports:

      • Driver Version: 580.95.05

      • CUDA Version: 13.0

      • GPU: NVIDIA GB10

  • Docker: using NVIDIA Container Toolkit; docker run --rm --gpus all nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi works as expected.

  • RAG Blueprint: NVIDIA-AI-Blueprints/rag (Compose files under deploy/compose)

What works

Using the provided Compose files, the following containers work correctly on DGX Spark:

  • NIMs (ARM64 / multiarch):

    • nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.13.1 as nim-llm-ms

      • NIM_MODEL_PROFILE set to the detected compatible profile:
        d491076c9c5fbbc0f5ec92916ee84050c3b51a1c93055a64de8d0f31a22ed209
    • nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0 as nemoretriever-embedding-ms

    • nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.8.0 as nemoretriever-ranking-ms

    • nvcr.io/nim/nvidia/nemoretriever-page-elements-v2:1.5.0 as page-elements

    • nvcr.io/nim/nvidia/nemoretriever-graphic-elements-v1:1.5.0 as graphic-elements

    • nvcr.io/nim/nvidia/nemoretriever-table-structure-v1:1.5.0 as table-structure

All of these start and stay up: the embedding and rerank NIMs are healthy, and the LLM NIM loads successfully with the selected profile on GPU 0.

  • Vector DB stack:

    • milvusdb/milvus:v2.6.2-gpu as milvus-standalone

    • minio/minio:RELEASE.2025-09-07T16-13-09Z as milvus-minio

    • quay.io/coreos/etcd:v3.6.5 as milvus-etcd

Milvus is reachable on port 19530, and MinIO and etcd are healthy.

  • Redis:

    • redis/redis-stack:7.2.0-v18 is up and accepting connections.

So the heavy GPU services (NIMs) and the DB/storage layer are working fine on ARM64.

What fails

The following containers exit immediately on DGX Spark with exec format error:

  • nvcr.io/nvidia/blueprint/ingestor-server:2.3.0 (service ingestor-server, port 8082)

  • nvcr.io/nvidia/nemo-microservices/nv-ingest:25.9.0 (service nv-ingest-ms-runtime)

  • nvcr.io/nvidia/blueprint/rag-server:2.3.0 (service rag-server, port 8081)

  • nvcr.io/nvidia/blueprint/rag-frontend:2.3.0 (service rag-frontend, port 3000)

The NIMs and Milvus are running and reachable, but because ingestor-server and rag-server exit immediately, nothing is listening on ports 8081 and 8082, so the Blueprint’s REST endpoints and UI are unusable.
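For anyone who wants to reproduce the "not listening" observation, here is a minimal sketch of a TCP probe. The port numbers come from this post; the service labels, the `SERVICES` dict and both function names are my own illustrative choices and not part of the Blueprint.

```python
import socket

# Ports as reported in this post; the labels are just for display.
SERVICES = {"milvus": 19530, "rag-server": 8081, "ingestor-server": 8082}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe(host: str = "127.0.0.1", services: dict = SERVICES) -> dict:
    """Probe every service port on `host` and return {name: is_listening}."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in probe().items():
        print(f"{name:16s} {'listening' if up else 'not listening'}")
```

A successful TCP connect only shows that something accepted the connection, not that the service is healthy, so this is a quick sanity check rather than a health check.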

Given the error pattern, it looks like:

  • These Blueprint containers (ingestor-server, nv-ingest, rag-server, rag-frontend) are built for linux/amd64 only,

  • and are being run on an aarch64 host (DGX Spark / GB10), which results in the exec format error as soon as Docker tries to run the binary.
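To confirm the amd64-only suspicion rather than infer it from the error, one option is to compare the platform recorded in each pulled image with the host architecture. The sketch below assumes the images are already pulled locally and relies on the `Os` and `Architecture` fields in the JSON printed by `docker image inspect`. The function names and the mapping table are illustrative, not part of any NVIDIA tooling.

```python
import json
import platform
import subprocess
from typing import Optional

# Map `uname -m` style machine names to the names Docker uses for architectures.
MACHINE_TO_DOCKER_ARCH = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}

# Image references as listed in this post.
IMAGES = [
    "nvcr.io/nvidia/blueprint/ingestor-server:2.3.0",
    "nvcr.io/nvidia/nemo-microservices/nv-ingest:25.9.0",
    "nvcr.io/nvidia/blueprint/rag-server:2.3.0",
    "nvcr.io/nvidia/blueprint/rag-frontend:2.3.0",
]


def host_docker_arch(machine: Optional[str] = None) -> str:
    """Return the host architecture in Docker naming (e.g. 'arm64')."""
    machine = (machine or platform.machine()).lower()
    return MACHINE_TO_DOCKER_ARCH.get(machine, machine)


def image_platform(inspect_entry: dict) -> str:
    """Build 'os/arch' from one entry of `docker image inspect` JSON output."""
    return f"{inspect_entry['Os']}/{inspect_entry['Architecture']}"


def matches_host(inspect_entry: dict, machine: Optional[str] = None) -> bool:
    """True if the image architecture equals the host architecture."""
    return inspect_entry["Architecture"] == host_docker_arch(machine)


def inspect_local_image(image: str) -> dict:
    """Run `docker image inspect` on an already-pulled image; return its first entry."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)[0]


if __name__ == "__main__":
    print(f"host: linux/{host_docker_arch()}")
    for image in IMAGES:
        entry = inspect_local_image(image)
        verdict = "ok" if matches_host(entry) else "MISMATCH"
        print(f"{image}: {image_platform(entry)} [{verdict}]")
```

If the four failing images report linux/amd64 while the host reports arm64, that would confirm the architecture mismatch as the cause of the exec format error.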

My question(s)

  1. Are there ARM64-native (or multi-arch) builds planned for:

    • nvcr.io/nvidia/blueprint/ingestor-server

    • nvcr.io/nvidia/nemo-microservices/nv-ingest

    • nvcr.io/nvidia/blueprint/rag-server

    • nvcr.io/nvidia/blueprint/rag-frontend

    specifically for DGX Spark / GB10?

  2. If yes, is there any ETA or roadmap you can share (e.g. target RAG Blueprint version or NIM/Blueprint release where ARM64 support for these containers will be available)?

  3. In the meantime, is there an officially recommended workaround for DGX Spark:

    • e.g. running only NIMs + Milvus on DGX Spark and hosting rag-server / ingestor-server on a separate x86_64 node,

    • or using a different supported stack for ARM64 to orchestrate the NIMs for RAG?

  4. Is there any ongoing work or branch (GitHub / internal) that explicitly targets RAG Blueprint on DGX Spark / ARM64, which early adopters can follow/test?

My main goal is to run the full NVIDIA RAG Blueprint (including ingest and RAG server) on DGX Spark, since the hardware is clearly capable and the NIMs already run well on this system. At the moment, the only blocker is the exec format error that these Python-based Blueprint containers hit on the ARM64 host.

Any guidance, roadmap hints, or pointers to ARM64-ready images (even experimental) would be very helpful.

Thanks in advance!

Hi,
Have you tried this repo yet?