DGX Spark (GB10, ARM64) – Embedding NIM llama-3.2-nv-embedqa-1b-v2:1.10.0 fails with cudaErrorSymbolNotFound (onnx runtime)

Hi,

I am running into a reproducible issue with the NVIDIA Embedding NIM on a DGX Spark system and would like to clarify whether this is a known compatibility problem with GB10 / ARM64 or a misconfiguration on my side.

Environment

  • Hardware: NVIDIA DGX Spark with GB10 (Grace Blackwell), 128 GB unified memory

  • CPU Arch: ARM64 / aarch64

  • OS: DGX Base OS (Ubuntu 22.04 based)

  • Driver / CUDA (from inside a CUDA container):

    • nvidia-smi shows:

      • Driver Version: 580.95.05

      • CUDA Version: 13.0

  • Container Runtime: Docker with NVIDIA Container Toolkit

  • Other NIMs on the same system are working:

    • LLM NIM: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.13.1 (running with a valid NIM_MODEL_PROFILE, GPU 0)

    • Ranking NIM: nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.8.0 (healthy)

    • NeMo Retriever page/graphic/table NIMs are also running fine

  • Vector DB:

    • Milvus milvusdb/milvus:v2.6.2-gpu (Up and healthy)

    • MinIO + etcd running without issues

So GPU access, NIM base stack, and Milvus are all functioning on DGX Spark.

Embedding NIM in use

Embedding service from the NVIDIA RAG blueprint:

  • Image:
    nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0

  • Exposed on host as:
    http://localhost:9080/v1/embeddings

Request that fails

I am calling the Embedding NIM via Python (requests) from the DGX host (ARM64 venv):

import requests
import json

EMBEDDING_URL = "http://localhost:9080/v1/embeddings"
EMBEDDING_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

payload = {
    "model": EMBEDDING_MODEL,
    "input": ["Dies ist ein kurzer Test für den Embedding-NIM."],
    "input_type": "passage",  # as required: 'query' or 'passage'
}

resp = requests.post(EMBEDDING_URL, json=payload, timeout=60)
print(resp.status_code)
print(resp.text)

The payload is accepted syntactically (i.e., the 4xx validation errors disappear once input_type is set to "passage" as required), but the service then returns an internal error.
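For reference, this is the small client-side helper I use to avoid the 4xx round-trip by enforcing the input_type constraint before posting; build_embed_payload is just an illustrative name, not part of any NIM client API:

```python
# Hypothetical helper (not part of the NIM API): builds the /v1/embeddings
# request body and rejects invalid input_type values up front, since the
# embedding NIM only accepts 'query' or 'passage'.
VALID_INPUT_TYPES = {"query", "passage"}

def build_embed_payload(model: str, texts: list, input_type: str) -> dict:
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {sorted(VALID_INPUT_TYPES)}, got {input_type!r}"
        )
    return {"model": model, "input": list(texts), "input_type": input_type}
```

With this in place, only structurally valid requests ever reach the service, so any remaining failure is on the server side.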

Response

HTTP status: 500

Body:

{
  "object": "error",
  "message": "Something went wrong with the request.",
  "detail": "Unexpected error: onnx runtime error 1: Non-zero status code returned while running ReduceSum node. Name:'/pooling_module/ReduceSum_1' Status Message: CUDA error cudaErrorSymbolNotFound:named symbol not found",
  "type": "internal_server_error"
}

So the request is valid, but the ONNX runtime inside the NIM container fails with:

cudaErrorSymbolNotFound: named symbol not found

This happens consistently for any non-trivial input; the container itself is running and reachable, it just fails on actual inference.
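In case it helps triage, this is a sketch of how I pull the underlying ONNX runtime / CUDA message out of the 500 body (the field names match the error response above; the helper name is my own):

```python
import json

def extract_error_detail(body: str) -> str:
    # Parse the NIM error envelope ('object', 'message', 'detail', 'type')
    # and return the most specific message available.
    doc = json.loads(body)
    if doc.get("object") == "error":
        return doc.get("detail") or doc.get("message", "")
    return ""

# Shortened version of the actual 500 body returned by the embedding NIM.
body = (
    '{"object": "error", '
    '"message": "Something went wrong with the request.", '
    '"detail": "Unexpected error: onnx runtime error 1: CUDA error '
    'cudaErrorSymbolNotFound:named symbol not found", '
    '"type": "internal_server_error"}'
)
print(extract_error_detail(body))
```

The detail field consistently surfaces the same cudaErrorSymbolNotFound message regardless of input text.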

What already works on the same system

  • Running nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi works fine and shows the GB10 GPU.

  • LLM NIM (llama-3.3-nemotron-super-49b-v1.5:1.13.1) is able to load and run with a compatible model profile on GPU 0.

  • Ranking NIM works and responds correctly.

So the GPU, CUDA stack, and other NIMs are all OK; the issue seems specific to this embedding model / build. (As far as I understand, cudaErrorSymbolNotFound usually means a CUDA kernel binary was not compiled for the GPU's compute capability, which would be consistent with an image built before GB10 support.)

Questions

  1. Is nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0 officially tested/supported on DGX Spark (GB10, ARM64, driver 580.xx / CUDA 13)?

  2. Is this cudaErrorSymbolNotFound a known issue with this particular embedding NIM build on GB10, and is there a recommended tag (e.g. newer production branch) that should be used instead on DGX Spark?

  3. If a newer/multiarch build is already available or planned for this model (or an equivalent embedding model), could you point me to the recommended image/tag for DGX Spark?

Goal: I would like to use this embedding NIM (or an equivalent one) as the document/query encoder in an on-prem RAG setup built on DGX Spark where the LLM NIM and Milvus are already working.

Thanks in advance for any guidance or pointers.

This was released prior to Spark. I recommend reviewing the stack described here: dgx-spark-playbooks/nvidia/multi-agent-chatbot at main · NVIDIA/dgx-spark-playbooks · GitHub