Hi,
I am running into a reproducible issue with the NVIDIA Embedding NIM on a DGX Spark system and would like to clarify whether this is a known compatibility problem with GB10 / ARM64 or a misconfiguration on my side.
Environment

- Hardware: NVIDIA DGX Spark with GB10 (Grace Blackwell), 128 GB unified memory
- CPU arch: ARM64 / aarch64
- OS: DGX Base OS (Ubuntu 22.04 based)
- Driver / CUDA (from inside a CUDA container), as reported by nvidia-smi:
  - Driver Version: 580.95.05
  - CUDA Version: 13.0
- Container runtime: Docker with NVIDIA Container Toolkit
- Other NIMs on the same system are working:
  - LLM NIM: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.13.1 (running with a valid NIM_MODEL_PROFILE, GPU 0)
  - Ranking NIM: nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.8.0 (healthy)
  - NeMo Retriever page/graphic/table NIMs are also running fine
- Vector DB:
  - Milvus: milvusdb/milvus:v2.6.2-gpu (up and healthy)
  - MinIO + etcd running without issues
So GPU access, NIM base stack, and Milvus are all functioning on DGX Spark.
Embedding NIM in use
Embedding service from the NVIDIA RAG blueprint:
- Image: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0
- Exposed on host as: http://localhost:9080/v1/embeddings (reachable; see the health-check sketch below)
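Before sending embedding requests, I verify the container is up; a minimal sketch, assuming this NIM exposes the standard /v1/health/ready and /v1/models endpoints on the same port:

```python
import requests

BASE_URL = "http://localhost:9080"

# Readiness probe (assumption: this embedding NIM exposes the standard
# NIM endpoint on the same port as /v1/embeddings).
ready = requests.get(f"{BASE_URL}/v1/health/ready", timeout=10)
print("ready:", ready.status_code, ready.text)

# List the models the container serves, to confirm the model name in the
# embeddings payload matches what the NIM reports.
models = requests.get(f"{BASE_URL}/v1/models", timeout=10)
print("models:", models.status_code, models.text)
```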
Request that fails
I am calling the Embedding NIM via Python (requests) from the DGX host (ARM64 venv):
```python
import requests

EMBEDDING_URL = "http://localhost:9080/v1/embeddings"
EMBEDDING_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

payload = {
    "model": EMBEDDING_MODEL,
    "input": ["Dies ist ein kurzer Test für den Embedding-NIM."],
    "input_type": "passage",  # as required: 'query' or 'passage'
}

resp = requests.post(EMBEDDING_URL, json=payload, timeout=60)
print(resp.status_code)
print(resp.text)
```
The payload is accepted syntactically (the 4xx validation errors disappear once input_type is set to "passage" as required), but the service then returns an internal error.
Response
HTTP status: 500
Body:

```json
{
  "object": "error",
  "message": "Something went wrong with the request.",
  "detail": "Unexpected error: onnx runtime error 1: Non-zero status code returned while running ReduceSum node. Name:'/pooling_module/ReduceSum_1' Status Message: CUDA error cudaErrorSymbolNotFound:named symbol not found",
  "type": "internal_server_error"
}
```
So the request is valid, but the ONNX Runtime inside the NIM container fails with:

cudaErrorSymbolNotFound: named symbol not found

This happens consistently for any non-trivial input; the container itself is running and reachable, but fails on actual inference.
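For completeness, this is roughly how I check that the failure is not specific to one payload; a minimal sketch with a hypothetical embed() helper (not part of the blueprint):

```python
import requests

EMBEDDING_URL = "http://localhost:9080/v1/embeddings"
EMBEDDING_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

def embed(texts, input_type):
    """Send one embeddings request and return (status_code, body)."""
    payload = {"model": EMBEDDING_MODEL, "input": texts, "input_type": input_type}
    resp = requests.post(EMBEDDING_URL, json=payload, timeout=60)
    return resp.status_code, resp.text

# Vary the input text and the input_type to confirm the 500 is not
# payload-specific before blaming the model build.
for input_type in ("query", "passage"):
    for text in ("short test", "a somewhat longer sentence for the embedding model"):
        status, body = embed([text], input_type)
        print(input_type, status, body[:120])
```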
What already works on the same system

- Running nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04 with nvidia-smi works fine and shows the GB10 GPU.
- The LLM NIM (llama-3.3-nemotron-super-49b-v1.5:1.13.1) is able to load and run with a compatible model profile on GPU 0.
- The Ranking NIM works and responds correctly (see the sketch below for the check I run).

So the GPU + CUDA stack and the other NIMs are OK; the issue seems specific to this embedding model / build.
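For reference, this is the kind of request the ranking NIM answers without issues; a minimal sketch assuming the documented /v1/ranking endpoint, with a placeholder host port (9081 here; substitute the actual port mapping on your system):

```python
import requests

# Assumption: host port 9081 is a placeholder for wherever the rerank
# container is actually mapped.
RERANK_URL = "http://localhost:9081/v1/ranking"

payload = {
    "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
    "query": {"text": "What is DGX Spark?"},
    "passages": [{"text": "DGX Spark is a compact Grace Blackwell system."}],
}

resp = requests.post(RERANK_URL, json=payload, timeout=60)
print(resp.status_code)
print(resp.text)  # a 'rankings' list comes back as expected
```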
Questions

- Is nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.10.0 officially tested/supported on DGX Spark (GB10, ARM64, driver 580.xx / CUDA 13)?
- Is this cudaErrorSymbolNotFound a known issue with this particular embedding NIM build on GB10, and is there a recommended tag (e.g. a newer production branch) that should be used instead on DGX Spark?
- If a newer / multi-arch build is already available or planned for this model (or an equivalent embedding model), could you point me to the recommended image/tag for DGX Spark?
Goal: I would like to use this embedding NIM (or an equivalent one) as the document/query encoder in an on-prem RAG setup on DGX Spark, where the LLM NIM and Milvus are already working.
Thanks in advance for any guidance or pointers.