NIM LLM Containers Fail on DGX Spark (GB10): Triton/vLLM Crash on sm_121 and NGC Permission Errors

Hello,

I am experiencing two separate issues when attempting to deploy NVIDIA NIM LLM containers on a DGX Spark system with GB10 GPUs. I have collected the relevant technical information and logs, and will attach the nvidia-bug-report.log.gz file as required.

nvidia-bug-report.log.gz (594.4 KB)

System Information

  • System: DGX Spark

  • GPU: NVIDIA GB10 (Blackwell)

  • Architecture: aarch64

  • OS: Ubuntu 24.04 LTS

  • Driver: 580.95.05

  • CUDA: 13.0

  • NVIDIA Container Toolkit: 1.18.0

  • Docker: 24.x

  • Behavior: nvidia-smi works both on host and inside CUDA containers.

Validation commands:

nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

Both run successfully.

I will attach the nvidia-bug-report.log.gz file generated by:

sudo nvidia-bug-report.sh

Problem #1: Llama 3.3 Nemotron Super 49B Fails on GB10 (Triton/LLVM crash)

Container:

nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest

Description

The model downloads, loads ~93 GiB into GPU memory, and then crashes during Triton/vLLM kernel compilation. The healthcheck never reports ready.

Key Log Extract

'sm_121' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
...
RuntimeError: Engine core initialization failed.
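
For context, `sm_121` is LLVM's target name for the GB10's CUDA compute capability 12.1 (what `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` reports on this system). The helper below is purely illustrative (not part of any NVIDIA tooling) and just shows the mapping between the two notations:

```shell
# Hypothetical helper: map a compute capability as reported by nvidia-smi
# (e.g. "12.1") to the LLVM/Triton target name seen in the error ("sm_121").
cc_to_sm() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cc_to_sm 12.1   # GB10 -> sm_121
cc_to_sm 9.0    # -> sm_90
```

If the LLVM build bundled in the image only knows targets up to an earlier architecture, any Triton kernel JIT for this GPU would abort with exactly the "not a recognized processor" warning shown above.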

Reproduction Steps

  1. Use the DGX Spark system with the environment described above.

  2. Pull the container:

    docker pull nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    
  3. Launch with:

    docker run --gpus all -p 8999:8000 nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    
  4. Observe Triton/vLLM crash after weight loading.


Problem #2: Nemotron Nano 9B (DGX Spark variant) Fails With NGC Permission Error

Container:

nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest

Description

The container starts and detects the GPU, but fails immediately while downloading the model files from NGC.

Key Log Extract

Permission error: The requested operation requires permissions that the user does not have.
This may be due to the user not being a member of the organization that owns the repo.

The failing URL is:

https://api.ngc.nvidia.com/v2/org/nim/team/nvidia/models/nemotron-nano-9b-v2/hf-nvfp4-v1/files

This suggests that the NGC API key tied to my account may not have the required entitlements for the org/team hosting this model.
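
To separate "invalid key" from "missing entitlement", it may help to hit the same manifest endpoint from outside the container. The sketch below only rebuilds the failing URL from its parts; the commented curl line and its status-code interpretation (401 vs. 403) are assumptions about the NGC API, not confirmed behavior:

```shell
# Rebuild the NGC files URL that the container fails on, from its parts.
ngc_files_url() {
  # $1=org  $2=team  $3=model  $4=version
  printf 'https://api.ngc.nvidia.com/v2/org/%s/team/%s/models/%s/%s/files\n' \
    "$1" "$2" "$3" "$4"
}

URL=$(ngc_files_url nim nvidia nemotron-nano-9b-v2 hf-nvfp4-v1)
echo "$URL"

# With a valid bearer token in NGC_TOKEN (assumption: 401 = bad token,
# 403 = token valid but no entitlement for this org/team):
# curl -s -o /dev/null -w '%{http_code}\n' \
#      -H "Authorization: Bearer $NGC_TOKEN" "$URL"
```

As a first sanity check, `echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin` confirms the key is accepted by the registry at all; registry authentication and model-download entitlements may still be granted separately.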

Reproduction Steps

  1. Use the same DGX Spark system.

  2. Pull the DGX-Spark NIM Nano 9B image:

    docker pull nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    
  3. Run the container with:

    docker run --gpus all -p 8000:8000 -e NGC_API_KEY=$NGC_API_KEY nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    
  4. Container terminates while attempting to download model manifests.


Additional Diagnostics

Docker Compose (minimal)

services:

  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    volumes:
    - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    user: "${USERID}"
    ports:
    - "8999:8000"
    expose:
    - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready', timeout=5).raise_for_status()"]
      interval: 10s
      timeout: 20s
      retries: 10000

  nim-llm-nano:
    container_name: nim-llm-ms-nano
    image: nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    volumes:
    - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    user: "${USERID}"
    ports:
    - "8000:8000"
    expose:
    - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready', timeout=5).raise_for_status()"]
      interval: 10s
      timeout: 20s
      retries: 10000

Cache directory

/opt/nim/.cache

Writable, as confirmed in the container logs.

Healthchecks fail because the container crashes before the HTTP server starts.
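
A standalone readiness poll from the host can make this failure mode easier to see than the compose healthcheck. This is a generic sketch against the same `/v1/health/ready` endpoint used above; the port, attempt count, and interval are arbitrary:

```shell
# Poll a NIM readiness endpoint until it answers successfully or we give up.
wait_ready() {
  # $1 = base URL, $2 = max attempts, $3 = seconds between attempts
  local i
  for i in $(seq 1 "$2"); do
    if curl -sf "$1/v1/health/ready" > /dev/null; then
      echo ready
      return 0
    fi
    sleep "$3"
  done
  echo timeout
  return 1
}

# Example against the 49B service mapped to host port 8999 above:
# wait_ready http://localhost:8999 60 10
```

If the poll times out while `docker logs` shows the LLVM error from Problem #1, the healthcheck itself is fine and the crash happens before the server ever binds the port, matching the behavior described.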


Questions for NVIDIA Engineering

A) Regarding the 49B model (llama-3.3-nemotron-super-49b-v1.5)

  1. Is sm_121 (GB10) fully supported by the Triton / vLLM versions bundled in this image?

  2. Is there a newer NIM version required (e.g., 1.14.0 or later)?

  3. Is this a known issue where LLVM/Triton do not yet support Blackwell kernels for this model?

B) Regarding the Nemotron Nano 9B DGX-Spark variant

  1. What specific NGC entitlements or organization memberships are required to access:

    org/nim/team/nvidia/models/nemotron-nano-9b-v2
    
  2. Is this model gated behind NVIDIA AI Enterprise / NIM licensing?

  3. Should DGX Spark customers automatically receive access to these DGX-Spark-tagged models?

C) Availability / Support Matrix

  1. Is there an up-to-date compatibility matrix for NIM LLM containers running on DGX Spark (GB10)?

  2. Are there alternative NIM LLM containers officially supported today for this architecture?

Please let me know if additional diagnostics or traces are needed. I am available to test updated containers or patches immediately.


@angel.oropeza I get the same error message when trying to run llama-3.3-nemotron-super-49b-v1.5. But I can run nano-9b-v2 by itself like this, which takes about 92 GB of memory on its own.

If you somehow get both running at the same time, please do post the solution.

# Must drop the page cache first (the sysctl form below is equivalent)
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
# sudo sysctl -w vm.drop_caches=3


docker run -it --rm --gpus all --runtime=nvidia \
           --ulimit memlock=-1 --ulimit stack=67108864 \
           --shm-size=16GB -e NGC_API_KEY \
           -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
           -u "$(id -u)" -p 8000:8000 \
           nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
  1. llama-3.3-nemotron-super-49b-v1.5 NIM is not officially supported on Spark yet
  2. I will investigate this issue
  3. You can view the NIM Release Notes to see which models are supported: Release Notes for NVIDIA NIM for LLMs — NVIDIA NIM for Large Language Models (LLMs)