NIM LLM Containers Fail on DGX Spark (GB10): Triton/vLLM Crash on sm_121 and NGC Permission Errors

Hello,

I am experiencing two separate issues when attempting to deploy NVIDIA NIM LLM containers on a DGX Spark system with GB10 GPUs. I have collected the relevant technical information and logs, and will attach the nvidia-bug-report.log.gz file as required.

nvidia-bug-report.log.gz (594.4 KB)

System Information

  • System: DGX Spark

  • GPU: NVIDIA GB10 (Blackwell)

  • Architecture: aarch64

  • OS: Ubuntu 24.04 LTS

  • Driver: 580.95.05

  • CUDA: 13.0

  • NVIDIA Container Toolkit: 1.18.0

  • Docker: 24.x

  • Behavior: nvidia-smi works both on host and inside CUDA containers.

Validation commands:

nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

Both run successfully.

I will attach the nvidia-bug-report.log.gz file generated by:

sudo nvidia-bug-report.sh

Problem #1: Llama 3.3 Nemotron Super 49B Fails on GB10 (Triton/LLVM crash)

Container:

nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest

Description

The model downloads, loads ~93 GiB into GPU memory, and then crashes during Triton/vLLM kernel compilation. The healthcheck never reports ready.

Key Log Extract

'sm_121' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
...
RuntimeError: Engine core initialization failed.
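
For context, `sm_121` is LLVM's target name for the GB10's CUDA compute capability 12.1 (what `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` reports on this system). The helper below is purely illustrative (not part of any NVIDIA tooling) and just shows the mapping between the two notations:

```shell
# Hypothetical helper: map a compute capability as reported by nvidia-smi
# (e.g. "12.1") to the LLVM/Triton target name seen in the error ("sm_121").
cc_to_sm() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cc_to_sm 12.1   # GB10 -> sm_121
cc_to_sm 9.0    # -> sm_90
```

If the LLVM build bundled in the image only knows targets up to an earlier architecture, any Triton kernel JIT for this GPU would abort with exactly the "not a recognized processor" warning shown above.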

Reproduction Steps

  1. Use the DGX Spark system with the environment described above.

  2. Pull the container:

    docker pull nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    
  3. Launch with:

    docker run --gpus all -p 8999:8000 nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    
  4. Observe Triton/vLLM crash after weight loading.


Problem #2: Nemotron Nano 9B (DGX Spark variant) Fails With NGC Permission Error

Container:

nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest

Description

The container starts and detects the GPU, but fails immediately while downloading the model files from NGC.

Key Log Extract

Permission error: The requested operation requires permissions that the user does not have.
This may be due to the user not being a member of the organization that owns the repo.

The failing URL is:

https://api.ngc.nvidia.com/v2/org/nim/team/nvidia/models/nemotron-nano-9b-v2/hf-nvfp4-v1/files

This suggests that the NGC API key tied to my account may not have the required entitlements for the org/team hosting this model.
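
To separate "invalid key" from "missing entitlement", it may help to hit the same manifest endpoint from outside the container. The sketch below only rebuilds the failing URL from its parts; the commented curl line and its status-code interpretation (401 vs. 403) are assumptions about the NGC API, not confirmed behavior:

```shell
# Rebuild the NGC files URL that the container fails on, from its parts.
ngc_files_url() {
  # $1=org  $2=team  $3=model  $4=version
  printf 'https://api.ngc.nvidia.com/v2/org/%s/team/%s/models/%s/%s/files\n' \
    "$1" "$2" "$3" "$4"
}

URL=$(ngc_files_url nim nvidia nemotron-nano-9b-v2 hf-nvfp4-v1)
echo "$URL"

# With a valid bearer token in NGC_TOKEN (assumption: 401 = bad token,
# 403 = token valid but no entitlement for this org/team):
# curl -s -o /dev/null -w '%{http_code}\n' \
#      -H "Authorization: Bearer $NGC_TOKEN" "$URL"
```

As a first sanity check, `echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin` confirms the key is accepted by the registry at all; registry authentication and model-download entitlements may still be granted separately.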

Reproduction Steps

  1. Use the same DGX Spark system.

  2. Pull the DGX-Spark NIM Nano 9B image:

    docker pull nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    
  3. Run the container with:

    docker run --gpus all -p 8000:8000 -e NGC_API_KEY=$NGC_API_KEY nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    
  4. Container terminates while attempting to download model manifests.


Additional Diagnostics

Docker Compose (minimal)

services:

  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    volumes:
    - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    user: "${USERID}"
    ports:
    - "8999:8000"
    expose:
    - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready', timeout=5).raise_for_status()"]
      interval: 10s
      timeout: 20s
      retries: 10000

  nim-llm-nano:
    container_name: nim-llm-ms-nano
    image: nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    volumes:
    - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    user: "${USERID}"
    ports:
    - "8000:8000"
    expose:
    - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready', timeout=5).raise_for_status()"]
      interval: 10s
      timeout: 20s
      retries: 10000

Cache directory

/opt/nim/.cache

Writable, as confirmed in the container logs.

Healthchecks fail because the container crashes before the HTTP server starts.
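
A standalone readiness poll from the host can make this failure mode easier to see than the compose healthcheck. This is a generic sketch against the same `/v1/health/ready` endpoint used above; the port, attempt count, and interval are arbitrary:

```shell
# Poll a NIM readiness endpoint until it answers successfully or we give up.
wait_ready() {
  # $1 = base URL, $2 = max attempts, $3 = seconds between attempts
  local i
  for i in $(seq 1 "$2"); do
    if curl -sf "$1/v1/health/ready" > /dev/null; then
      echo ready
      return 0
    fi
    sleep "$3"
  done
  echo timeout
  return 1
}

# Example against the 49B service mapped to host port 8999 above:
# wait_ready http://localhost:8999 60 10
```

If the poll times out while `docker logs` shows the LLVM error from Problem #1, the healthcheck itself is fine and the crash happens before the server ever binds the port, matching the behavior described.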


Questions for NVIDIA Engineering

A) Regarding the 49B model (llama-3.3-nemotron-super-49b-v1.5)

  1. Is sm_121 (GB10) fully supported by the Triton / vLLM versions bundled in this image?

  2. Is there a newer NIM version required (e.g., 1.14.0 or later)?

  3. Is this a known issue where LLVM/Triton do not yet support Blackwell kernels for this model?

B) Regarding the Nemotron Nano 9B DGX-Spark variant

  1. What specific NGC entitlements or organization memberships are required to access:

    org/nim/team/nvidia/models/nemotron-nano-9b-v2
    
  2. Is this model gated behind NVIDIA AI Enterprise / NIM licensing?

  3. Should DGX Spark customers automatically receive access to these DGX-Spark-tagged models?

C) Availability / Support Matrix

  1. Is there an up-to-date compatibility matrix for NIM LLM containers running on DGX Spark (GB10)?

  2. Are there alternative NIM LLM containers officially supported today for this architecture?

Please let me know if additional diagnostics or traces are needed. I am available to test updated containers or patches immediately.


@angel.oropeza I get the same error message when trying to run llama-3.3-nemotron-super-49b-v1.5. But I can run nano-9b-v2 by itself like this, which takes about 92 GB of memory on its own.

If you somehow get both running at the same time, please do post the solution.

# Must drop the page cache first (the sysctl form below is equivalent)
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
# sudo sysctl -w vm.drop_caches=3


docker run -it --rm --gpus all --runtime=nvidia \
           --ulimit memlock=-1 --ulimit stack=67108864 \
           --shm-size=16GB -e NGC_API_KEY \
           -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
           -u "$(id -u)" -p 8000:8000 \
           nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
  1. llama-3.3-nemotron-super-49b-v1.5 NIM is not officially supported on Spark yet
  2. I will investigate this issue
  3. You can view the NIM Release Notes to see which models are supported: Release Notes for NVIDIA NIM for LLMs — NVIDIA NIM for Large Language Models (LLMs)