Hello,
I am experiencing two separate issues when attempting to deploy NVIDIA NIM LLM containers on a DGX Spark system with a GB10 GPU. I have collected the relevant technical information and logs, and have attached the nvidia-bug-report.log.gz file as required.
nvidia-bug-report.log.gz (594.4 KB)
System Information
- System: DGX Spark
- GPU: NVIDIA GB10 (Blackwell)
- Architecture: aarch64
- OS: Ubuntu 24.04 LTS
- Driver: 580.95.05
- CUDA: 13.0
- NVIDIA Container Toolkit: 1.18.0
- Docker: 24.x
Behavior:
nvidia-smi works both on the host and inside CUDA containers.
Validation commands:
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
Both run successfully.
I will attach the nvidia-bug-report.log.gz file generated by:
sudo nvidia-bug-report.sh
Problem #1: Llama 3.3 Nemotron Super 49B Fails on GB10 (Triton/LLVM crash)
Container:
nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
Description
The model downloads, loads ~93 GiB into GPU memory, and then crashes during Triton/vLLM kernel compilation; the healthcheck never reports ready.
Key Log Extract
'sm_121' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
...
RuntimeError: Engine core initialization failed.
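To make the failure mode concrete, the sketch below (illustrative only, not NIM or Triton code) shows how a Triton-style backend derives its target arch string from the CUDA compute capability, and why a bundled LLVM that predates a GPU rejects it. GB10 evidently reports capability (12, 1), which maps to the "sm_121" seen in the error; the list of known targets here is a hypothetical example, not the actual list shipped in the container.

```python
# Illustrative sketch of the sm_121 rejection, assuming GB10 reports
# compute capability (12, 1) as implied by the log above.
def arch_from_capability(major: int, minor: int) -> str:
    # Triton-style backends build the target string as "sm_<major><minor>".
    return f"sm_{major}{minor}"

# Hypothetical set of targets known to the bundled LLVM; the real list
# depends on the LLVM/Triton versions inside the NIM image.
KNOWN_ARCHS = {"sm_80", "sm_86", "sm_89", "sm_90", "sm_100", "sm_120"}

def check_supported(major: int, minor: int) -> str:
    arch = arch_from_capability(major, minor)
    if arch not in KNOWN_ARCHS:
        # Mirrors the failure in the log: LLVM has no definition for this
        # processor, so intrinsic selection later aborts.
        return f"'{arch}' is not a recognized processor for this target"
    return f"{arch} supported"

print(check_supported(12, 1))
```

If this reading is right, only an image whose LLVM/Triton build includes the sm_121 target can fix it; no runtime flag on my side would help.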
Reproduction Steps
- Use the DGX Spark system with the environment described above.
- Pull the container:
  docker pull nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
- Launch it with:
  docker run --gpus all -p 8999:8000 nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
- Observe the Triton/vLLM crash after weight loading.
Problem #2: Nemotron Nano 9B (DGX Spark variant) Fails With NGC Permission Error
Container:
nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
Description
The container starts and detects the GPU, but fails immediately while downloading the model files from NGC.
Key Log Extract
Permission error: The requested operation requires permissions that the user does not have.
This may be due to the user not being a member of the organization that owns the repo.
The failing URL is:
https://api.ngc.nvidia.com/v2/org/nim/team/nvidia/models/nemotron-nano-9b-v2/hf-nvfp4-v1/files
This suggests that the NGC API key tied to my account may not have the required entitlements for the org/team hosting this model.
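To separate a bad key from a missing entitlement, one can probe the failing manifest URL directly and interpret the status code. A minimal stdlib sketch follows; the status-code interpretation table is my assumption based on common API conventions, not documented NGC semantics, and whether the raw NGC_API_KEY can be passed as a bearer token here (rather than first exchanging it for a short-lived token) is also an assumption.

```python
import urllib.error
import urllib.request

# The URL that fails inside the container, per the log extract above.
MANIFEST_URL = ("https://api.ngc.nvidia.com/v2/org/nim/team/nvidia/"
                "models/nemotron-nano-9b-v2/hf-nvfp4-v1/files")

def diagnose(status: int) -> str:
    # Assumed meanings, not documented NGC behavior.
    return {
        401: "key missing or invalid (authentication failed)",
        403: "key valid but lacks entitlement for this org/team",
        404: "repo path wrong, or repo hidden from this account",
    }.get(status, f"unexpected status {status}")

def probe(url: str, token: str) -> str:
    # Assumption: the key is accepted directly as a bearer token.
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return diagnose(resp.status)
    except urllib.error.HTTPError as e:
        return diagnose(e.code)
```

In my case the error text matches the 403 row, which is why I suspect an entitlement rather than an invalid key.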
Reproduction Steps
- Use the same DGX Spark system.
- Pull the DGX Spark NIM Nano 9B image:
  docker pull nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
- Run the container with:
  docker run --gpus all -p 8000:8000 -e NGC_API_KEY=$NGC_API_KEY nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
- The container terminates while attempting to download the model manifests.
Additional Diagnostics
Docker Compose (minimal)
services:
  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:latest
    volumes:
      - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    user: "${USERID}"
    ports:
      - "8999:8000"
    expose:
      - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready')"]
      interval: 10s
      timeout: 20s
      retries: 10000
  nim-llm-nano:
    container_name: nim-llm-ms-nano
    image: nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:latest
    volumes:
      - ${MODEL_DIRECTORY:-/tmp}:/opt/nim/.cache
    user: "${USERID}"
    ports:
      - "8000:8000"
    expose:
      - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8000/v1/health/ready')"]
      interval: 10s
      timeout: 20s
      retries: 10000
Cache directory
The cache directory /opt/nim/.cache is writable, as confirmed in the logs.
The healthchecks fail only because each container crashes before its HTTP server starts.
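For completeness, the readiness check used in the Compose healthcheck above can be reproduced outside Docker. A minimal stdlib polling sketch (the endpoint path is taken from the healthcheck; the host-side port 8999 maps to the container's 8000):

```python
import time
import urllib.error
import urllib.request

def wait_ready(url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll a NIM readiness endpoint until it returns HTTP 200 or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not up yet; in the crash case above this never succeeds.
            pass
        time.sleep(interval_s)
    return False

# Example against the 49B service as mapped in the Compose file:
# wait_ready("http://localhost:8999/v1/health/ready", timeout_s=600)
```

With both containers, this loop runs to its timeout, which is consistent with the processes dying before the HTTP server ever binds.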
Questions for NVIDIA Engineering
A) Regarding the 49B model (llama-3.3-nemotron-super-49b-v1.5)
- Is sm_121 (GB10) fully supported by the Triton/vLLM versions bundled in this image?
- Is a newer NIM version required (e.g., 1.14.0 or later)?
- Is this a known issue where LLVM/Triton do not yet support Blackwell kernels for this model?
B) Regarding the Nemotron Nano 9B DGX-Spark variant
- What specific NGC entitlements or organization memberships are required to access org/nim/team/nvidia/models/nemotron-nano-9b-v2?
- Is this model gated behind NVIDIA AI Enterprise / NIM licensing?
- Should DGX Spark customers automatically receive access to these DGX-Spark-tagged models?
C) Availability / Support Matrix
- Is there an up-to-date compatibility matrix for NIM LLM containers running on DGX Spark (GB10)?
- Are there alternative NIM LLM containers that are officially supported on this architecture today?
Please let me know if additional diagnostics or traces are needed. I am available to test updated containers or patches immediately.