Running nvidia/Nemotron-Nano-VL-12B-V2-NVFP4-QAD on your spark

raphael.amorim · November 5, 2025, 11:14pm

Hello,

Based on conversations and information found on this channel (thanks to @eugr @johnny_nv) and around the internet, I’ve put together a playbook for running Nemotron Nano VL 12B V2 at NVFP4 quantization on your DGX Spark using vLLM:

Feel free to test it and contribute. I’ll add more later this week.

MackenzieNVIDIA · November 5, 2025, 11:39pm

Thank you for sharing @raphael.amorim !

I have moved this topic to DGX Spark / GB10 User Forum > DGX Spark / GB10 Projects

avrami · November 7, 2025, 8:47pm

noice!

It also seems to work for nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD, which came out today: https://arxiv.org/pdf/2511.03929

ETA:

It looks like nvidia/Nemotron-Nano-VL-12B-V2-FP4-QAD redirects to nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD. Why use the former rather than the latter? Also, the Hugging Face cache does not seem to know they are the same model, so I now have two copies under different names:

models--nvidia--Nemotron-Nano-VL-12B-V2-FP4-QAD
models--nvidia--NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD
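
The two folder names above are consistent with the cache being keyed by the repo ID string that was requested, not by wherever the Hub redirect ends up. A minimal sketch of that naming pattern follows; `cache_folder_name` is a hypothetical helper written for illustration, not a function from `huggingface_hub`, and it only reproduces the `models--<org>--<name>` layout seen in the listing:

```python
def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Approximate the cache folder name for a repo ID.

    Hypothetical helper: it mirrors the "models--<org>--<name>" pattern
    visible in the cache listing and is not part of any library.
    """
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


old_id = "nvidia/Nemotron-Nano-VL-12B-V2-FP4-QAD"
new_id = "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD"

old_dir = cache_folder_name(old_id)
new_dir = cache_folder_name(new_id)

print(old_dir)
print(new_dir)
# Different request strings give different folders, even if the Hub
# serves the same files for both names.
print(old_dir == new_dir)
```

If that reading is right, always requesting the model under one name should avoid a second download.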

raphael.amorim · November 24, 2025, 5:40pm

In the latest commit, I’ve actually made model loading 7x faster by using the fastsafetensors library.
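
For readers who want to try this outside the playbook, here is a rough sketch of how a vLLM launch command with that loader might be assembled. It assumes a vLLM build that accepts `fastsafetensors` as a `--load-format` value and has the fastsafetensors package installed. `build_serve_command` is a hypothetical helper, and the playbook's actual flags are not shown in this thread, so treat the command as an illustration only:

```python
import shlex


def build_serve_command(model_id: str, load_format: str = "fastsafetensors") -> list[str]:
    """Build an argv list for `vllm serve` with a chosen weight loader.

    Hypothetical helper. Assumes the installed vLLM accepts this
    --load-format value; other flags the playbook may use are omitted.
    """
    return ["vllm", "serve", model_id, "--load-format", load_format]


cmd = build_serve_command("nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD")

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to hand to `subprocess.run(cmd)` without shell quoting issues.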

