Running nvidia/Nemotron-Nano-VL-12B-V2-NVFP4-QAD on your DGX Spark

Hello,

Based on conversations and information found on this channel (thanks to @eugr and @johnny_nv) and elsewhere on the internet, I’ve consolidated a playbook for running Nemotron Nano VL 12B V2 with NVFP4 quantization on your DGX Spark using vLLM:

Feel free to test it and contribute. I’ll add more later this week.

Thank you for sharing @raphael.amorim !

I have moved this topic to DGX Spark / GB10 User Forum > DGX Spark / GB10 Projects

noice!

It also seems to work for nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD, which came out today. See https://arxiv.org/pdf/2511.03929

ETA:

It looks like nvidia/Nemotron-Nano-VL-12B-V2-FP4-QAD redirects to nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD. Why use the former rather than the latter? Also, the Hugging Face cache does not seem to know they are the same model, so I now have two copies under different names:

models--nvidia--Nemotron-Nano-VL-12B-V2-FP4-QAD
models--nvidia--NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD
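
For anyone who wants to check whether they have the same duplication, here is a minimal sketch that scans a Hugging Face hub cache directory and reports model repos that share blob files. It assumes the standard cache layout (`models--<org>--<name>/blobs/<hash>`), where blob file names are content hashes, so identical weights downloaded under two repo names show up as identically named blobs. The helper names `cached_repos` and `find_duplicates` are made up for this illustration.

```python
from itertools import combinations
from pathlib import Path


def cached_repos(hub_dir: Path) -> dict[str, set[str]]:
    """Map each cached model repo id to the set of blob file names it holds."""
    repos: dict[str, set[str]] = {}
    for repo_dir in sorted(hub_dir.glob("models--*")):
        blobs_dir = repo_dir / "blobs"
        if not blobs_dir.is_dir():
            continue
        # "models--org--name" -> "org/name"
        repo_id = "/".join(repo_dir.name.split("--")[1:])
        repos[repo_id] = {
            p.name
            for p in blobs_dir.iterdir()
            # Skip partially downloaded files (assumed ".incomplete" suffix).
            if p.is_file() and not p.name.endswith(".incomplete")
        }
    return repos


def find_duplicates(hub_dir: Path) -> list[tuple[str, str, int]]:
    """Return (repo_a, repo_b, shared_blob_count) for repos sharing blobs."""
    repos = cached_repos(hub_dir)
    dupes = []
    for (a, blobs_a), (b, blobs_b) in combinations(repos.items(), 2):
        shared = blobs_a & blobs_b
        if shared:
            dupes.append((a, b, len(shared)))
    return dupes
```

To try it, call `find_duplicates(Path.home() / ".cache" / "huggingface" / "hub")`, which is the default cache location unless you have changed it through environment variables such as `HF_HOME`. It only reports overlap and does not delete anything.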
I’ve actually sped up model loading by 7x by using the fastsafetensors library in the latest commit.
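
As a rough illustration of what enabling this can look like, here is a sketch that builds a `vllm serve` command line with the fastsafetensors loader switched on. This is not the playbook's actual command: the port and the use of `--trust-remote-code` are assumptions, and `build_serve_command` is a hypothetical helper. It also assumes a vLLM version that accepts `--load-format fastsafetensors` and that the `fastsafetensors` package is installed.

```python
import shlex

# Model id as mentioned earlier in this thread.
MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD"


def build_serve_command(
    model_id: str = MODEL_ID,
    fast_load: bool = True,
    port: int = 8000,
) -> list[str]:
    """Build an argv list for `vllm serve` (assumed flags, not the playbook's)."""
    cmd = ["vllm", "serve", model_id, "--port", str(port), "--trust-remote-code"]
    if fast_load:
        # Ask vLLM to load weights through the fastsafetensors loader.
        cmd += ["--load-format", "fastsafetensors"]
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed or pasted into a shell.
    print(shlex.join(build_serve_command()))
```

Passing `fast_load=False` drops the loader flag, which makes it easy to time both variants and compare load times on your own machine.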
