Based on conversations and information found on this channel (thanks to @eugr and @johnny_nv) and on the internet, I've consolidated a playbook for running Nemotron Nano VL 12B V2 at NVFP4 quantization on your DGX Spark using vLLM:
Feel free to test it and contribute. I’ll add more later this week.
It looks like nvidia/Nemotron-Nano-VL-12B-V2-FP4-QAD redirects to nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD. Why use the former rather than the latter? Also, the Hugging Face cache does not seem to know they are the same repository, so I now have two copies on disk under different names.
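To check whether both names really ended up on disk, here is a minimal sketch that lists the model folders in the Hugging Face hub cache with their sizes. It assumes the standard cache layout (`models--<org>--<name>/blobs/` under `~/.cache/huggingface/hub`, or whatever `HF_HUB_CACHE` points to). The function name `list_cached_models` is made up for illustration and is not part of any library.

```python
import os
from pathlib import Path


def list_cached_models(hub_dir):
    """Return {repo_id: size_in_bytes} for every model folder under hub_dir.

    Assumes the usual hub cache layout: models--<org>--<name>/blobs/<hash>.
    """
    sizes = {}
    for repo_dir in Path(hub_dir).glob("models--*"):
        if not repo_dir.is_dir():
            continue
        # Folder names look like models--<org>--<name>; map back to <org>/<name>.
        # (Assumption: neither the org nor the model name contains "--".)
        repo_id = repo_dir.name[len("models--"):].replace("--", "/")
        blobs = repo_dir / "blobs"
        # Snapshot files are normally symlinks into blobs/, so summing blobs/
        # counts each file once.
        if blobs.is_dir():
            sizes[repo_id] = sum(f.stat().st_size for f in blobs.glob("*") if f.is_file())
        else:
            sizes[repo_id] = 0
    return sizes


if __name__ == "__main__":
    default_hub = Path.home() / ".cache" / "huggingface" / "hub"
    hub = Path(os.environ.get("HF_HUB_CACHE", default_hub))
    for repo_id, size in sorted(list_cached_models(hub).items()):
        print(f"{size / 1e9:8.2f} GB  {repo_id}")
```

If both nvidia/... names show up with roughly the same size, the cache is holding two full copies, and removing the folder for the name you don't reference should reclaim the space.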