Running nvidia/Nemotron-Nano-VL-12B-V2-NVFP4-QAD on your Spark

I’ve cut load time by 7x by using the fastsafetensors library in the latest commit.
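
If you want to check the speedup on your own machine, here is a rough timing sketch. It uses only the Python standard library and does not call the fastsafetensors API; `time_load` and `speedup` are hypothetical helper names, and the callable you pass in is a placeholder for whatever actually loads the checkpoint in your setup (once with the default loader, once with fastsafetensors).

```python
import time
from typing import Callable


def time_load(load_fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to load_fn.

    load_fn is a placeholder (an assumption for this sketch): wrap your
    real model-loading call in a zero-argument function and pass it here.
    """
    start = time.perf_counter()
    load_fn()
    return time.perf_counter() - start


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Ratio of baseline time to optimized time; 7.0 means 7x faster."""
    if optimized_s <= 0:
        raise ValueError("optimized time must be positive")
    return baseline_s / optimized_s
```

Time each loader from a cold start (fresh process, and ideally a dropped page cache), since a second load of the same files can be served from memory and skew the ratio.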
