I’ve actually cut load time by 7x by using the fastsafetensors library in the latest commit.