Hey NVIDIA,

**Documentation issue**
Could you please fix the model names in your vLLM inference playbook? I tried to download all of the recommended models, however I could not find some of them on Hugging Face or in the NVIDIA catalog.

Please also add a short description of where each recommended model can be found and where its official repository lives.

Here is a table with the issues I found:

- Some models simply do not exist on Hugging Face.
- Some have wrong names (e.g. FP4 → NVFP4).

Please be precise in the documentation; we expect it to work as written.
| Model | Quantization | Support Status | HF Handle | Size (GB) | Name on HF |
|---|---|---|---|---|---|
| GPT-OSS-20B | MXFP4 | ✅ | openai/gpt-oss-20b | 41.3 | |
| GPT-OSS-120B | MXFP4 | ✅ | openai/gpt-oss-120b | 196 | |
| Llama-3.1-8B-Instruct | FP8 | ✅ | nvidia/Llama-3.1-8B-Instruct-FP8 | 9.09 | |
| Llama-3.1-8B-Instruct | NVFP4 | ✅ | nvidia/Llama-3.1-8B-Instruct-FP4 | 12.8 | ⚠️ nvidia/Llama-3.1-8B-Instruct-NVFP4 |
| Llama-3.3-70B-Instruct | NVFP4 | ✅ | nvidia/Llama-3.3-70B-Instruct-FP4 | 42.7 | ⚠️ nvidia/Llama-3.3-70B-Instruct-NVFP4 |
| Qwen3-8B | FP8 | ✅ | nvidia/Qwen3-8B-FP8 | 9.45 | |
| Qwen3-8B | NVFP4 | ✅ | nvidia/Qwen3-8B-FP4 | ⚠️ | ⚠️ No model on HF or in the NVIDIA catalog |
| Qwen3-14B | FP8 | ✅ | nvidia/Qwen3-14B-FP8 | 16.3 | |
| Qwen3-14B | NVFP4 | ✅ | nvidia/Qwen3-14B-FP4 | ⚠️ | ⚠️ No model on HF or in the NVIDIA catalog |
| Qwen3-32B | NVFP4 | ✅ | nvidia/Qwen3-32B-FP4 | ⚠️ | ⚠️ No model on HF or in the NVIDIA catalog |
| Qwen2.5-VL-7B-Instruct | NVFP4 | ✅ | nvidia/Qwen2.5-VL-7B-Instruct-FP4 | 7.22 | ⚠️ nvidia/Qwen2.5-VL-7B-Instruct-NVFP4 |
| Qwen3-VL-Reranker-2B | Base | ✅ | Qwen/Qwen3-VL-Reranker-2B | 4.27 | |
| Qwen3-VL-Reranker-8B | Base | ✅ | Qwen/Qwen3-VL-Reranker-8B | 17.6 | |
| Qwen3-VL-235-A22B | NVFP4 | ✅ | nvidia/Qwen3-VL-235-A22B-FP4 | ⚠️ 135 | ⚠️ nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4-MLPerf-Inference-Closed-V6.0 |
| Qwen3-VL-Embedding-2B | Base | ✅ | Qwen/Qwen3-VL-Embedding-2B | 4.27 | |
| Phi-4-multimodal-instruct | FP8 | ✅ | nvidia/Phi-4-multimodal-instruct-FP8 | 10.4 | |
| Phi-4-multimodal-instruct | NVFP4 | ✅ | nvidia/Phi-4-multimodal-instruct-FP4 | 8.98 | ⚠️ nvidia/Phi-4-multimodal-instruct-NVFP4 |
| Phi-4-reasoning-plus | FP8 | ✅ | nvidia/Phi-4-reasoning-plus-FP8 | 15.7 | |
| Phi-4-reasoning-plus | NVFP4 | ✅ | nvidia/Phi-4-reasoning-plus-FP4 | 9.73 | ⚠️ nvidia/Phi-4-reasoning-plus-NVFP4 |
| Nemotron3-Nano | BF16 | ✅ | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 63.2 | |
| Nemotron3-Nano | FP8 | ✅ | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | 32.7 | |
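For reproducing the check, here is a minimal sketch of how I verified the handles. The `fix_handle` helper is hypothetical (my assumption, based on the pattern in the table that published repos use an `-NVFP4` suffix where the docs say `-FP4`); the commented-out live check uses `huggingface_hub`'s `repo_exists`, which requires network access.

```python
# Hypothetical helper: rewrite the playbook's "-FP4" handles to the
# "-NVFP4" names that actually appear on Hugging Face (assumption
# based on the mismatches listed in the table above).
def fix_handle(handle: str) -> str:
    suffix = "-FP4"
    if handle.endswith(suffix):
        return handle[: -len(suffix)] + "-NVFP4"
    return handle

# Optional live check (needs `pip install huggingface_hub` and network):
# from huggingface_hub import HfApi
# assert HfApi().repo_exists(fix_handle("nvidia/Llama-3.1-8B-Instruct-FP4"))

print(fix_handle("nvidia/Llama-3.1-8B-Instruct-FP4"))
# nvidia/Llama-3.1-8B-Instruct-NVFP4
```

Note that this rename does not rescue every row: the Qwen3-8B/14B/32B NVFP4 checkpoints were missing under either suffix when I looked.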