Looks like the fix is in: fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels by renehonig · Pull Request #33417 · vllm-project/vllm · GitHub. I'll test it soon; hopefully it works with sm121 too, not just sm120.
Doesn’t the Spark DGX OS already have the NVIDIA Container Toolkit installed?
If you already have it installed, then set the nvidia runtime as the default in the Docker daemon configuration file.
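For reference, this is the usual shape of that config (a sketch assuming a standard NVIDIA Container Toolkit install with `nvidia-container-runtime` on the PATH; back up any existing `daemon.json` first):

```shell
# Make the nvidia runtime the Docker default, then restart the daemon.
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker
```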
Newer versions of Docker can just use `--gpus=all` instead.
Well, it works, but flashinfer spits out a lot of errors and eventually it crashes. There was also something weird in the response that caused llama-benchy to ignore anything below 8192 tokens of context. I'll troubleshoot tomorrow if I have time.
I released a new version of llama-benchy, but the model crashed when benchmarking :)
| model | test | t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | pp2048 | 10913.72 ± 57.47 | 244.22 ± 0.99 | 187.66 ± 0.99 | 244.41 ± 0.91 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | tg128 | 57.34 ± 0.12 | | | |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | ctx_pp @ d4096 | 6555.06 ± 4272.06 | 3000.39 ± 3558.27 | 2943.83 ± 3558.27 | 3000.54 ± 3558.30 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | ctx_tg @ d4096 | 57.22 ± 0.17 | | | |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | pp2048 @ d4096 | 2687.50 ± 509.25 | 851.56 ± 173.92 | 795.00 ± 173.92 | 851.71 ± 173.94 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | tg128 @ d4096 | 57.14 ± 0.30 | | | |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | ctx_pp @ d8192 | 8787.21 ± 68.75 | 988.84 ± 7.26 | 932.28 ± 7.26 | 988.93 ± 7.26 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | ctx_tg @ d8192 | 56.87 ± 0.38 | | | |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | pp2048 @ d8192 | 1710.83 ± 23.56 | 1253.87 ± 16.65 | 1197.31 ± 16.65 | 1253.96 ± 16.66 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | tg128 @ d8192 | 53.24 ± 5.66 | | | |
llama-benchy (0.1.2)
date: 2026-01-31 23:58:33 | latency mode: generation
I did an AWQ quant with llm-compressor for comparison and gave llama-benchy some runs:
| model | test | t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 | 6760.90 ± 14.10 | 345.92 ± 0.63 | 302.92 ± 0.63 | 346.00 ± 0.63 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 | 82.58 ± 0.40 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_pp @ d4096 | 6409.33 ± 16.35 | 682.02 ± 1.58 | 639.02 ± 1.58 | 682.08 ± 1.56 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_tg @ d4096 | 82.51 ± 0.10 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 @ d4096 | 2077.74 ± 0.46 | 1028.68 ± 0.22 | 985.69 ± 0.22 | 1028.76 ± 0.23 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 @ d4096 | 82.01 ± 0.25 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_pp @ d8192 | 6100.29 ± 5.13 | 1385.88 ± 1.13 | 1342.89 ± 1.13 | 1385.96 ± 1.14 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_tg @ d8192 | 81.96 ± 0.25 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 @ d8192 | 1204.08 ± 3.74 | 1743.89 ± 5.29 | 1700.90 ± 5.29 | 1743.97 ± 5.29 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 @ d8192 | 81.98 ± 0.11 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_pp @ d16384 | 5837.62 ± 8.06 | 2849.62 ± 3.88 | 2806.63 ± 3.88 | 2849.71 ± 3.88 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_tg @ d16384 | 81.50 ± 0.03 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 @ d16384 | 640.88 ± 1.43 | 3238.61 ± 7.12 | 3195.61 ± 7.12 | 3238.69 ± 7.12 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 @ d16384 | 81.11 ± 0.28 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_pp @ d32768 | 5420.10 ± 6.76 | 6088.65 ± 7.53 | 6045.65 ± 7.53 | 6088.74 ± 7.54 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_tg @ d32768 | 79.91 ± 0.11 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 @ d32768 | 316.11 ± 0.15 | 6521.68 ± 3.04 | 6478.69 ± 3.04 | 6521.76 ± 3.04 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 @ d32768 | 79.53 ± 0.38 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_pp @ d65536 | 4793.14 ± 8.89 | 13715.91 ± 25.39 | 13672.91 ± 25.39 | 13715.99 ± 25.38 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_tg @ d65536 | 77.47 ± 0.10 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 @ d65536 | 144.01 ± 0.20 | 14263.78 ± 20.02 | 14220.78 ± 20.02 | 14263.89 ± 20.03 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 @ d65536 | 76.96 ± 0.23 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_pp @ d131072 | 3857.39 ± 4.53 | 34022.50 ± 39.85 | 33979.51 ± 39.85 | 34022.58 ± 39.84 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | ctx_tg @ d131072 | 72.80 ± 0.05 | | | |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | pp2048 @ d131072 | 58.90 ± 0.06 | 34811.10 ± 35.76 | 34768.10 ± 35.76 | 34811.19 ± 35.79 |
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | tg128 @ d131072 | 72.74 ± 0.23 | | | |
llama-benchy (0.1.2)
date: 2026-02-01 11:18:56 | latency mode: generation
Using eugr’s vLLM docker brew scripts (`--use-wheels --pre-tf`) and 0.16.0rc1.dev81+g672023877.cu13.
Quant over here:
When you say model crashed, are you seeing this in dmesg and getting 500 errors in vllm?
[ 3886.179778] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 1, SM 0): Illegal Instruction Parameter
[ 3886.179787] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 1, SM 0): Multiple Warp Errors
[ 3886.179792] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x516730=0x2000b 0x516734=0x24 0x516728=0x1c81fb60 0x51672c=0x1174
[ 3886.179796] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 1, SM 1): Illegal Instruction Parameter
[ 3886.179965] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 1, SM 1): Multiple Warp Errors
[ 3886.180352] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5167b0=0x1000b 0x5167b4=0x24 0x5167a8=0x1c81fb60 0x5167ac=0x1174
[ 3886.180357] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 3, SM 0): Illegal Instruction Parameter
[ 3886.180451] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 3, SM 0): Multiple Warp Errors
[ 3886.180531] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x518730=0x1000b 0x518734=0x24 0x518728=0x1c81fb60 0x51872c=0x1174
[ 3886.180790] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 3, SM 1): Illegal Instruction Parameter
[ 3886.180889] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 3, SM 1): Multiple Warp Errors
[ 3886.180979] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5187b0=0x1000b 0x5187b4=0x24 0x5187a8=0x1c81fb60 0x5187ac=0x1174
[ 3886.181289] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 5, SM 0): Illegal Instruction Parameter
[ 3886.181388] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 5, SM 0): Multiple Warp Errors
[ 3886.181473] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x51a730=0x2000b 0x51a734=0x24 0x51a728=0x1c81fb60 0x51a72c=0x1174
[ 3886.181711] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 5, SM 1): Illegal Instruction Parameter
[ 3886.181814] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 5, SM 1): Multiple Warp Errors
[ 3886.181891] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x51a7b0=0x2000b 0x51a7b4=0x24 0x51a7a8=0x1c81fb60 0x51a7ac=0x1174
[ 3886.182224] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 1, SM 0): Illegal Instruction Parameter
[ 3886.182319] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 2, TPC 1, SM 0): Multiple Warp Errors
[ 3886.182397] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x526730=0x3000b 0x526734=0x24 0x526728=0x1c81fb60 0x52672c=0x1174
[ 3886.182669] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 1, SM 1): Illegal Instruction Parameter
[ 3886.182767] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 2, TPC 1, SM 1): Multiple Warp Errors
[ 3886.182845] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5267b0=0x3000b 0x5267b4=0x24 0x5267a8=0x1c81fb60 0x5267ac=0x1174
[ 3886.183149] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 2, SM 1): Illegal Instruction Parameter
[ 3886.183246] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Global Exception on (GPC 2, TPC 2, SM 1): Multiple Warp Errors
[ 3886.183332] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5277b0=0x3000b 0x5277b4=0x24 0x5277a8=0x1c81fb60 0x5277ac=0x1174
[ 3886.183625] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 3, SM 0): Illegal Instruction Parameter
[ 3886.183719] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x528730=0x2000b 0x528734=0x20 0x528728=0x1c81fb60 0x52872c=0x1174
[ 3886.183938] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 3, SM 1): Illegal Instruction Parameter
[ 3886.184042] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5287b0=0x2000b 0x5287b4=0x20 0x5287a8=0x1c81fb60 0x5287ac=0x1174
[ 3886.184333] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 5, SM 0): Illegal Instruction Parameter
[ 3886.184427] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x52a730=0xb 0x52a734=0x20 0x52a728=0x1c81fb60 0x52a72c=0x1174
[ 3886.184677] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 2, TPC 5, SM 1): Illegal Instruction Parameter
[ 3886.184768] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x52a7b0=0x3000b 0x52a7b4=0x20 0x52a7a8=0x1c81fb60 0x52a7ac=0x1174
[ 3886.185096] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 1, SM 0): Illegal Instruction Parameter
[ 3886.185189] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x536730=0x1000b 0x536734=0x20 0x536728=0x1c81fb60 0x53672c=0x1174
[ 3886.185401] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 1, SM 1): Illegal Instruction Parameter
[ 3886.185498] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5367b0=0x3000b 0x5367b4=0x20 0x5367a8=0x1c81fb60 0x5367ac=0x1174
[ 3886.185791] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 2, SM 0): Illegal Instruction Parameter
[ 3886.185884] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x537730=0x2000b 0x537734=0x20 0x537728=0x1c81fb60 0x53772c=0x1174
[ 3886.186156] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 3, SM 0): Illegal Instruction Parameter
[ 3886.186252] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x538730=0x3000b 0x538734=0x20 0x538728=0x1c81fb60 0x53872c=0x1174
[ 3886.186484] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 3, SM 1): Illegal Instruction Parameter
[ 3886.186585] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x5387b0=0x1000b 0x5387b4=0x20 0x5387a8=0x1c81fb60 0x5387ac=0x1174
[ 3886.186890] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 5, SM 0): Illegal Instruction Parameter
[ 3886.186984] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x53a730=0x1000b 0x53a734=0x20 0x53a728=0x1c81fb60 0x53a72c=0x1174
[ 3886.187199] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 3, TPC 5, SM 1): Illegal Instruction Parameter
[ 3886.187292] NVRM: Xid (PCI:000f:01:00): 13, Graphics Exception: ESR 0x53a7b0=0xb 0x53a7b4=0x20 0x53a7a8=0x1c81fb60 0x53a7ac=0x1174
[ 3886.190317] NVRM: Xid (PCI:000f:01:00): 43, pid=20919, name=VLLM::EngineCor, channel 0x00000002
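If you want to pick these events out of a noisy kernel log, a simple `grep` over `dmesg` output does the job. A sketch, demonstrated on two sample lines from the dump above (the live equivalent would be something like `sudo dmesg -wT | grep --line-buffered 'NVRM: Xid'`):

```shell
# Extract NVIDIA Xid events from a kernel log excerpt. Xid 13 is a GPU
# exception (illegal instruction here); Xid 43 marks the faulting channel
# being torn down afterwards.
xids=$(grep -oE 'Xid \(PCI:[0-9a-f:.]+\): [0-9]+' <<'EOF'
[ 3886.179778] NVRM: Xid (PCI:000f:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 1, SM 0): Illegal Instruction Parameter
[ 3886.190317] NVRM: Xid (PCI:000f:01:00): 43, pid=20919, name=VLLM::EngineCor, channel 0x00000002
EOF
)
echo "$xids"
```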
Yep, and also exceptions in vLLM log itself.
Yes, AWQ is still outperforming NVFP4 for inference. Also notice how much more consistent the results are between runs.
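To put a number on that consistency, compare the coefficient of variation (stddev / mean) for the matching pp2048 @ d4096 rows in the two tables above (values copied from the tables; just a quick sketch):

```python
# Coefficient of variation for the pp2048 @ d4096 prefill throughput rows:
# lower means more run-to-run consistency.
def cv(mean: float, std: float) -> float:
    return std / mean

nvfp4 = cv(2687.50, 509.25)  # NVFP4 quant: roughly 19% spread
awq = cv(2077.74, 0.46)      # AWQ quant: roughly 0.02% spread

print(f"NVFP4 CV: {nvfp4:.1%}")
print(f"AWQ   CV: {awq:.2%}")
```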
🤔 where is this the case?…
See two benchmarks for Nemotron above. 83 t/s for AWQ (this is what I’d expect of 3B active params) vs 60 t/s for NVFP4.
But that’s with a standard vLLM build where FP4 pathways are not fully enabled for sm121.
I’ve been trying to get the Nemotron 3 Nano 30B A3B NVFP4 model running, with varying degrees of success.
I’m also seeing the crashes in flashinfer at startup, and these were also present in the official NVIDIA vLLM container that just came out.
Otherwise it runs pretty well, but then eventually just crashes. It may take a while to crash, but I haven’t been able to keep it stable.
When trying to run in a cluster, I get nowhere: it will start a query and then, very shortly after, go into a loop or crash.
I did try the AWQ quant, and indeed it runs even faster.
My big concern is that we may be losing a bit of accuracy relative to NVFP4. My use case is running the model in a harness like OpenCode to manage software development projects, and the big problem there is accuracy, especially where tool calls get broken. I got a broken tool call within minutes of starting a planning task, so that’s not really ideal. The NVFP4 quants seemed to give me more accurate responses, but then they just crashed after maybe an hour or so.
Let’s hope this whole NVFP4 stack can get sorted out pretty soon!
Yes, this seems to be a W4A16 model, and the NVFP4 quant was produced by NVIDIA itself with quantization-aware training, so the AWQ quant will need a very good calibration dataset to get close in accuracy.
This is the sort of crash I’m seeing after a fair amount of processing -
(EngineCore_DP0 pid=100) ERROR 02-02 07:31:50 [core.py:968] torch.AcceleratorError: CUDA error: an illegal instruction was encountered
(EngineCore_DP0 pid=100) ERROR 02-02 07:31:50 [core.py:968] Search for `cudaErrorIllegalInstruction` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=100) ERROR 02-02 07:31:50 [core.py:968] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=100) ERROR 02-02 07:31:50 [core.py:968] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=100) ERROR 02-02 07:31:50 [core.py:968] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
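As that log suggests, relaunching with synchronous kernel launches usually makes the stack trace point at the actual failing kernel instead of some later CUDA call. A sketch (serve args elided, same as your normal launch):

```shell
# Force synchronous kernel launches so the illegal-instruction error is raised
# at the launch site rather than asynchronously at an unrelated API call.
export CUDA_LAUNCH_BLOCKING=1
# vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 ...  (same args as before)
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```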
Hi,
how do you do this exactly (commands-wise), especially this part: `0.16.0rc1.dev81+g672023877.cu13`, to get it running?
Using the world famous build job out of GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks
./build-and-copy.sh --use-wheels --pre-tf
The job pulls the nightly build of the wheels by default, so I must have fetched yesterday's build.
--use-wheels [mode] : Use prebuilt vLLM wheels. Mode can be 'nightly' (default) or 'release'
--pre-tf, --pre-transformers : Install transformers 5.0.0rc0 or higher
And I used the latest transformers as of yesterday (because 5.0.0 is needed for glm-4.7-flash).
Ah ok, it’s the same! I’ll try building it again.
Also about vllm serve command, what parameters did you use?
Any specific ones for this:
stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ
similar to the glm ones:
launch-cluster.sh -t vllm-node-tf5 --solo \
exec vllm serve cyankiwi/GLM-4.7-Flash-AWQ-4bit \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--served-model-name glm-4.7-flash \
--max-model-len 131072 \
--max-num-batched-tokens 4096 \
--max-num-seqs 64 \
--host 0.0.0.0 --port 30000 \
--gpu-memory-utilization 0.7 \
--enable-expert-parallel
This is how I use the container:
cosinus@vroomfondel$ docker run --rm -it --gpus all --ipc=host --name vLLM -v $HOME/models:/models -v $HOME/.cache:/root/.cache -e HF_HUB_CACHE=/models -e HF_TOKEN=hf_replace_me --entrypoint /bin/bash -p 8000:8000 vllm-node:20260201
root@dd35fdc73d20:/workspace/vllm# wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4/resolve/main/nano_v3_reasoning_parser.py && \
vllm serve stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ \
--port 8000 \
--trust-remote-code \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--reasoning-parser-plugin nano_v3_reasoning_parser.py \
--reasoning-parser nano_v3 \
--kv-cache-dtype fp8
I’ve added a mod that handles parser download - testing it now, and will commit to the repo shortly.