A DFlash PR has been opened against vLLM: "[Spec Decode] Add Sliding Window Attention support to DFlash drafter" by jianc99 (vllm-project/vllm Pull Request #40898). To try it:
./build-and-copy.sh -t vllm-node --apply-vllm-pr 40898 -c
Results with qwen3.5-122b-fp8, tp=2, dflash=15:
── Run 1/2 ──────────────────────────────────────
[Q&A] 256 tokens in 5.27s = 48.5 tok/s (prompt: 23)
[Code] 512 tokens in 8.69s = 58.9 tok/s (prompt: 30)
[JSON] 1024 tokens in 21.26s = 48.1 tok/s (prompt: 48)
[Math] 64 tokens in 1.29s = 49.6 tok/s (prompt: 29)
[LongCode] 2048 tokens in 29.94s = 68.4 tok/s (prompt: 37)
── Run 2/2 ──────────────────────────────────────
[Q&A] 256 tokens in 4.94s = 51.8 tok/s (prompt: 23)
[Code] 512 tokens in 8.39s = 61.0 tok/s (prompt: 30)
[JSON] 1024 tokens in 20.80s = 49.2 tok/s (prompt: 48)
[Math] 64 tokens in 1.27s = 50.3 tok/s (prompt: 29)
[LongCode] 2048 tokens in 29.84s = 68.6 tok/s (prompt: 37)
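Each tok/s figure above is generated tokens divided by wall-clock time (for example, 2048 / 29.94 s ≈ 68.4 tok/s). To compare the two runs per prompt type, a small awk helper like the one below could summarize a saved log. This is only a sketch. It assumes the result lines keep exactly the format shown above, and the function name avg_tps is hypothetical. It is not part of vLLM or build-and-copy.sh.

```shell
# Hypothetical helper, not part of vLLM or build-and-copy.sh.
# Reads result lines such as
#   [Q&A] 256 tokens in 5.27s = 48.5 tok/s (prompt: 23)
# on stdin and prints one summary line per prompt category:
# the mean of the reported tok/s across runs, and the throughput
# recomputed as total tokens / total seconds.
avg_tps() {
  awk '
    # Result lines have "tokens" and "in" as fields 3 and 4;
    # this skips the "Run 1/2" banner lines.
    $3 == "tokens" && $4 == "in" {
      label = $1                                 # e.g. "[Q&A]"
      secs = $5; sub(/s$/, "", secs)             # "5.27s" -> "5.27"
      if (!(label in runs)) order[++n] = label   # keep first-seen order
      runs[label]++
      tps_sum[label] += $7                       # reported tok/s value
      tokens[label] += $2
      seconds[label] += secs
    }
    END {
      for (i = 1; i <= n; i++) {
        c = order[i]
        printf "%s mean %.2f tok/s over %d runs (recomputed %.2f tok/s)\n",
               c, tps_sum[c] / runs[c], runs[c], tokens[c] / seconds[c]
      }
    }
  '
}
```

Usage, assuming the benchmark output was saved to a file (bench.log is a placeholder name): avg_tps < bench.log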